What patterns of logs or warnings should an automated bridge promote to structured faults in ROS 2?

Michal_Faferek · May 15, 2026, 10:04am

We have been working on ros2_medkit, an Apache 2.0 fault aggregation gateway for ROS 2 that follows the SOVD model (ISO 17978-3). All of it lives in the selfpatch/ros2_medkit repository on GitHub: a diagnostics gateway for ROS 2 robots that exposes faults, live data, operations, scripts, locking, triggers, and OTA updates via a REST API, with no SSH and no custom tooling.

Two integration paths today:

/diagnostics topic - drop-in, no code changes on the publisher side. Works for any package already using diagnostic_updater.
Native FaultReporter instrumentation - each failure surface emits a structured fault code directly. We tried this on a manymove fork to see how invasive it is to add per-action-node fault reporting. The PR is here for reference: "Feat/medkit integration" by mfaferek93 (Pull Request #1, selfpatch/manymove). The integration itself was small (one mixin plus a fault-codes header), but that fork is a fairly clean codebase. Most production stacks have a much messier RCLCPP_ERROR / RCLCPP_WARN history that nobody is going to retroactively convert.

Native FaultReporter is the right answer when you control the codebase end-to-end - structured codes from day zero, lowest friction long-term. The painful case is the long tail of existing ROS 2 packages that already work fine and never emitted /diagnostics. For those, the drop-in bridge has nothing to subscribe to, and maintainers are not going to instrument every node just because we ask. If the goal is to make structured diagnostics adoptable across the ecosystem, plug-and-play needs to mean more than “use /diagnostics”.

That gap is what we want to validate with you.

We are considering a third path: a logs-to-faults bridge that watches /rosout (or arbitrary log streams) and promotes selected patterns to structured fault events, with configurable rules (severity mapping, dedup, rate limiting). Goal: a team can adopt structured diagnostics without touching their existing code. If it works out, it ships in the same open repo as the rest of medkit.
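
To make the idea concrete, here is a minimal sketch of what such a rules layer could look like, in plain Python with no ROS dependency. Nothing here is ros2_medkit API: `Rule`, `LogsToFaults`, the fault code, and the severity names are all hypothetical and only illustrate severity mapping, dedup, and rate limiting. The numeric levels are the ones defined in rcl_interfaces/msg/Log, which is the message type published on /rosout.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Log level constants as defined in rcl_interfaces/msg/Log.
WARN, ERROR, FATAL = 30, 40, 50

# Hypothetical severity names; a real bridge would use its own scheme.
SEVERITY_BY_LEVEL = {WARN: "warning", ERROR: "error", FATAL: "critical"}


@dataclass
class Rule:
    """Hypothetical rule shape, for illustration only."""
    pattern: str                    # regex applied to the log message
    fault_code: str                 # structured code emitted on a match
    min_level: int = WARN           # ignore log lines below this level
    severity: Optional[str] = None  # explicit override; else derived from level
    min_interval_s: float = 5.0     # rate limit per (fault_code, node)


class LogsToFaults:
    def __init__(self, rules):
        self._rules = [(re.compile(r.pattern), r) for r in rules]
        self._last_emit = {}  # (fault_code, node) -> stamp of last emitted fault

    def process(self, stamp, level, node, msg):
        """Return a fault dict for the first matching rule, or None."""
        for regex, rule in self._rules:
            if level < rule.min_level or not regex.search(msg):
                continue
            key = (rule.fault_code, node)
            last = self._last_emit.get(key)
            if last is not None and stamp - last < rule.min_interval_s:
                return None  # same fault from same node inside the window
            self._last_emit[key] = stamp
            return {
                "code": rule.fault_code,
                "severity": rule.severity or SEVERITY_BY_LEVEL.get(level, "warning"),
                "source": node,
                "message": msg,
            }
        return None  # allowlist behaviour: unmatched lines are never promoted


bridge = LogsToFaults([Rule(r"driver timeout", "MOTOR_TIMEOUT")])
fault = bridge.process(0.0, ERROR, "/arm_driver", "driver timeout on joint 3")
```

In this sketch dedup and rate limiting are the same mechanism: a repeat of the same fault code from the same node inside `min_interval_s` is dropped. Whether those should be separate knobs is exactly the kind of thing we would like feedback on.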

Three questions where your experience would help more than ours:

What log patterns in your stack would you actually want auto-promoted to structured faults? (specific examples > taxonomies)
What blocks your team from using /diagnostics more widely today?
For a logs-to-faults bridge to be useful and not noisy, what would have to be true? (rules engine, allowlist-only, ML, something else?)

Curious what others have tried, especially on the failure-modes side.

How do you currently surface fault/error state from your ROS 2 nodes?

/diagnostics (DiagnosticArray)
Custom error/event topics
Logs (RCLCPP_ERROR / RCLCPP_WARN)
Action results / service error codes
Behavior tree / lifecycle state changes
Tracing / OpenTelemetry
No consistent pattern yet


Timple · May 15, 2026, 1:20pm

I like how you try to cope with the current state of nodes instead of enforcing code changes on nodes. Such a ‘translation’ approach certainly has its place.

We have a year-old PR open on the diagnostic aggregator:

It publishes the reason for a degradation in diagnostics.

Such a mechanism is very convenient for an immediate report instead of an analysis in hindsight. Think of displaying it on a remote control for the operator.

It immediately points someone in the right direction, so something like this would be valuable as well.

bburda · May 15, 2026, 2:00pm

Oh wow, did not know about PR 506 - and a year open is rough. Glad we are not the only ones bumping into this. The input side of medkit is basically a matcher: log lines and warning patterns go in, structured events come out (category, severity, source, cause). The SOVD-specific stuff lives downstream of that, so the matcher itself is generic. Feels like something that could just as well live in ros/diagnostics if there is interest - we are going to build it either way because we need it, but happy to design it with reuse in mind from day one.
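
Roughly the shape I mean, as a sketch rather than medkit's actual code: the four fields follow the list above, while `FaultEvent`, `match_line`, the patterns, and the category and severity values are made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FaultEvent:
    category: str
    severity: str
    source: str
    cause: str


# Hypothetical pattern table: (regex with a named "cause" group, category, severity).
PATTERNS = [
    (re.compile(r"lost connection to (?P<cause>\S+)"), "communication", "error"),
    (re.compile(r"(?P<cause>\w+) temperature above limit"), "thermal", "warning"),
]


def match_line(source: str, line: str) -> Optional[FaultEvent]:
    """Return a structured event for the first matching pattern, else None."""
    for regex, category, severity in PATTERNS:
        m = regex.search(line)
        if m:
            return FaultEvent(category, severity, source, m.group("cause"))
    return None


event = match_line("/base_driver", "lost connection to /dev/ttyUSB0")
```

Nothing in that function knows about SOVD, which is why it could sit upstream of any consumer.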

Michal_Faferek · May 15, 2026, 2:01pm

Happy to review the PR from a consumer angle if it helps move it along.

gbiggs · May 17, 2026, 11:11pm

Please do. Anyone can review any PR, and having community members review each other’s PRs reduces the load on the maintainers, meaning that all PRs will get merged faster.

Michal_Faferek · May 19, 2026, 4:47pm

@Timple done, comments left on "Feature/report stale if one stale" by Timple (Pull Request #506, ros/diagnostics). The “why is this stale” bit is exactly the kind of signal that makes downstream consumption easier; if /diagnostics carried degradation reasons as first-class fields, the matcher @bburda mentioned would have less guesswork to do.

Still keen to hear what log patterns from other people’s stacks would be worth auto-promoting.
