We have been working on ros2_medkit, an Apache 2.0-licensed fault aggregation gateway for ROS 2 that follows the SOVD model (ISO 17978-3). All of it lives in the selfpatch/ros2_medkit repository on GitHub: a diagnostics gateway for ROS 2 robots that exposes faults, live data, operations, scripts, locking, triggers, and OTA updates via a REST API. No SSH, no custom tooling.
Two integration paths today:
- /diagnostics topic - drop-in, no code changes on the publisher side. Works for any package already using diagnostic_updater.
- Native FaultReporter instrumentation - each failure surface emits a structured fault code directly. We tried this on a manymove fork to see how invasive it is to add per-action-node fault reporting. The PR is here for reference: selfpatch/manymove Pull Request #1 ("Feat/medkit integration" by mfaferek93). The integration itself was small (one mixin + a fault-codes header), but that fork is a fairly clean codebase. Most production stacks have a much messier RCLCPP_ERROR / RCLCPP_WARN history that nobody is going to retroactively convert.
Native FaultReporter is the right answer when you control the codebase end-to-end - structured codes from day zero, lowest friction long-term. The painful case is the long tail of existing ROS 2 packages that already work fine and have never emitted /diagnostics. For those, the drop-in bridge has nothing to subscribe to, and asking maintainers to instrument every node is not going to happen. If the goal is to make structured diagnostics adoptable across the ecosystem, plug-and-play needs to mean more than “use /diagnostics”.
That gap is what we want to validate with you.
We are considering a third path: a logs-to-faults bridge that watches /rosout (or arbitrary log streams) and promotes selected patterns to structured fault events, with configurable rules (severity mapping, dedup, rate limiting). Goal: a team can adopt structured diagnostics without touching their existing code. If it works out, it ships in the same open repo as the rest of medkit.
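To make the idea concrete, here is a minimal sketch of what such a rule set could look like. It is plain Python with no ROS dependency, and it is not how medkit does or will implement it: `Rule`, `FaultEvent`, `LogToFaultBridge`, the fault code, and the parameter names are all hypothetical. The only thing borrowed from ROS is the numeric log levels from rcl_interfaces/msg/Log; in a real node the `process()` inputs would come from a /rosout subscription.

```python
import re
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional, Tuple

# Numeric severity levels as defined in rcl_interfaces/msg/Log.
WARN, ERROR, FATAL = 30, 40, 50


@dataclass
class Rule:
    """Hypothetical promotion rule: log pattern -> fault code + severity."""
    fault_code: str
    pattern: str
    severity: str
    min_level: int = WARN
    min_interval_s: float = 5.0  # rate limit per (node, fault_code)

    def __post_init__(self) -> None:
        self._regex = re.compile(self.pattern)

    def matches(self, level: int, msg: str) -> bool:
        return level >= self.min_level and self._regex.search(msg) is not None


@dataclass
class FaultEvent:
    fault_code: str
    severity: str
    source: str
    message: str
    suppressed: int  # matching logs dropped since the previous emission


class LogToFaultBridge:
    """Allowlist-only: a log line is promoted only if some rule matches it."""

    def __init__(self, rules: Iterable[Rule],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rules = list(rules)
        self._clock = clock
        self._last_emit: Dict[Tuple[str, str], float] = {}
        self._suppressed: Dict[Tuple[str, str], int] = {}

    def process(self, node_name: str, level: int, msg: str) -> Optional[FaultEvent]:
        for rule in self._rules:
            if not rule.matches(level, msg):
                continue
            key = (node_name, rule.fault_code)  # dedup key
            now = self._clock()
            last = self._last_emit.get(key)
            if last is not None and now - last < rule.min_interval_s:
                # Same fault from the same node inside the window: count it, drop it.
                self._suppressed[key] = self._suppressed.get(key, 0) + 1
                return None
            self._last_emit[key] = now
            return FaultEvent(rule.fault_code, rule.severity, node_name, msg,
                              self._suppressed.pop(key, 0))
        return None  # no rule matched, nothing is promoted


if __name__ == "__main__":
    bridge = LogToFaultBridge([
        Rule("PLANNER_TIMEOUT", r"planning timed out", "ERROR", min_level=ERROR),
    ])
    print(bridge.process("/planner", ERROR, "planning timed out after 2.0 s"))
    print(bridge.process("/planner", ERROR, "planning timed out after 2.0 s"))
```

In this sketch the first call yields a `FaultEvent` and the repeat inside the 5-second window is swallowed and counted, so the next emitted event carries the number of suppressed duplicates. Keying dedup on (node, fault code) rather than on the raw message text is a deliberate choice here, since log messages often embed changing values such as durations or IDs.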
Three questions where your experience would help more than ours:
- What log patterns in your stack would you actually want auto-promoted to structured faults? (specific examples > taxonomies)
- What blocks your team from using /diagnostics more widely today?
- For a logs-to-faults bridge to be useful and not noisy, what would have to be true? (rules engine, allowlist-only, ML, something else?)
Curious what others have tried, especially on the failure-modes side.
- /diagnostics (DiagnosticArray)
- Custom error/event topics
- Logs (RCLCPP_ERROR / RCLCPP_WARN)
- Action results / service error codes
- Behavior tree / lifecycle state changes
- Tracing / OpenTelemetry
- No consistent pattern yet
