What patterns of logs or warnings should an automated bridge promote to structured faults in ROS 2?

We have been working on ros2_medkit, an Apache 2.0-licensed fault aggregation gateway for ROS 2 that follows the SOVD model (ISO 17978-3). All of it lives in the selfpatch/ros2_medkit repository on GitHub: a diagnostics gateway for ROS 2 robots that exposes faults, live data, operations, scripts, locking, triggers, and OTA updates via a REST API, with no SSH and no custom tooling.

Two integration paths today:

  1. /diagnostics topic - drop-in, no code changes on the publisher side. Works for any package already using diagnostic_updater.

  2. Native FaultReporter instrumentation - each failure surface emits a structured fault code directly. We tried this on a manymove fork to see how invasive it is to add per-action-node fault reporting. The PR is here for reference: "Feat/medkit integration" by mfaferek93, Pull Request #1 on selfpatch/manymove (GitHub). The integration itself was small (one mixin plus a fault-codes header), but that fork is a fairly clean codebase. Most production stacks have a much messier RCLCPP_ERROR / RCLCPP_WARN history that nobody is going to convert retroactively.
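Path 1 comes down to translating each `diagnostic_msgs/msg/DiagnosticStatus` entry into a fault event. Below is a minimal sketch of that translation, assuming simplified dataclasses stand in for the ROS messages so it runs without rclpy. The level constants (OK=0, WARN=1, ERROR=2, STALE=3) come from the message definition; `FaultEvent`, `DiagnosticsBridge`, the severity names, and the fault-code naming convention are hypothetical and are not medkit's actual mapping. In a real node, `on_array()` would receive `msg.status` from a /diagnostics subscription.

```python
from dataclasses import dataclass
from typing import Dict, List

# Level constants as defined in diagnostic_msgs/msg/DiagnosticStatus.
OK, WARN, ERROR, STALE = 0, 1, 2, 3

# Hypothetical severity names; a real gateway would use its own scheme.
SEVERITY_BY_LEVEL = {WARN: "warning", ERROR: "error", STALE: "stale"}


@dataclass
class DiagnosticStatus:
    """Simplified stand-in for diagnostic_msgs/msg/DiagnosticStatus (subset of fields)."""
    level: int
    name: str
    message: str
    hardware_id: str = ""


@dataclass
class FaultEvent:
    """Hypothetical structured fault event emitted by the bridge."""
    code: str
    severity: str
    source: str
    description: str
    active: bool


def status_to_fault(status: DiagnosticStatus) -> FaultEvent:
    """Map one DiagnosticStatus to a fault event; an OK status clears the fault."""
    # Hypothetical convention: "motor_driver: Temperature" -> "MOTOR_DRIVER_TEMPERATURE".
    code = status.name.upper().replace(" ", "_").replace(":", "")
    if status.level == OK:
        return FaultEvent(code, "info", status.hardware_id, status.message, active=False)
    severity = SEVERITY_BY_LEVEL.get(status.level, "error")
    return FaultEvent(code, severity, status.hardware_id, status.message, active=True)


class DiagnosticsBridge:
    """Emits a fault event only when a status changes level, because
    /diagnostics publishers typically republish the same status periodically."""

    def __init__(self) -> None:
        self._last_level: Dict[str, int] = {}

    def on_array(self, statuses: List[DiagnosticStatus]) -> List[FaultEvent]:
        events: List[FaultEvent] = []
        for status in statuses:
            previous = self._last_level.get(status.name)
            self._last_level[status.name] = status.level
            # A status first seen as OK is not a fault transition.
            if previous is None and status.level == OK:
                continue
            if previous != status.level:
                events.append(status_to_fault(status))
        return events
```

Because `diagnostic_updater` republishes every status periodically, the sketch emits an event only on level changes: a first WARN raises an active fault, identical repeats produce nothing, and a return to OK produces a single clearing event.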

Native FaultReporter is the right answer when you control the codebase end-to-end - structured codes from day zero, lowest friction long-term. The painful case is the long tail of existing ROS 2 packages that already work fine and never emitted /diagnostics. For those, the drop-in bridge has nothing to subscribe to, and asking maintainers to instrument every node won’t happen. If the goal is to make structured diagnostics adoptable across the ecosystem, plug-and-play needs to mean more than “use /diagnostics”.

That gap is what we want to validate with you.

We are considering a third path: a logs-to-faults bridge that watches /rosout (or arbitrary log streams) and promotes selected patterns to structured fault events, with configurable rules (severity mapping, dedup, rate limiting). Goal: a team can adopt structured diagnostics without touching their existing code. If it works out, it will ship in the same open repo as the rest of medkit.
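A rule-based version of that bridge fits in a few dozen lines. This is a minimal sketch under stated assumptions, not the planned medkit implementation: `Rule`, `LogFaultBridge`, `FaultEvent`, and all parameter defaults are hypothetical. Records mirror the `stamp`, `level`, `name`, and `msg` fields of `rcl_interfaces/msg/Log` (the type carried on /rosout) as a plain dataclass, so it runs without rclpy; in a real node, `process()` would be called from a /rosout subscription callback.

```python
import re
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional, Tuple

# Level constants as defined in rcl_interfaces/msg/Log.
DEBUG, INFO, WARN, ERROR, FATAL = 10, 20, 30, 40, 50


@dataclass
class LogRecord:
    """Subset of rcl_interfaces/msg/Log as plain Python."""
    stamp: float  # seconds; a real bridge would convert sec + nanosec * 1e-9
    level: int
    name: str     # logger name
    msg: str


@dataclass
class Rule:
    """Hypothetical promotion rule; only records matching a rule become faults."""
    fault_code: str
    pattern: str                  # regex searched in the message text
    min_level: int = WARN
    logger: Optional[str] = None  # regex searched in the logger name; None matches all
    severity: str = "error"
    max_per_window: int = 5       # rate limit: new events per rule per window
    window_s: float = 60.0


@dataclass
class FaultEvent:
    fault_code: str
    severity: str
    source: str
    message: str
    first_seen: float
    occurrences: int = 1


class LogFaultBridge:
    def __init__(self, rules: List[Rule], dedup_window_s: float = 10.0) -> None:
        # Pre-compile each rule's regexes once.
        self._rules = [(r, re.compile(r.pattern), re.compile(r.logger) if r.logger else None)
                       for r in rules]
        self._dedup_window_s = dedup_window_s
        self._last_emitted: Dict[Tuple[str, str], FaultEvent] = {}
        self._recent: List[Deque[float]] = [deque() for _ in rules]

    def process(self, record: LogRecord) -> Optional[FaultEvent]:
        """Return a new fault event, or None if the record is ignored or suppressed."""
        for idx, (rule, msg_re, logger_re) in enumerate(self._rules):
            if record.level < rule.min_level:
                continue
            if logger_re is not None and not logger_re.search(record.name):
                continue
            if not msg_re.search(record.msg):
                continue
            return self._promote(idx, rule, record)  # first matching rule wins
        return None  # no rule matched: the record stays a plain log line

    def _promote(self, idx: int, rule: Rule, record: LogRecord) -> Optional[FaultEvent]:
        key = (rule.fault_code, record.name)
        previous = self._last_emitted.get(key)
        # Dedup: a repeat from the same logger within the window after the last
        # emitted event only bumps that event's occurrence counter.
        if previous is not None and record.stamp - previous.first_seen < self._dedup_window_s:
            previous.occurrences += 1
            return None
        # Rate limit: drop timestamps that left the rolling window, then check the budget.
        recent = self._recent[idx]
        while recent and record.stamp - recent[0] >= rule.window_s:
            recent.popleft()
        if len(recent) >= rule.max_per_window:
            return None
        recent.append(record.stamp)
        event = FaultEvent(rule.fault_code, rule.severity, record.name, record.msg, record.stamp)
        self._last_emitted[key] = event
        return event
```

In this design the allowlist is the main noise control: unmatched records never become faults. Dedup and rate limiting only bound the damage from a rule that matches too often; a rule that has already fired `max_per_window` times stays silent until older events age out of its window.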

Three questions where your experience would help more than ours:

  • What log patterns in your stack would you actually want auto-promoted to structured faults? (specific examples > taxonomies)

  • What blocks your team from using /diagnostics more widely today?

  • For a logs-to-faults bridge to be useful and not noisy, what would have to be true? (rules engine, allowlist-only, ML, something else?)

Curious what others have tried, especially on the failure-modes side.

How do you currently surface fault/error state from your ROS 2 nodes?
  • /diagnostics (DiagnosticArray)
  • Custom error/event topics
  • Logs (RCLCPP_ERROR / RCLCPP_WARN)
  • Action results / service error codes
  • Behavior tree / lifecycle state changes
  • Tracing / OpenTelemetry
  • No consistent pattern yet