ROS 2 Diagnostics Are Stuck in 2010
Part 1 of the “Beyond ros2 topic echo /diagnostics” series on production diagnostics for ROS 2.
We’ve all been there. Robot stops moving in the field. You SSH in, run ros2 topic echo /diagnostics, and get a wall of text scrolling faster than you can read…
Something flashes ERROR for half a second. Then it’s OK again. Then ERROR.
You’re not sure if you’re looking at a real problem or sensor noise. There’s no history, no context, nothing to go back to. The fault is gone before you can even copy-paste it.

diagnostic_updater is honest about what it is
REP 107 doesn’t pretend to be something it’s not:
“The diagnostics system is designed for collecting, publishing, and reporting hardware diagnostics data from a robot.”
and:
“The intended consumer of diagnostics data is a person.”
That’s it. Hardware data, for a human looking at a screen. No API, no persistence, no fault lifecycle. diagnostic_updater is a port from ROS 1, designed for a time when one person operated one robot in a lab. And honestly, for that use case it’s fine.
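For reference, here is roughly what a single status entry carries. This is a plain-Python mirror of the fields in diagnostic_msgs/DiagnosticStatus (level, name, message, hardware_id, key/value pairs), not the real message class. The render helper and the lidar example values are made up for illustration.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Level(IntEnum):
    # The four level constants defined in diagnostic_msgs/DiagnosticStatus.
    OK = 0
    WARN = 1
    ERROR = 2
    STALE = 3


@dataclass
class DiagnosticStatus:
    # Illustrative mirror of the message fields; not the generated ROS 2 class.
    level: Level
    name: str
    message: str
    hardware_id: str = ""
    values: dict[str, str] = field(default_factory=dict)  # KeyValue[] in the real message


def render(status: DiagnosticStatus) -> str:
    """Format one status the way a person would skim it in a terminal (hypothetical helper)."""
    pairs = ", ".join(f"{k}={v}" for k, v in status.values.items())
    return f"[{status.level.name}] {status.name}: {status.message} ({pairs})"


# Made-up example values.
status = DiagnosticStatus(
    level=Level.ERROR,
    name="lidar_front",
    message="No data for 0.5 s",
    hardware_id="lidar_front",
    values={"timeout_s": "0.5"},
)
print(render(status))
```

Note what is not in there: no fault code, no first-seen timestamp, no state beyond the current level. Each message is a snapshot, and the next one replaces it.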
But we’re not in 2010 anymore. Robots ship to customers. They run in warehouses, on farms, on construction sites. A typical robot in 2010 had maybe a laser scanner and a few joint encoders - a handful of topics at low frequency. Now you’ve got 3D LiDAR, stereo cameras, IMUs, GPS, force/torque sensors, some of them publishing at hundreds of Hz. Good luck scrolling through that in a terminal. When something breaks at 3 AM, nobody’s there to stare at it anyway.
What’s actually missing
| What you need in production | What you get today |
|---|---|
| ✅ Structured fault codes with severity | ❌ 4 levels + a string |
| ✅ Fault history | ❌ Pub/sub. Blink and it's gone. |
| ✅ Fault lifecycle (report → confirm → heal → clear) | ❌ Stateless. Every glitch is an event, no debounce, no filtering. |
| ✅ REST API for dashboards, fleet tools, alerting | ❌ rosbridge WebSocket or a full ROS 2 client |
| ✅ Root cause analysis | ❌ Nothing |
| ✅ Automatic data capture on fault | ❌ rosbag record and pray |
This isn’t some wish list for the future. Every serious production system has these things. Your car has had the core of it (fault codes, freeze frames, stored history) since 1996 with OBD-II.
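To make the lifecycle row in the table concrete, here is a minimal sketch of a debounced fault in plain Python. It is not how ros2_medkit or any particular library implements it. The state names, the DebouncedFault class and the confirm_after / heal_after thresholds are all assumptions chosen for illustration.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"            # nothing reported
    REPORTED = "reported"    # seen, but not yet trusted
    CONFIRMED = "confirmed"  # failed often enough in a row to count as real
    HEALED = "healed"        # passing again, kept around until someone clears it


class DebouncedFault:
    """Hypothetical fault tracker: report -> confirm -> heal -> clear."""

    def __init__(self, confirm_after: int = 3, heal_after: int = 3):
        self.confirm_after = confirm_after  # consecutive failures needed to confirm
        self.heal_after = heal_after        # consecutive passes needed to heal
        self.state = State.IDLE
        self._fails = 0
        self._passes = 0

    def report(self, failed: bool) -> State:
        if failed:
            self._fails += 1
            self._passes = 0
            if self.state in (State.IDLE, State.HEALED):
                self.state = State.REPORTED
            if self.state is State.REPORTED and self._fails >= self.confirm_after:
                self.state = State.CONFIRMED
        else:
            self._passes += 1
            self._fails = 0
            if self.state is State.REPORTED:
                self.state = State.IDLE  # a glitch that never got confirmed
            elif self.state is State.CONFIRMED and self._passes >= self.heal_after:
                self.state = State.HEALED
        return self.state

    def clear(self) -> None:
        """Explicit reset, e.g. after a repair."""
        self.state = State.IDLE
        self._fails = 0
        self._passes = 0


fault = DebouncedFault(confirm_after=3, heal_after=3)
flicker = [fault.report(f) for f in (True, False, True, False)]
sustained = [fault.report(True) for _ in range(3)]
recovery = [fault.report(False) for _ in range(3)]
```

With these thresholds, the ERROR / OK / ERROR flicker from the intro never leaves REPORTED, three failures in a row get confirmed, and the fault stays visible as HEALED until clear() is called.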
Cars solved this. Why haven’t we?
Take your car to the mechanic. They plug in a reader and get:
- a fault code (P0301 - cylinder 1 misfire)
- a freeze frame of what the engine was doing when it happened
- a history of when it first occurred
- a way to clear it after repair
That’s not rocket science. It’s just standardized diagnostics.
The automotive world has been on this path for decades. Their latest standard, SOVD (Service-Oriented Vehicle Diagnostics) from ASAM, drops the old binary CAN protocols and uses plain HTTP/REST. A JSON API for vehicle diagnostics. Sound familiar? It should, because that’s what every web developer already knows how to use.
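As a rough picture of what the mechanic’s reader gets back, here is a small Python sketch of a fault memory with a code, a freeze frame, a first-seen timestamp and a clear operation, serialized as JSON. The class names, the JSON field names and the engine values are invented for illustration. This is not the OBD-II or SOVD schema, and keeping the snapshot from the first occurrence is an assumption of the sketch.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class FaultRecord:
    code: str                           # e.g. "P0301"
    description: str
    first_seen: Optional[float] = None  # timestamp in seconds
    occurrences: int = 0
    freeze_frame: dict = field(default_factory=dict)

    def occur(self, stamp: float, snapshot: dict) -> None:
        if self.first_seen is None:
            self.first_seen = stamp
            self.freeze_frame = dict(snapshot)  # state at the first occurrence
        self.occurrences += 1


class FaultMemory:
    """Hypothetical in-memory fault store keyed by fault code."""

    def __init__(self) -> None:
        self._faults: dict[str, FaultRecord] = {}

    def report(self, code: str, description: str, stamp: float, snapshot: dict) -> None:
        record = self._faults.setdefault(code, FaultRecord(code, description))
        record.occur(stamp, snapshot)

    def clear(self, code: str) -> bool:
        """Remove a fault after repair; returns False if it was not stored."""
        return self._faults.pop(code, None) is not None

    def to_json(self) -> str:
        # Made-up response shape, the kind of body an HTTP endpoint could return.
        return json.dumps({"faults": [asdict(r) for r in self._faults.values()]}, indent=2)


memory = FaultMemory()
memory.report("P0301", "Cylinder 1 misfire", 100.0, {"rpm": 2400, "coolant_c": 88})
memory.report("P0301", "Cylinder 1 misfire", 160.0, {"rpm": 900, "coolant_c": 91})
print(memory.to_json())
```

The second report only bumps the counter. The freeze frame and first_seen still describe the moment the fault first showed up, which is exactly the information a pub/sub stream throws away.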
Robots today remind me of cars in the early 2000s. Complex enough that things break in weird ways, but still diagnosing problems with the equivalent of sticking your head under the hood and listening.
We’re working on this
We spent the last few months building ros2_medkit. It’s our attempt at filling these gaps for ROS 2:
- Fault lifecycle with debounce and filtering
- Automatic rosbag capture on fault
- REST/SSE gateway - no ROS 2 client needed
- Root cause correlation
Open source, Apache 2.0, runs on Jazzy.
This is the first post in a series where we’ll walk through all of it. Each post will have a Docker demo you can spin up and try. No GPU needed, no complicated setup.
Navigation got Nav2. Manipulation got MoveIt 2. We think diagnostics deserves its own stack too.
Next episode: Part 2 - Your First Fault Manager in 5 Minutes

