We’ve been building robots with ROS2 for years, and we hit the same wall every time a robot fails in production.
The debugging process looks like this:
- SSH into the machine
- Grep through logs
- Check ROS2 topics (which ones stopped publishing?)
- Replay bag files
- Cross-reference with deployment changes
- Try to correlate infrastructure issues with ROS state
This takes 3-4 hours. Every time.
The problem: ROS gives you raw telemetry, but zero intelligence connecting infrastructure metrics + ROS topology + deployment history. You’re manually stitching pieces together.
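To make the "which topics stopped publishing?" step concrete, here is a minimal sketch of the kind of check we end up scripting by hand. It is not Ferronyx code. The `TopicStatus` type, the `stale_topics` helper, the tolerance factor, and the sample values are all hypothetical; in a live system the last-seen timestamps would come from your own subscriptions.

```python
from dataclasses import dataclass


@dataclass
class TopicStatus:
    name: str
    expected_period_s: float  # assumed nominal interval between messages
    last_seen_s: float        # timestamp of the most recent message


def stale_topics(statuses, now_s, tolerance=3.0):
    """Return names of topics silent for longer than tolerance * expected period."""
    return [
        s.name
        for s in statuses
        if now_s - s.last_seen_s > tolerance * s.expected_period_s
    ]


# Hypothetical snapshot taken at t = 100 s.
statuses = [
    TopicStatus("/scan", 0.1, 99.95),
    TopicStatus("/odom", 0.05, 92.0),
    TopicStatus("/camera/image_raw", 0.033, 99.99),
]
print(stale_topics(statuses, now_s=100.0))
```

A check like this only tells you that a topic went quiet. It says nothing about why, which is where the manual stitching starts.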
So we built Ferronyx to be that intelligence layer.
What we did:
- Real-time monitoring of ROS2 topics, nodes, actions + infrastructure (CPU, GPU, memory, network)
- When something breaks, AI analyzes the incident chain and suggests probable root causes
- Deployment markers show exactly which release caused the failure
- Track sensor health degradation before failures happen
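As a rough illustration of the deployment-marker idea (a sketch under our own assumptions, not how Ferronyx implements it), the simplest version is a lookup of which release was live when an incident started. The `release_at` helper and the sample markers are hypothetical.

```python
from bisect import bisect_right


def release_at(deployments, incident_ts):
    """Return the release that was live at incident_ts, or None if none was.

    deployments: list of (timestamp, release) tuples sorted by timestamp.
    """
    timestamps = [ts for ts, _ in deployments]
    # Index of the first deployment strictly after the incident.
    idx = bisect_right(timestamps, incident_ts)
    return deployments[idx - 1][1] if idx > 0 else None


# Hypothetical deployment markers (timestamps in seconds).
deployments = [(1000, "v1.4.0"), (2000, "v1.4.1"), (3000, "v1.5.0")]
print(release_at(deployments, 2500))
```

On its own this only narrows down which release was running. Tying it to a failure still needs the topic and infrastructure signals from the same time window.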
Real results from our beta customers:
- MTTR: 3-4 hours → 12-15 minutes
- One customer caught sensor drift they couldn’t see manually
- Another correlated a specific firmware version with navigation failures
We’re looking for 8-12 more teams to beta test and help us refine this. We want teams that:
- Run ROS2 in production (warehouses, humanoids, autonomous vehicles)
- Actually deal with downtime/reliability issues
- Will give honest feedback
Free beta access. You help shape the product, we learn what breaks.
If you’re dealing with robot reliability headaches, reply here or send a DM. Would genuinely love to hear your toughest debugging stories.
Links:
https://ferronyx.com/