A question I keep running into and don’t have a clean answer for: when a
ROS2-based autonomous system makes a consequential decision — a mobile
robot reroutes around a person, an arm stops mid-motion, a drone aborts —
we can answer what it did. ros2 bag captures the topics. But why it
did that, in a form a safety officer, an insurance adjuster, or a regulator
can read, is almost always reconstructed after the fact, by hand.
Four specific observations, curious where I’m wrong:
1. Rule provenance is invisible. When a BehaviorTree node fires, we
log the node, not the human-authored policy that made the node legal. No
first-class link from “robot stopped” to “rule §3.2 of safety policy v4
triggered.”
2. Guardrails are one-way safety, not auditable downgrades. Most ROS2
safety layers I’ve seen are kill switches or velocity caps. They prevent
harm but produce no signed record of “planner wanted X, guardrail
downgraded to Y, here’s the chain.”
3. LLM-in-the-loop adds a new failure mode. With vision-language-action
(VLA) stacks plugging into task planning, the “why” gets harder. Did the
model suggest the
action? Was it followed, overridden, sanitized? I don’t see standard hooks
for any of this in the stack.
4. The EU AI Act is in force, and its Articles 12 (record-keeping) and 14
(human oversight) apply to high-risk systems as those obligations phase
in. Most teams I talk to plan to handle “logging” and “human oversight”
with ros2 bag plus a spreadsheet. That will not survive a regulator
audit, and CE marking deadlines for some categories hit in 2027.
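To make observations 1 and 2 concrete, here is a minimal sketch of what a decision record with rule provenance and a tamper-evident chain could look like. Everything in it is an assumption for illustration: `DecisionRecord`, `append_record`, `verify_chain` and `SECRET_KEY` are hypothetical names, not part of ROS2, Nav2 or BehaviorTree.CPP. HMAC stands in for a real asymmetric signature (for example Ed25519) only to keep the example stdlib-only.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass

# Hypothetical demo key. A real deployment would keep key material in an
# HSM/TPM and would likely use asymmetric signatures instead of HMAC.
SECRET_KEY = b"demo-key-not-for-production"
GENESIS = "0" * 64  # placeholder "previous MAC" for the first entry


@dataclass(frozen=True)
class DecisionRecord:
    stamp_ns: int        # time of the decision, in nanoseconds
    bt_node: str         # behavior tree node that fired
    policy_version: str  # human-authored policy document version, e.g. "v4"
    rule_id: str         # rule inside that policy, e.g. "3.2"
    proposed: str        # what the planner wanted
    applied: str         # what the guardrail allowed


def _mac(prev_mac: str, payload: str) -> str:
    message = (prev_mac + payload).encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def append_record(chain: list, record: DecisionRecord) -> dict:
    """Append a record whose MAC also covers the previous entry's MAC."""
    prev_mac = chain[-1]["mac"] if chain else GENESIS
    payload = json.dumps(asdict(record), sort_keys=True)
    entry = {"prev": prev_mac, "payload": payload, "mac": _mac(prev_mac, payload)}
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Recompute every MAC; any edited, dropped or reordered entry fails."""
    prev_mac = GENESIS
    for entry in chain:
        expected = _mac(prev_mac, entry["payload"])
        if entry["prev"] != prev_mac or not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev_mac = entry["mac"]
    return True
```

The point of chaining is that "planner wanted X, guardrail downgraded to Y, rule 3.2 of policy v4 triggered" becomes one entry that cannot be quietly edited later without breaking every entry after it. Truncating the tail of the log is not detected by this sketch; that would need an external anchor for the latest MAC.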
Three questions for people deeper in this than me:
- Is there an active REP or working group on decision provenance that I
missed? I found scattered threads, no spec.
- For Nav2 + BehaviorTree.CPP teams: how do you currently answer “why
did the robot decide that?” for non-engineer stakeholders?
- Has anyone added cryptographic signing to the rosbag pipeline, or is
everyone trusting the filesystem and timestamps?
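On the last question, the simplest thing I can imagine is sealing a finished bag directory after recording, rather than signing individual messages. A rough sketch, assuming the bag is a directory of files (a metadata file plus storage files) and nothing else about rosbag2 internals. `build_manifest`, `seal_manifest` and `verify_bag` are hypothetical helpers, not rosbag2 APIs, and HMAC again stands in for a proper signature.

```python
import hashlib
import hmac
import json
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Stream a file through SHA-256 so large bag files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(bag_dir: Path) -> dict:
    """Map each file's path (relative to the bag directory) to its digest."""
    return {
        path.relative_to(bag_dir).as_posix(): file_sha256(path)
        for path in sorted(bag_dir.rglob("*"))
        if path.is_file()
    }


def seal_manifest(manifest: dict, key: bytes) -> str:
    """MAC over the canonical JSON form of the manifest."""
    body = json.dumps(manifest, sort_keys=True).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def verify_bag(bag_dir: Path, manifest: dict, seal: str, key: bytes) -> bool:
    """Check the seal first, then that the files on disk still match."""
    if not hmac.compare_digest(seal_manifest(manifest, key), seal):
        return False
    return build_manifest(bag_dir) == manifest
```

This only proves the bag has not changed since it was sealed. It says nothing about whether the recording was complete or the timestamps were honest, which is the part I suspect most teams are implicitly trusting.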
I’ve been building an opinionated implementation of some of this — rule
provenance, signed audit chain, guardrail-downgrade-only pattern, LLM
sanitization — outside of ROS2, and I’m trying to figure out if the pieces
that generalize are worth porting and open-sourcing.
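For anyone unfamiliar with the terms, here is a toy illustration of one way to read "guardrail-downgrade-only" and "LLM sanitization". It is an assumption made for discussion, not the implementation described above: the `Action` ordering, the allow-list and both function names are hypothetical.

```python
from enum import IntEnum
from typing import Tuple


class Action(IntEnum):
    # Ordered from most to least conservative (hypothetical ordering).
    STOP = 0
    SLOW = 1
    PROCEED = 2


# Only these tokens are accepted from a model; everything else is rejected.
ALLOWED_FROM_LLM = {"stop", "slow", "proceed"}


def sanitize_llm_suggestion(raw: str) -> Action:
    """Map free-form model output onto the allow-list, failing safe to STOP."""
    token = raw.strip().lower()
    if token not in ALLOWED_FROM_LLM:
        return Action.STOP
    return Action[token.upper()]


def apply_guardrail(proposed: Action, ceiling: Action) -> Tuple[Action, bool]:
    """Return (applied action, was_downgraded).

    The guardrail can only move toward the conservative end of the scale:
    it never turns a STOP into a PROCEED.
    """
    applied = min(proposed, ceiling)
    return applied, applied != proposed
```

For example, `apply_guardrail(Action.PROCEED, Action.SLOW)` yields `(Action.SLOW, True)`, and that pair is exactly what would need to land in an audit record.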
If this resonates, drop a reply or DM. Looking for both “you’re missing
existing work X” and “yes this is broken in our deployment, here’s how.”