Hi ROS Community,
Have you ever tuned DDS QoS for hours, only to see a mobile robot execute a “ghost command” after a bad Wi-Fi / 5G spike?
I’ve been working on a small ROS 2 project called ros2_kinematic_guard to address what I call Wireless Command Collapse.
The issue is not just packet loss. Sometimes the message arrives, but it is no longer safe or meaningful to execute by the time it reaches the robot.
## The Problem: when timing betrays motion
A heartbeat or timeout can tell you whether messages are still arriving. It usually cannot tell you whether a /cmd_vel command still makes sense relative to the robot’s current odometry.
Common failure modes:
- Ghost commands: an old command arrives late and gets executed after the robot state has already changed.
- Burst / jitter windows: delayed commands are released together, causing abnormal acceleration or jerk demand.
- QoS traps: certain reliability / history settings can turn network lag into a burst of outdated motion commands.
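To make the QoS trap concrete, here is a simplified, middleware-free model (not actual DDS behavior, just an illustration) of how a reliable `KEEP_LAST(depth)` history can turn a link outage into a burst of stale commands on reconnect, while a depth-1 history delivers only the latest one:

```python
from collections import deque

def deliver_after_outage(commands, depth):
    """Toy model of a reliable KEEP_LAST(depth) history during a link outage:
    the middleware buffers up to `depth` messages, then flushes them all at
    once when the link recovers. Simplified assumption, not real DDS."""
    history = deque(maxlen=depth)
    for cmd in commands:              # published while the link is down
        history.append(cmd)
    return list(history)              # burst delivered on reconnect

# 20 commands published during a 1 s outage at 20 Hz
outage_cmds = [(0.3, 0.0)] * 20
burst_deep    = deliver_after_outage(outage_cmds, depth=10)  # 10 stale commands at once
burst_shallow = deliver_after_outage(outage_cmds, depth=1)   # only the newest survives
```

The deep-history burst is exactly the "abnormal acceleration demand" case above: the controller receives ten outdated commands back-to-back in a single tick.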
## The Solution: ros2_kinematic_guard
ros2_kinematic_guard does not try to fix DDS, QoS, or the network.
It sits between the incoming command stream and the robot controller:
```
/cmd_vel_in
     ↓
 NARH Guard
     ↓
/cmd_vel_out
```
The guard evaluates a short local window:
- previous command
- current command
- previous odometry
- current odometry
It computes a lightweight NARH-lite residual, `R_NAR`, from four signals: timing consistency, stale-command risk, acceleration / jerk limits, and command-vs-odometry consistency.
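As a rough sketch of how those four terms could combine (the weights, normalizations, and exact formula below are my illustrative assumptions, not the package's actual residual):

```python
from dataclasses import dataclass

@dataclass
class Sample:
    t: float    # message timestamp (s)
    vx: float   # linear velocity (m/s)
    wz: float   # angular velocity (rad/s)

def r_nar_lite(cmd_prev: Sample, cmd_curr: Sample,
               odom_prev: Sample, odom_curr: Sample,
               expected_dt: float = 0.05,
               a_max: float = 1.0) -> float:
    """Toy NARH-lite residual over a two-sample window.
    All four terms are illustrative assumptions."""
    dt = max(cmd_curr.t - cmd_prev.t, 1e-6)
    # 1) timing consistency: drift of inter-arrival time from nominal
    timing = abs(dt - expected_dt) / expected_dt
    # 2) stale-command risk: command older than the latest odometry
    stale = max(0.0, odom_curr.t - cmd_curr.t) / expected_dt
    # 3) acceleration demand implied by consecutive commands
    accel = abs(cmd_curr.vx - cmd_prev.vx) / dt / a_max
    # 4) command-vs-odometry consistency
    consistency = abs(cmd_curr.vx - odom_curr.vx)
    return timing + stale + accel + consistency
```

A well-timed, consistent window yields a residual near zero; a stale replay or burst drives it up sharply.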
When R_NAR crosses a critical threshold, the guard enters a deterministic state machine:
`RED_BRAKE → BRAKE_AND_RESYNC → RESYNCING`
That means it cuts motion, flushes poisoned command windows, waits for a fresh command/odom window, and only then releases control.
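A minimal sketch of that deterministic flow (state names are from the post; the transition conditions, thresholds, and "fresh window" count below are my assumptions):

```python
from enum import Enum, auto

class GuardState(Enum):
    NORMAL = auto()
    RED_BRAKE = auto()
    BRAKE_AND_RESYNC = auto()
    RESYNCING = auto()

class KinematicGuard:
    """Toy guard: brakes on a residual spike, flushes the buffered command
    window, and only releases control after a run of fresh, consistent samples."""
    def __init__(self, r_crit: float = 3.0, fresh_needed: int = 3):
        self.state = GuardState.NORMAL
        self.r_crit = r_crit
        self.fresh_needed = fresh_needed
        self.fresh_count = 0
        self.window = []                      # buffered recent commands

    def step(self, r_nar: float, cmd):
        if self.state == GuardState.NORMAL:
            if r_nar >= self.r_crit:
                self.state = GuardState.RED_BRAKE
                return (0.0, 0.0)             # cut motion immediately
            self.window.append(cmd)
            return cmd
        if self.state == GuardState.RED_BRAKE:
            self.window.clear()               # flush poisoned commands
            self.state = GuardState.BRAKE_AND_RESYNC
            return (0.0, 0.0)
        if self.state == GuardState.BRAKE_AND_RESYNC:
            self.fresh_count = 0
            self.state = GuardState.RESYNCING
            return (0.0, 0.0)
        # RESYNCING: require a run of consistent samples before releasing
        if r_nar < self.r_crit:
            self.fresh_count += 1
            if self.fresh_count >= self.fresh_needed:
                self.state = GuardState.NORMAL
                return cmd
        else:
            self.fresh_count = 0
        return (0.0, 0.0)
```

The key property is that recovery is gated on evidence (a fresh command/odom window), not just on elapsed time.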
## Why not just use a heartbeat / timeout?
| Failure Mode | Traditional Heartbeat / Timeout | NARH Kinematic Guard |
|---|---|---|
| Packet Loss | Can detect silence | Can detect silence and brake |
| Stale Command | Often missed | Detected via kinematic inconsistency |
| Burst / Jitter | Often missed | Detected via residual spike / dt collapse |
| Stale Replay | Often missed | Detected via timing + odom conflict |
| Recovery | Time-based | Resync gate with fresh command/odom window |
## Try it in 30 seconds, no robot required
The repo includes a complete Bad-Wi-Fi pressure test loop:
- `jitter_injector_node.py`: creates delayed / duplicated / bursty / replayed commands
- `kinematic_guard_node.py`: computes `R_NAR` and outputs a protected `safe_cmd_vel`
- `synthetic_odom_provider.py`: acts as a virtual robot body and publishes `/odom`
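For intuition, the kinds of faults the injector produces can be modeled in a few lines (this is a standalone toy, not the node's actual code; the probabilities are arbitrary assumptions):

```python
import random

def inject_jitter(stream, delay_p=0.2, dup_p=0.1, replay_p=0.1, seed=0):
    """Toy fault injector: holds some messages back (delay / reorder),
    duplicates some, and replays old held messages out of order."""
    rng = random.Random(seed)
    out, held = [], []
    for msg in stream:
        if rng.random() < delay_p:
            held.append(msg)              # delayed: will arrive late
            continue
        out.append(msg)
        if rng.random() < dup_p:
            out.append(msg)               # duplicate delivery
        if held and rng.random() < replay_p:
            out.append(held.pop(0))       # stale replay
    out.extend(held)                      # remaining delayed msgs arrive last
    return out
```

Feeding the output of something like this into a guard is what the pressure-test loop automates end to end.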
Run:
```bash
source /opt/ros/humble/setup.bash
colcon build --symlink-install
source install/setup.bash
ros2 launch ros2_kinematic_guard start_pressure_test.launch.py profile:=wifi_collapse
```
Then watch:
```bash
ros2 topic echo /kinematic_guard/status
ros2 topic echo /kinematic_guard/residual
```
Example guard response:
```json
{
  "status": "RESYNCING",
  "action": "BRAKE_AND_RESYNC",
  "r_nar": 5.749,
  "safe_cmd": {
    "vx": 0.0,
    "wz": 0.0
  }
}
```
At the default 20 Hz guard loop, intervention happens on the next guard tick, i.e. within roughly 50 ms.
Repository:
## Background
This project is an engineering projection of NARH — the Non-Associative Residual Hypothesis. In this ROS 2 version, NARH is used as a lightweight command-flow consistency metric, not as a full dynamics solver.
I’d love feedback from anyone running mobile robots over unreliable Wi-Fi / 5G links.
I'm especially interested in real `/cmd_vel` + `/odom` bag / MCAP logs showing strange jitter, stale-command behavior, burst delivery, or other command-flow anomalies.