[Discussion] Why Vision-Guided Robots Still Fail in Production Even When Detection Works

Hi everyone,
I'm developing diagnostic tooling that checks whether the command stream, feedback stream, timing window, and physical response in ROS stay consistent, using a lightweight experimental package named ros2_kinematic_guard. I have identified a recurring issue in vision-guided assembly systems:

The robot “sees correctly” — but the executed grasp/pose slowly diverges from the expected state over time.

This seems especially common in setups involving:

  • RealSense D435i
  • TF-based grasp pipelines
  • MoveIt servoing
  • RGB-D pose estimation
  • asynchronous ROS2 nodes

Typical symptoms observed:

  • hand-eye calibration gradually becoming inconsistent after thermal drift
  • grasp points oscillating despite stable detections
  • TF trees remaining valid while pose execution becomes unstable
  • frame timestamp mismatch causing “see correctly, grasp incorrectly”
  • retry/relocalization logic amplifying small pose residuals

Interestingly, most systems still “look healthy” from standard monitoring:

  • bbox/confidence remain high
  • TF graph exists
  • topics publish normally
  • planners succeed

…but the physical execution path drifts.

So I plan to use the lightweight residual-monitoring approach from ros2_kinematic_guard, which focuses on three indicators:

1. Pose Residual Drift

Monitoring divergence between:

expected_pose(t)
vs
executed_pose(t)

over time.

Especially useful for detecting thermal/mechanical calibration drift.
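
To make the idea concrete, here is a minimal sketch of how a pose residual and its drift rate could be computed. It is not the ros2_kinematic_guard API; the names pose_residual and drift_slope are hypothetical, poses are assumed to be (xyz, unit quaternion in xyzw order) tuples, and drift is estimated with a plain least-squares slope over a window of samples.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # assumed (x, y, z, w), unit length
Pose = Tuple[Vec3, Quat]


def pose_residual(expected: Pose, executed: Pose) -> Tuple[float, float]:
    """Return (translation error in metres, rotation error in radians)."""
    (p_exp, q_exp), (p_act, q_act) = expected, executed
    trans_err = math.dist(p_exp, p_act)
    # |dot| handles the q / -q double cover; clamp guards acos against rounding.
    dot = min(1.0, abs(sum(a * b for a, b in zip(q_exp, q_act))))
    rot_err = 2.0 * math.acos(dot)
    return trans_err, rot_err


def drift_slope(times: Sequence[float], residuals: Sequence[float]) -> float:
    """Least-squares slope of residual vs. time (residual units per second)."""
    n = len(times)
    if n < 2:
        return 0.0
    mean_t = sum(times) / n
    mean_r = sum(residuals) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0.0:
        return 0.0
    cov = sum((t - mean_t) * (r - mean_r) for t, r in zip(times, residuals))
    return cov / var_t


# Hypothetical usage: translation residual growing by about 1 mm per sample.
history_t = [0.0, 1.0, 2.0, 3.0]
history_r = [0.001, 0.002, 0.003, 0.004]
slope = drift_slope(history_t, history_r)
```

A persistently positive slope over a long window points at slow calibration drift rather than per-grasp noise, which is why the slope is tracked here instead of a single-sample threshold.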

2. Temporal Coherence Residual

Tracking timestamp alignment between:

  • image frame
  • TF transform
  • depth frame
  • grasp pose generation

to detect async ordering issues.
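
As a rough illustration of the timestamp check, the sketch below takes the header stamps that fed one grasp decision and reports how far apart they are. The function name temporal_coherence, the stage labels, and the 20 ms default tolerance are all assumptions for the example; stamps are assumed to be already converted to float seconds on a common clock.

```python
from typing import Dict, Tuple


def temporal_coherence(
    stamps: Dict[str, float], max_skew_s: float = 0.02
) -> Tuple[float, str, str, bool]:
    """Return (spread in seconds, oldest stage, newest stage, within tolerance).

    `stamps` maps a stage label to the timestamp of the data that stage used.
    `max_skew_s` is an assumed tolerance and should be tuned per system.
    """
    oldest = min(stamps, key=stamps.get)
    newest = max(stamps, key=stamps.get)
    spread = stamps[newest] - stamps[oldest]
    return spread, oldest, newest, spread <= max_skew_s


# Hypothetical sample: the TF lookup is about 60 ms older than the grasp pose.
sample = {"image": 10.000, "depth": 10.004, "tf": 9.950, "grasp": 10.010}
spread, oldest, newest, ok = temporal_coherence(sample)
```

Reporting which stage is oldest matters as much as the spread itself, since it tells you whether the stale input is the transform, the depth frame, or the image.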

3. Action Stability Residual

Detecting oscillation/jitter in generated grasp points or servo actions across adjacent frames.

This catches cases where the vision system is technically “working” but unstable under lighting/reflection disturbances.
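
One simple way to quantify this, sketched below under my own assumptions, is to compare the total path travelled by the grasp point across a window of frames with its net displacement. The function name stability_residual is hypothetical, and grasp points are assumed to be xyz tuples in a fixed frame.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def stability_residual(points: Sequence[Vec3]) -> Tuple[float, float]:
    """Return (mean frame-to-frame step, excess path) in metres.

    Excess path = total path length minus net displacement. It stays near
    zero for a static or steadily moving target and grows when the grasp
    point oscillates back and forth between adjacent frames.
    """
    if len(points) < 2:
        return 0.0, 0.0
    steps = [math.dist(a, b) for a, b in zip(points, points[1:])]
    path = sum(steps)
    net = math.dist(points[0], points[-1])
    return path / len(steps), path - net


# Hypothetical window: the grasp point flips 2 mm back and forth along x.
oscillating = [(0.0, 0.0, 0.0), (0.002, 0.0, 0.0)] * 2 + [(0.0, 0.0, 0.0)]
mean_step, excess = stability_residual(oscillating)
```

Using path minus net displacement rather than a ratio avoids dividing by zero when the target is stationary, which is the common case in assembly.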

The key idea:

Instead of asking:

“Did perception succeed?”

we ask:

“Did the system state remain converged throughout the perception → planning → action path?”

Curious if others in production robotics are already monitoring these kinds of residuals.

Especially interested in:

  • vision-guided assembly
  • dynamic calibration compensation
  • ROS2 observability
  • VLA/VLM action stability
  • production deployment diagnostics