I’m working on implementing end-to-end tracing for robotic behaviors using OpenTelemetry (OTel) in ROS 2. My goal is to trace:
- High-level requests (e.g., “move to location”) across components to analyze latency
- Control commands (e.g., teleop) through the entire pipeline to motors
Current Progress:
- Successfully wrapped ROS 2 Service and Action servers to generate OTel traces
- Basic request/response flows are visible in tracing systems
Challenges with Nav2:
- Nav2 heavily uses pub/sub patterns where traditional instrumentation falls short
- Difficult to maintain context propagation across:
  - Multiple subscribers processing the same message
  - Chained topic processing (output of one node becomes input to another)
  - Asynchronous publisher/subscriber relationships
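To make the propagation problem concrete, here is a minimal stdlib-only sketch of one possible direction: carrying a W3C-style `traceparent` string alongside each message so that every subscriber can start a span parented to the publisher's span. It uses neither the OTel SDK nor rclpy. `SpanContext`, `Envelope`, `publish`, and `on_message` are hypothetical names for illustration, and it assumes the message type can carry an extra string field, which stock Nav2 messages may not allow.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpanContext:
    """Hypothetical stand-in for a tracing span context."""
    trace_id: str                       # 32 lowercase hex characters
    span_id: str                        # 16 lowercase hex characters
    parent_span_id: Optional[str] = None


def new_root() -> SpanContext:
    """Start a new trace, e.g. when a high-level request arrives."""
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8))


def child_of(parent: SpanContext) -> SpanContext:
    """Same trace, fresh span id, parent recorded."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.span_id)


def to_traceparent(ctx: SpanContext) -> str:
    """Serialize as version 00 with the sampled flag (01) set."""
    return f"00-{ctx.trace_id}-{ctx.span_id}-01"


def from_traceparent(header: str) -> SpanContext:
    """Parse a traceparent string back into a context."""
    parts = header.split("-")
    if len(parts) != 4 or len(parts[1]) != 32 or len(parts[2]) != 16:
        raise ValueError(f"malformed traceparent: {header!r}")
    return SpanContext(trace_id=parts[1], span_id=parts[2])


@dataclass
class Envelope:
    """Hypothetical message: payload plus propagated context."""
    payload: str
    traceparent: str


def publish(payload: str, ctx: SpanContext) -> Envelope:
    """Publisher side: attach the current context to the outgoing message."""
    return Envelope(payload, to_traceparent(ctx))


def on_message(msg: Envelope) -> SpanContext:
    """Subscriber side: start a span parented to the publisher's span."""
    return child_of(from_traceparent(msg.traceparent))


# Fan-out: two subscribers receive the same message.
request = new_root()
goal_msg = publish("goal_pose", request)
planner = on_message(goal_msg)
controller = on_message(goal_msg)

# Chained topics: one subscriber's span becomes the context it publishes with.
downstream = on_message(publish("cmd_vel", controller))
```

In this sketch each subscriber in a fan-out becomes a sibling child of the publishing span, and a chained topic extends the same trace by one level. Whether parent/child is the right relationship for fan-out, or whether something looser such as span links fits better, is part of what I'm asking below.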
Questions:
- Are there established patterns for OTel context propagation in ROS 2 pub/sub systems?
- How should we handle fan-out scenarios (1 publisher → N subscribers)?
- Any Nav2-specific considerations for tracing (e.g., lifecycle nodes, behavior trees)?
- Alternative approaches besides OTel that maintain compatibility with observability tools?