How to Implement End-to-End Tracing in ROS 2 (Nav2) with OpenTelemetry for Pub/Sub Workflows?

I’ve worked on something very similar using the built-in ROS 2 LTTng tracing instrumentation (on Linux). In case you haven’t seen it yet, take a look at the preprint “Message Flow Analysis with Complex Causal Links for Distributed ROS 2 Systems.”

That paper doesn’t include any concept of high-level requests like your Nav2/control examples, but it could be added on top with extra processing, provided the user defines what their “request” is. For instance, the internal tool I mentioned in my ROSCon 2023 talk (slides, video) has a feature that lets users define the start and end of their processing pipeline (e.g., a specific publisher to a specific subscription), builds graphs for that pipeline, and then extracts and plots end-to-end latencies over time.
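To make the “start publisher to end subscription” idea concrete, here is a minimal sketch, not the tool from the talk. It assumes the trace has already been processed into simplified message records, each carrying a hypothetical `cause_id` that links a message to the input message whose callback published it (the kind of causal link the paper extracts). It walks each end-topic message back to the start topic and reports the latency.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Message:
    """Hypothetical, simplified record of one message extracted from a trace."""
    msg_id: int
    topic: str
    pub_time_ns: int          # when the publisher sent the message
    recv_time_ns: int         # when the subscription callback received it
    cause_id: Optional[int]   # input message whose callback published this one


def end_to_end_latencies(
    messages: List[Message], start_topic: str, end_topic: str
) -> List[Tuple[int, int]]:
    """Return sorted (start publish time, latency in ns) pairs for every
    end_topic message whose causal chain leads back to a start_topic message."""
    by_id: Dict[int, Message] = {m.msg_id: m for m in messages}
    results: List[Tuple[int, int]] = []
    for msg in messages:
        if msg.topic != end_topic:
            continue
        # Walk the causal links backwards; the visited set guards against cycles.
        current: Optional[Message] = msg
        visited: Set[int] = set()
        while (current is not None and current.topic != start_topic
               and current.msg_id not in visited):
            visited.add(current.msg_id)
            current = by_id.get(current.cause_id) if current.cause_id is not None else None
        if current is not None and current.topic == start_topic:
            results.append((current.pub_time_ns, msg.recv_time_ns - current.pub_time_ns))
    results.sort()
    return results
```

Plotting the second element of each pair against the first gives end-to-end latency over time. The topic names you would pass (e.g., a scan topic and a velocity command topic) are up to the user, which is exactly the “user defines their request” part.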

Note that services aren’t supported by the work presented in the paper above, but supporting them is more or less an extension of the current pub->sub logic, i.e., client->server->client.
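As a rough illustration of treating a service call as client->server->client hops, the sketch below assumes hypothetical, pre-processed service events that share a client identifier and a request sequence number between request and reply. The event names and fields are assumptions for illustration, not the actual ROS 2 tracepoints. It splits each complete call into request transport, server processing, and reply transport.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical event kinds, in the order they occur for one service call.
ORDER = ["client_send", "server_recv", "server_send", "client_recv"]


@dataclass(frozen=True)
class ServiceEvent:
    """Hypothetical, simplified service trace event."""
    kind: str        # one of ORDER
    client_id: str   # identifies the client that made the call
    seq: int         # sequence number shared by the request and its reply
    time_ns: int


def service_hops(events: List[ServiceEvent]) -> Dict[Tuple[str, int], Dict[str, int]]:
    """Group events per (client, seq) and compute per-hop durations in ns."""
    times: Dict[Tuple[str, int], Dict[str, int]] = {}
    for ev in events:
        times.setdefault((ev.client_id, ev.seq), {})[ev.kind] = ev.time_ns
    hops: Dict[Tuple[str, int], Dict[str, int]] = {}
    for key, t in times.items():
        if not all(kind in t for kind in ORDER):
            continue  # incomplete call, e.g., the trace ended before the reply
        hops[key] = {
            "request": t["server_recv"] - t["client_send"],     # client -> server
            "processing": t["server_send"] - t["server_recv"],  # inside the server
            "reply": t["client_recv"] - t["server_send"],       # server -> client
            "total": t["client_recv"] - t["client_send"],
        }
    return hops
```

Each hop is then just another edge in the message flow graph, which is why services fit naturally as an extension of the pub->sub case.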

That depends on your use case. What would this mean for a “move to location” request? For example, you may need to build the whole pub->sub or request<->reply graph and consider the request complete only when the last relevant subscription receives the message, or when the last relevant service reply is received.
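One possible completion rule, sketched under assumptions: build the graph of everything the request message (transitively) caused, and call the request complete when every user-chosen “final” subscription has received a related message; the completion time is the latest of those receptions. The `Delivery` record, its `cause_id` link, and the subscription names are hypothetical, and other rules (e.g., waiting for the last service reply) would need different logic.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Delivery:
    """Hypothetical record: one subscription receiving one message."""
    msg_id: int
    subscription: str
    recv_time_ns: int
    cause_id: Optional[int]  # message whose callback published msg_id (None for the request)


def request_completion_time(
    deliveries: List[Delivery], request_id: int, final_subscriptions: Iterable[str]
) -> Optional[int]:
    """Latest reception time over the final subscriptions, or None if any of
    them never received a message caused by the request."""
    finals: Set[str] = set(final_subscriptions)
    # Map each message to the messages published in response to it.
    caused: Dict[int, Set[int]] = defaultdict(set)
    for d in deliveries:
        if d.cause_id is not None:
            caused[d.cause_id].add(d.msg_id)
    # Breadth-first search over the causal graph starting from the request.
    related: Set[int] = {request_id}
    queue: Deque[int] = deque([request_id])
    while queue:
        for child in caused[queue.popleft()]:
            if child not in related:
                related.add(child)
                queue.append(child)
    # Latest related reception per final subscription.
    latest: Dict[str, int] = {}
    for d in deliveries:
        if d.msg_id in related and d.subscription in finals:
            latest[d.subscription] = max(latest.get(d.subscription, d.recv_time_ns), d.recv_time_ns)
    if set(latest) != finals:
        return None  # not complete yet: some final subscription is still waiting
    return max(latest.values())
```

Returning `None` for incomplete requests makes it easy to count requests that never finished, which is often as interesting as the latency of the ones that did.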