Predictability of zero-copy message transport

Hey! I’m looking to improve the performance of my ROS 2 code by using zero-copy transfer for large messages.

I’ve been under the impression that simply composing any composable node into a container and setting “use_intra_process_comms” to True would lead to zero-copy transfer. But after experimenting and going through multiple tutorials, design docs, and discussions, that doesn’t seem to be the case.

I wanted to create this thread to write down some of my questions, in the hopes of them being helpful for improving the documentation, and to get a better understanding of the zero-copy edge cases. I’m also curious to hear if there are already ways to easily verify that the zero-copy transfer is happening.


To my understanding, a number of different things can influence whether zero-copy happens or not:

  • Pointer type: The choice of SharedPtr, UniquePtr, etc. seems to affect whether zero-copy really happens or not [1], [2]
  • Number of subscribers: If we have many subscriptions to the same topic, some of the subscriptions might actually be creating a copy of the message [1]
  • QoS: Some of the quality of service settings have been unsupported, at least in the past [3]
  • RMW implementation: At least in the past, the middleware choice has played a role. How is it nowadays? How about with Zenoh? [3]
  • ROS Distribution version: Are there differences between existing distros (Humble to Rolling)?
  • Component container type: Based on my past experimentation, there seems to be a difference between the container types: component_container vs. _mt vs. _isolated.
  • A new inter-process subscriber outside of the composable container: What happens if we have a new inter-process subscription, outside of the composable container?
  • Publisher outside of the composable container: How does zero-copy behave when, for example, the publisher node is outside of the composable container? Can multiple subscribers still benefit from zero-copy? From my past experimenting, it seems that they can.
  • Is there something else that can have an influence?

I’m looking to understand in which cases zero-copy transfer really happens, and in which cases ROS just quietly falls back to copying the messages.

Many of these questions also boil down to a bigger question: How can I verify that zero-copy happens, and what kind of performance benefits am I getting from using it? All the demos I’ve seen so far simply print the memory address of the message to confirm that zero-copy happens. I think it would be highly beneficial to have a better way, directly in ROS 2, to see whether zero-copy pub-sub is actually happening. Is there already a way to do that, or do you see how this could be implemented? Maybe through the ros2 topic CLI?

In addition to the above questions, the tutorials and other resources still left me wondering about these ones:

  • What are all the different ways of achieving zero-copy transfer? Via loaned messages? What are their benefits compared to intra-process communication (IPC)? In Jazzy, the loaned messages tutorial mentions that “Currently using Loaned Messages is not safe on subscription” [4]
  • What are the performance gains of zero-copy? In which situations is serialization completely avoided, and in which situations is the middleware layer skipped completely?
  • What is the role of the “use_intra_process_comms” parameter? I’ve sometimes observed zero-copy happening even when this parameter is set to false. What are the benefits of having it as “false” (which it is by default when nodes are composed in a launch file)?

[1] ROS Jazzy Tutorial - Intra-Process-Communication
[2] Discourse Thread - Performance Characteristics: subscription callback signatures, RMW implementation, Intra-process communication (IPC)
[3] ROS 2 Design Article - Intraprocess communications
[4] ROS Jazzy Tutorial - Configure Zero Copy Loaned Messages


Disclaimer: I’m currently working inside the IntraProcessManager code for a (hopefully) upcoming PR. This doesn’t mean that what I write below is the absolute truth: if in doubt, point out any potential error and I can quickly double-check the sources to understand the inner workings.

Nevertheless, I’m a frequent user of this feature and rely on it a lot, so I hope I can help a bit.

Pointer type: The choice of SharedPtr, UniquePtr, etc. seems to affect whether zero-copy really happens or not

Yes. IIRC, it is better to pass ownership of a UniquePtr (via std::move) to the publish() call.
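To make the ownership hand-off a bit more concrete, here is a minimal sketch in plain standard C++. It is not rclcpp code: `ToyImage`, `ToySubscriber` and `toy_publish` are hypothetical stand-ins I made up for illustration. It only shows why moving a `std::unique_ptr` lets the receiver end up with the very same heap object (same address, same payload buffer), which is the property the address-printing demos check.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a large message such as an image.
struct ToyImage {
  std::vector<std::uint8_t> data;
};

// Hypothetical stand-in for a subscription that keeps the last message it received.
struct ToySubscriber {
  std::unique_ptr<ToyImage> last;
  void on_message(std::unique_ptr<ToyImage> msg) { last = std::move(msg); }
};

// Toy "publish": moving the unique_ptr hands the same heap object to the subscriber.
inline void toy_publish(ToySubscriber & sub, std::unique_ptr<ToyImage> msg) {
  sub.on_message(std::move(msg));
}

// Returns true when the subscriber holds exactly the object the publisher allocated.
inline bool same_address_after_move() {
  auto msg = std::make_unique<ToyImage>();
  msg->data.assign(1024 * 1024, 42);
  const ToyImage * published = msg.get();            // address before publishing
  const std::uint8_t * payload = msg->data.data();   // payload buffer before publishing

  ToySubscriber sub;
  toy_publish(sub, std::move(msg));

  // The publisher's pointer is now empty; the subscriber owns the original object.
  return msg == nullptr &&
         sub.last.get() == published &&
         sub.last->data.data() == payload;
}
```

Calling `same_address_after_move()` returns `true` in this toy model. Whether the real pipeline behaves the same way depends on the subscriber side too, as discussed in the rest of this reply.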

If all subscriptions are intra-process and all their callbacks take a `ConstSharedPtr` parameter, you get the best of intra-process. If you have even one subscriber in an external process, or one subscriber with intra-process disabled, you are back to additional copies and the rmw API layers, potentially down to CDR serialization/deserialization and wire transfer. Some RMW configurations that support shared memory may bypass or optimize the canonical way of sending messages from the publisher to the subscriber.
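As a rough illustration of why the callback signatures matter, here is a toy model in plain standard C++. It is not the IntraProcessManager implementation, and `ToyMsg`, `Delivery` and `toy_deliver` are hypothetical names. The assumption it encodes: read-only receivers can all share one instance, while every receiver that wants its own mutable instance beyond the first forces a deep copy.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical message type.
struct ToyMsg {
  std::vector<int> payload;
};

// What each group of receivers ends up holding, plus the number of deep copies made.
struct Delivery {
  std::vector<std::shared_ptr<const ToyMsg>> shared;  // read-only receivers
  std::vector<std::unique_ptr<ToyMsg>> owned;         // receivers that take ownership
  std::size_t copies = 0;
};

// Toy delivery policy (an assumption for illustration, not rclcpp's actual code).
inline Delivery toy_deliver(std::unique_ptr<ToyMsg> msg,
                            std::size_t n_shared, std::size_t n_owning) {
  Delivery out;
  if (n_owning == 0) {
    // Only read-only receivers: promote to a shared pointer, no copy at all.
    std::shared_ptr<const ToyMsg> s = std::move(msg);
    out.shared.assign(n_shared, s);
    return out;
  }
  if (n_shared > 0) {
    // Mixed case: one copy is shared among all read-only receivers.
    std::shared_ptr<const ToyMsg> s = std::make_shared<ToyMsg>(*msg);
    ++out.copies;
    out.shared.assign(n_shared, s);
  }
  // Every owning receiver except the last one gets a fresh copy.
  for (std::size_t i = 0; i + 1 < n_owning; ++i) {
    out.owned.push_back(std::make_unique<ToyMsg>(*msg));
    ++out.copies;
  }
  // The last owning receiver gets the original object.
  out.owned.push_back(std::move(msg));
  return out;
}
```

In this model, three read-only receivers cost zero copies and all see the published address, while two read-only plus two owning receivers cost two copies. Treat the exact numbers as a property of the sketch, not as a guarantee about any ROS distro.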

Nominally, no. IntraProcessManager doesn’t care. As stated above, for anything that isn’t handled by IntraProcessManager there may be other ways to speed up data transfer (some RMWs implement shared memory “tricks”, though IIRC not with the basic config).

You start having additional copies in most circumstances, if not all.

When the publishing happens outside, it is the same. The “hidden” IntraProcessPublisher does not get created (at least not in the same process space as the subscribers), so messages published by such an external publisher go through the standard RMW handling.

Things start getting very interesting when you factor in “Type Adaptation”. You don’t even have to fill a message with your data (e.g. the content of a cv::Mat or a pcl::PointCloud); you just pass ownership of the real object. If you do not involve external pubs/subs, you may get very interesting results and, at the same time, more idiomatic code.
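A very rough sketch of that idea, again in plain standard C++ rather than the real rclcpp type adaptation API: `ToyCloud`, `ToyCloudMsg`, `to_message` and `ToyAdaptedPublisher` are all hypothetical. The assumption is simply that the native object is handed over as-is inside the process, and the conversion into a message only runs when something outside the process needs it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical native type the application already works with.
struct ToyCloud {
  std::vector<float> points;
};

// Hypothetical message type that would otherwise have to be filled in by hand.
struct ToyCloudMsg {
  std::vector<std::uint8_t> bytes;
};

// Conversion that is only needed when the data has to leave the process.
inline ToyCloudMsg to_message(const ToyCloud & cloud) {
  ToyCloudMsg msg;
  msg.bytes.resize(cloud.points.size() * sizeof(float));
  if (!msg.bytes.empty()) {
    std::memcpy(msg.bytes.data(), cloud.points.data(), msg.bytes.size());
  }
  return msg;
}

// Toy publisher that accepts the native type directly.
struct ToyAdaptedPublisher {
  bool has_external_subscriber = false;   // assumption: known to the publisher
  std::size_t conversions = 0;            // how often the conversion actually ran
  std::unique_ptr<ToyCloud> local_inbox;  // stands in for an in-process subscriber
  ToyCloudMsg wire;                       // stands in for what goes to the middleware

  void publish(std::unique_ptr<ToyCloud> cloud) {
    if (has_external_subscriber) {
      wire = to_message(*cloud);          // pay for conversion only when required
      ++conversions;
    }
    local_inbox = std::move(cloud);       // in-process receiver gets the real object
  }
};
```

With `has_external_subscriber` left at `false`, `publish()` never touches `to_message` and the in-process receiver holds the original `ToyCloud`. Flipping it to `true` brings the conversion (and its copy) back, which matches the point above about external pubs/subs.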

Strange to me - can you share an example?