Hi @JM_ROS @wjwwood, thanks for your reply. It’s been a pretty busy week for me, but I finally had some time to think through and test the suggestions you mentioned.
Sure, that makes sense. I tried to update the post to replace “IPC” with “intra-process comms”, but I can't find an edit button. If editing is still possible, please let me know and I’ll be happy to update it.
Inspired by the demo in ros2_documentation, I am printing the address of the received message in each subscription callback.
Take a callback with a `const std_msgs::msg::Int32 & msg` parameter as an example. It enters the condition here: rclcpp/rclcpp/include/rclcpp/any_subscription_callback.hpp (ros2/rclcpp at commit c85ff926d2ea6c3fb8b8075e28b93632a791fd5c).
Adding

```cpp
std::holds_alternative<ConstRefROSMessageCallback>(callback_variant_) ||
std::holds_alternative<ConstRefWithInfoROSMessageCallback>(callback_variant_)
```

to `use_take_shared_method()` seems to make const-reference callbacks match the behavior of `shared_ptr<const T>` callbacks.
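To make the address comparison concrete, here is a self-contained model of that dispatch decision. It does not use rclcpp: `Int32`, the three callback aliases, `dispatch()`, and the `const_ref_takes_shared` flag are hypothetical simplifications, and the real `AnySubscriptionCallback` variant has many more alternatives. The idea it illustrates is that callbacks routed through the shared path see the publisher's object (same address), while the others receive a deep copy (different address).

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>
#include <variant>

// Hypothetical stand-in for std_msgs::msg::Int32; not the real ROS 2 type.
struct Int32
{
  std::int32_t data = 0;
};

// Hypothetical, simplified subset of the callback signatures rclcpp accepts.
using ConstRefCallback = std::function<void (const Int32 &)>;
using SharedConstPtrCallback = std::function<void (std::shared_ptr<const Int32>)>;
using UniquePtrCallback = std::function<void (std::unique_ptr<Int32>)>;
using CallbackVariant =
  std::variant<ConstRefCallback, SharedConstPtrCallback, UniquePtrCallback>;

// Models the decision made by use_take_shared_method(). When
// `const_ref_takes_shared` is true, const-ref callbacks are also routed to the
// shared path, mirroring the change described above.
bool use_take_shared_method(const CallbackVariant & cb, bool const_ref_takes_shared)
{
  return std::holds_alternative<SharedConstPtrCallback>(cb) ||
         (const_ref_takes_shared && std::holds_alternative<ConstRefCallback>(cb));
}

// Invokes one callback for a published message. Shared-path callbacks see the
// publisher's object; every other callback receives its own deep copy.
void dispatch(
  const CallbackVariant & cb,
  const std::shared_ptr<const Int32> & published,
  bool const_ref_takes_shared)
{
  if (use_take_shared_method(cb, const_ref_takes_shared)) {
    if (const auto * shared_cb = std::get_if<SharedConstPtrCallback>(&cb)) {
      (*shared_cb)(published);
    } else {
      std::get<ConstRefCallback>(cb)(*published);  // no copy made
    }
    return;
  }
  // Copy path: only ConstRefCallback (flag off) or UniquePtrCallback reach here.
  auto copy = std::make_unique<Int32>(*published);
  if (const auto * ref_cb = std::get_if<ConstRefCallback>(&cb)) {
    (*ref_cb)(*copy);
  } else {
    std::get<UniquePtrCallback>(cb)(std::move(copy));
  }
}
```

With `const_ref_takes_shared = false`, a const-ref callback observes a different address than `published.get()`; with it set to `true`, it observes the same address, like a `shared_ptr<const T>` callback.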
To my knowledge, the largest messages in the Nav2 stack are the global and local costmaps. With the default settings, the global costmap is published at 1 Hz and the local costmap at 2 Hz, and each is on the order of a few hundred kilobytes. We are also subscribing to sensor data from Gazebo; I checked those topics as well, and they are published at around 10 Hz with message sizes on the order of tens of kilobytes.
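As a rough sanity check, the data volume is message size times publish rate, summed over topics. The sizes below (300 KB per costmap, 50 KB per sensor message, a single sensor topic) and the topic names are placeholder assumptions picked from the ranges above, not measurements; `TopicLoad` and `total_bytes_per_second()` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One published topic: approximate message size and publish rate.
struct TopicLoad
{
  std::string name;
  std::size_t bytes_per_message;
  double rate_hz;
};

// Total bytes per second across all topics (size x rate, summed).
double total_bytes_per_second(const std::vector<TopicLoad> & topics)
{
  double total = 0.0;
  for (const auto & t : topics) {
    total += static_cast<double>(t.bytes_per_message) * t.rate_hz;
  }
  return total;
}
```

Under these placeholder values the total comes to 1.4 MB/s (300 + 600 + 500 KB/s).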
I tried writing a quick demo that publishes larger messages, and there I can clearly see a significant reduction in CPU usage with intra-process comms. When I switch the QoS durability to transient local, however, I do not observe any difference between inter-process and intra-process communication. I think this also explains the results in setup B, since the costmap topics are published with transient local durability.
Yes. I usually wait about 1–2 minutes for the system to reach a steady state and then compute the average CPU utilization. When preparing the report, I also repeated each experiment multiple times on different days. The error bars are approximately ±0.3, and all experiments were run on an OS with a standard scheduler.
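A minimal sketch of this kind of steady-state averaging. The 1 Hz sampling, the warm-up length, and the use of the sample standard deviation as the error estimate are assumptions for illustration, not necessarily how the numbers above were computed; `steady_state_cpu()` and `Summary` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

struct Summary
{
  double mean;
  double stddev;  // sample standard deviation (n - 1 denominator)
};

// Drops the first `warmup_samples` CPU readings, then returns the mean and
// sample standard deviation of the remaining (steady-state) readings.
Summary steady_state_cpu(const std::vector<double> & samples, std::size_t warmup_samples)
{
  if (samples.size() < warmup_samples + 2) {
    throw std::invalid_argument("need at least two samples after warm-up");
  }
  const std::size_t n = samples.size() - warmup_samples;
  double sum = 0.0;
  for (std::size_t i = warmup_samples; i < samples.size(); ++i) {
    sum += samples[i];
  }
  const double mean = sum / static_cast<double>(n);
  double squared_dev = 0.0;
  for (std::size_t i = warmup_samples; i < samples.size(); ++i) {
    squared_dev += (samples[i] - mean) * (samples[i] - mean);
  }
  return {mean, std::sqrt(squared_dev / static_cast<double>(n - 1))};
}
```

For example, with 1 Hz samples a 90-second warm-up corresponds to `warmup_samples = 90`.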
Finally, to help explain the behavior I observed in setup A and to move this discussion forward, I also collected some flamegraphs. I have uploaded the SVG files here for convenience: https://drive.google.com/drive/folders/1A75ddUFnMZlh8DB0X-_BsRWAM-ovSJ-i?usp=sharing
zenoh (Intra-process comms off)
zenoh (Intra-process comms on)
Differential flamegraphs (./diff off on)
From the flamegraphs and `perf top`, the difference seems to be related to `rclcpp::experimental::IntraProcessManager::matches_any_publishers()`, although I am not completely sure.
cyclone (Intra-process comms off)
cyclone (Intra-process comms on)
Differential flamegraphs (./diff off on)
Similar to the zenoh results, this also seems to point to `rclcpp::experimental::IntraProcessManager::matches_any_publishers()`.
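For context on why this function could show up: as far as I can tell from the rclcpp source (worth double-checking), when intra-process comms is enabled, each message a subscription receives through the RMW layer is checked against the process's intra-process publishers so that copies already delivered intra-process can be dropped, and that check walks the registered publishers one by one under a lock. The model below is a simplified, hypothetical stand-in (`Gid`, `PublisherEntry`, and `PublisherRegistry` are not rclcpp types) that counts comparisons to show the per-message cost growing with the number of publishers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical stand-in for rmw_gid_t: an opaque publisher identifier.
using Gid = std::array<std::uint8_t, 16>;

struct PublisherEntry
{
  Gid gid;
};

// Simplified model of a registry answering "was this message sent by one of
// this process's intra-process publishers?" via a linear scan.
class PublisherRegistry
{
public:
  void add(std::uint64_t id, const std::shared_ptr<PublisherEntry> & pub)
  {
    std::unique_lock<std::shared_mutex> lock(mutex_);
    publishers_[id] = pub;
  }

  // Returns true if `gid` belongs to a live registered publisher.
  // `comparisons` reports how many GIDs were compared, i.e. the O(N) work
  // paid for every message that arrives through the middleware.
  bool matches_any(const Gid & gid, std::size_t & comparisons) const
  {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    comparisons = 0;
    for (const auto & entry : publishers_) {
      auto pub = entry.second.lock();  // atomic ref-count update per entry
      if (!pub) {
        continue;
      }
      ++comparisons;
      if (pub->gid == gid) {
        return true;
      }
    }
    return false;
  }

private:
  mutable std::shared_mutex mutex_;
  std::map<std::uint64_t, std::weak_ptr<PublisherEntry>> publishers_;
};
```

If many nodes share one process, this per-message scan could add up, which would be consistent with the differential flamegraphs, but that is a hypothesis rather than a measured result.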
I hope this helps move the discussion forward. Looking forward to your thoughts!









