Hey @hidmic! Thanks for bringing it up.
So, what’s the proposal? If I understand you correctly, your theory is that rosidl is responsible for a significant part of the gap between ROS 2 and other frameworks, which seems plausible to me, but what part should we change?
Are the C/C++ data structures generated from the type definitions the problem that needs to be addressed? (e.g. rosidl/rosidl_generator_cpp/resource/msg__struct.hpp.em at 30877f52f5f19902bedb89f67bf4bafb2c6eae12 · ros2/rosidl · GitHub for C++?)
Or is it the need to do zero-copy?
Of course these things are related, but I’m trying to understand what we could concretely do to improve this. I think zero-copy between processes using the same language is possible (https://docs.ros.org/en/humble/How-To-Guides/Configure-ZeroCopy-loaned-messages.html) using loaned messages, even with the standard rosidl C++ structs, as long as you’re using plain old data (no strings or sequences). To do this between different languages you would need something like flatbuffers. And if you want to use something like flatbuffers or capnproto or the like, then I think you could do that by adding additional rosidl generators. We have an “official” one for Python that provides simple Python objects, similar to the ones in ROS 1, based on the message definition. You could have additional ones for C++ and Python that present different data structures (think #include "sensor_msgs/msg_flatbuffer/image.hpp" rather than #include "sensor_msgs/msg/image.hpp"). That would allow users to use these other data structures for any user-defined message type, but if you want it to be very efficient then the middleware needs to understand these new types. Otherwise you’re limited to copying from your preferred user-facing type to the type the middleware understands, or serializing it to the wire format that the middleware understands, neither of which is particularly efficient or lends itself to zero-copy.
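To make the “plain old data” condition concrete, here is a minimal sketch. The two structs are hypothetical stand-ins, not the code rosidl actually generates, and the byte buffer only simulates a shared-memory segment. The point is that a fixed-size struct can be moved as raw bytes, while one holding a string or sequence keeps part of its data on the heap and cannot.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical fixed-size message: every byte lives inside the struct itself.
struct FixedImage {
  std::uint32_t height;
  std::uint32_t width;
  std::array<std::uint8_t, 16> data;
};

// Hypothetical dynamic message: the string and the vector point at heap
// memory that another process could not follow.
struct DynamicImage {
  std::uint32_t height;
  std::uint32_t width;
  std::string encoding;
  std::vector<std::uint8_t> data;
};

static_assert(std::is_trivially_copyable<FixedImage>::value,
              "fixed-size message can be shared as raw bytes");
static_assert(!std::is_trivially_copyable<DynamicImage>::value,
              "message with string/sequence members cannot");

// Simulates passing a message through a shared segment: the raw bytes are
// the message, so no serialization step is needed.
template <typename T>
T round_trip_through_bytes(const T & msg) {
  static_assert(std::is_trivially_copyable<T>::value,
                "only plain-old-data messages can take this path");
  alignas(T) unsigned char buffer[sizeof(T)];
  std::memcpy(buffer, &msg, sizeof(T));
  T out;
  std::memcpy(&out, buffer, sizeof(T));
  return out;
}
```

Calling round_trip_through_bytes() with a DynamicImage fails to compile, which mirrors the restriction on loaned messages described above.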
But even in those cases you’d still have the rosidl pipeline (type definitions → machine readable type definition → language or serialization library specific code).
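As a toy illustration of those three stages (and only that: the parser, the type table, and the function names below are assumptions made up for this sketch, not how rosidl is implemented), a generator boils down to something like:

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Machine-readable form of one field in a type definition.
struct Field {
  std::string type;
  std::string name;
};

// Stage 1 -> 2: turn "type name" lines into a list of fields.
// Lines that do not contain two tokens are skipped.
std::vector<Field> parse_definition(const std::string & text) {
  std::vector<Field> fields;
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream tokens(line);
    Field f;
    if (tokens >> f.type >> f.name) {
      fields.push_back(f);
    }
  }
  return fields;
}

// Stage 2 -> 3: emit code for one target. A different generator (say, one
// targeting a serialization library) would reuse the same field list and
// only swap this function.
std::string generate_cpp_struct(const std::string & name,
                                const std::vector<Field> & fields) {
  // Hypothetical, deliberately tiny type mapping.
  static const std::map<std::string, std::string> cpp_types = {
    {"uint32", "std::uint32_t"},
    {"float64", "double"},
    {"string", "std::string"}};
  std::ostringstream out;
  out << "struct " << name << " {\n";
  for (const auto & f : fields) {
    auto it = cpp_types.find(f.type);
    // Unknown types are passed through unchanged.
    out << "  " << (it != cpp_types.end() ? it->second : f.type)
        << " " << f.name << ";\n";
  }
  out << "};\n";
  return out.str();
}
```

The split matters for the question here: the first two stages stay the same whatever user-facing data structure you pick, and only the last one is specific to a language or serialization library.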
It looks like you’ve avoided the need for new rosidl_generator_flatbuffer-like packages in your flatros2 PoC by using some reflection (a la flatros2/flatros2/include/flatros2/message.hpp at 8a8ad51ffbe363c8e4d8909b548de075d4b26ceb · Ekumen-OS/flatros2 · GitHub), and you’re using rosidl_typesupport_introspection_cpp to handle support in the middleware, which is nice because it’s mostly above the rosidl/rmw level. However, to gain more optimizations, or for a marshaling library like protobuf or arrow (or even to use flatbuffers better), you’d probably want some build-time step, which is where a rosidl_generator_cpp_X/rosidl_runtime_cpp_X/rosidl_typesupport_cpp_X-like set of packages would come in. With that in mind, I guess I don’t know which parts of rosidl need to change, because it seems like, at least in theory, it should be possible to solve these performance problems.
So is the proposal just to build some of the packages I described above, or is it to change the “rosidl pipeline” somehow? Or is the conversation more about changing the defaults in some of these cases, in addition to building the alternatives in the first place?
Maybe the answer is just making what we have better? For example, I believe (someone correct me if I’m wrong) the dora-rs benchmarks are comparing against rclpy? If that’s the case, then we could possibly make rclpy’s story better by having a rosidl_typesupport_XYZ_py set of packages, so the middlewares could handle PyObject * directly from our users’ Python code. Right now we have to convert from the PyObject * to our C-style struct for the message before handing it to the middleware via rcl_publish()/rcl_take*(), which is very inefficient, especially for large data structures like images and pointclouds, even though we’ve tried to improve the performance there using optimizations and things like numpy.
I hope so too, and there’s a REP for how to do this; it just needs resources. And I personally believe the strategy in what was proposed as REP-2011 is a “yes, and” for the idea of alternative serialization libraries, as I believe it complements features that people already use to evolve types (like optional fields), at least in most libraries I’ve studied.
Curious to see what you and others think.