@Patrick, the short answer is that progress has been marginal.
I just reran the numbers, albeit on faster hardware (Intel i7-14700K, 64 GB RAM). All measurements are intra-node, so no network is involved. I used an array of 50000 elements and increased the publisher frequency until ROS1 started dropping messages because rostopic bw ran out of CPU.
All frequencies are in Hz and given as requested / without subscriber / with subscriber.
Publisher CPU% is measured with the subscriber running.
ROS1:
publisher: 5000/5000/5000 (CPU 67%)
subscriber: 5000 (CPU 89%)
(Note: after a short while the subscriber can no longer keep up, its CPU goes to 100%, and the receive frequency drops to 3600. I'm not sure what's happening there; probably something at the OS level.)
ROS2 Zenoh:
publisher: 5000/1410/1200 (CPU 106%)
subscriber: 1200 (CPU 83%)
ROS2 CycloneDDS:
publisher: 5000/61/60 (CPU 100%)
subscriber: 60 (CPU 7%)
ROS2 FastRTPS:
publisher: 5000/1450/1340 (CPU 103%)
subscriber: 400-700, fluctuating strongly (CPU 45%)
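To put the numbers above side by side, here is a small sketch that expresses each middleware's publisher rate (with subscriber) as a fraction of the requested 5000 Hz, plus the raw payload throughput that rate implies. The element size is not stated in the post, so BYTES_PER_ELEMENT = 8 is a hypothetical placeholder; the throughput column scales linearly with whatever the real per-element size is, while the fractions do not depend on it.

```python
# Summarise the measurements: delivered publisher rate (with subscriber)
# as a fraction of the requested rate, plus implied raw payload throughput.
# ASSUMPTION: bytes per element is not given in the post; 8 is a placeholder.
REQUESTED_HZ = 5000
ELEMENTS = 50_000
BYTES_PER_ELEMENT = 8  # hypothetical, replace with the real element size

# Publisher rate with the subscriber running, copied from the results.
RATE_WITH_SUBSCRIBER_HZ = {
    "ROS1": 5000,
    "ROS2 Zenoh": 1200,
    "ROS2 CycloneDDS": 60,
    "ROS2 FastRTPS": 1340,
}


def fraction_of_requested(rate_hz: float, requested_hz: float = REQUESTED_HZ) -> float:
    """Achieved rate divided by requested rate (1.0 means it kept up)."""
    return rate_hz / requested_hz


def payload_gb_per_s(rate_hz: float) -> float:
    """Raw array payload in GB/s; ignores headers and serialization overhead."""
    return rate_hz * ELEMENTS * BYTES_PER_ELEMENT / 1e9


for name, rate in RATE_WITH_SUBSCRIBER_HZ.items():
    print(f"{name:<16} {fraction_of_requested(rate):6.1%}  {payload_gb_per_s(rate):.3f} GB/s")
```

The fractions make the gap explicit: ROS1 keeps up fully, Zenoh and FastRTPS deliver roughly a quarter of the requested rate, and CycloneDDS about 1%.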
The bottom line is that marshaling of non-primitive data types is still slow in ROS2. Zenoh outperforms FastRTPS on the subscriber side, but fundamentally both appear to share the same bottleneck at the publisher: without a subscriber they top out at about 1400-1450 Hz, versus 5000 Hz for ROS1.
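As a rough illustration of why marshaling cost matters, the toy sketch below contrasts encoding an array of small structs one field at a time (what a generic, type-walking serializer has to do for a sequence of non-primitive elements) with copying an already contiguous buffer in one call. This is not ROS2's serializer or typesupport code; the (x, y, z) float64 element type is hypothetical, and Python's interpreter overhead exaggerates the per-element cost compared to C/C++. The point is only that both paths produce identical bytes at very different cost.

```python
import struct
import sys
import time
from array import array

N = 50_000  # array size used in the measurements


def encode_per_element(points):
    """Walk the sequence and encode each field separately (little-endian float64)."""
    out = bytearray()
    for x, y, z in points:
        out += struct.pack("<d", x)
        out += struct.pack("<d", y)
        out += struct.pack("<d", z)
    return bytes(out)


def encode_bulk(flat):
    """Copy an already contiguous float64 buffer in one go (little-endian)."""
    if sys.byteorder == "big":
        flat = array("d", flat)  # copy first so the caller's buffer is untouched
        flat.byteswap()
    return flat.tobytes()


if __name__ == "__main__":
    # Hypothetical payload: N points with three float64 fields each.
    points = [(float(i), float(i) + 0.5, float(i) + 1.0) for i in range(N)]
    flat = array("d", (v for p in points for v in p))

    t0 = time.perf_counter()
    a = encode_per_element(points)
    t1 = time.perf_counter()
    b = encode_bulk(flat)
    t2 = time.perf_counter()

    assert a == b  # same bytes on the wire, very different cost
    print(f"per-element: {(t1 - t0) * 1e3:.2f} ms, bulk: {(t2 - t1) * 1e3:.2f} ms")
```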
Note that ros2 topic hz reports only 10 Hz while running at 100% CPU. Apparently fixing this is more involved; see this issue.
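One plausible way a rate tool can under-report like this: a sliding-window estimator (conceptually similar to what rostopic hz and ros2 topic hz compute, though not their actual code) can only count messages its own callback finishes processing. The sketch below simulates a tool that spends a fixed time per message and drops whatever arrives in the meantime. The 0.1 s per-message cost is a hypothetical value chosen only to show how a CPU-bound tool would report about 10 Hz regardless of the true publisher rate.

```python
from collections import deque


class RateEstimator:
    """Sliding-window rate estimate from message arrival times (seconds)."""

    def __init__(self, window: int = 100):
        # `window` intervals need window + 1 timestamps.
        self._stamps = deque(maxlen=window + 1)

    def on_message(self, t: float) -> None:
        self._stamps.append(t)

    def rate_hz(self) -> float:
        if len(self._stamps) < 2:
            return 0.0
        span = self._stamps[-1] - self._stamps[0]
        # Mean interval is span / (n - 1), so the rate is its inverse.
        return (len(self._stamps) - 1) / span if span > 0 else 0.0


def simulate(publish_hz: float, callback_cost_s: float, duration_s: float = 5.0) -> float:
    """Tool handles one message, spends callback_cost_s on it, drops the rest."""
    est = RateEstimator()
    t = 0.0
    while t < duration_s:
        est.on_message(t)
        # The next message seen is whichever arrives after the callback returns.
        t += max(callback_cost_s, 1.0 / publish_hz)
    return est.rate_hz()


if __name__ == "__main__":
    print(f"cheap callback:     {simulate(5000, 0.0):.1f} Hz")
    print(f"expensive callback: {simulate(5000, 0.1):.1f} Hz")  # hypothetical 0.1 s cost
```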