FYI: On message standardization (and a call for participation). @Samahu will be discussing this proposal next Monday, and having you guys (@peci1 et al.) there would be great.
Here is my initial draft for the proposed message:
- MAIN DEPENDENCY: https://github.com/ouster-lidar/lidar-msgs
- DOWNSTREAM: https://github.com/ouster-lidar/lidarscan-msg-yolo-demo
The following PR provides the ouster-ros implementation of `lidar_msgs::LidarInfo` and `lidar_msgs::LidarScan`:
- ouster-ros implementation: First draft of implementing the proposal for `lidar_msgs::LidarInfo` and `lidar_msgs::LidarScan` messages by Samahu · Pull Request #536 · ouster-lidar/ouster-ros · GitHub
This demo is a complete working example, but there are still plenty of TODOs, which I'll try to summarize:
- Use default SI unit (meters) for the range data.
- I am considering having LidarInfo provide per-pixel angles (ScanPattern) instead of the current implementation, which provides angles per row. This will significantly increase the LidarInfo message size but is more generic and can support other lidar sensor types.
- I have yet to implement the association between LidarInfo and LidarScan messages; the current implementation doesn't handle changes to LidarInfo.
- There is a minor problem with the current `LidarScanToPointCloud` method implementation, in which it expects the LidarScan message in staggered form. However, this contradicts the expectation of consumers of the LidarScan 2D channels. So I need to rework the code a bit so that the driver publishes a destaggered (corrected) LidarScan message that can be readily passed to 2D operators while maintaining the ability to convert the same message to a PointCloud with no artifacts.
- I still have work to do to showcase the dual utilization of the `LidarScan` message for detection and for representing the results as 3D `PointCloud`(s). The example simply plots the 2D channels and the 3D PointCloud with no links between them.
- Finally, I want to provide a workspace which includes all the required packages for those interested in contributing or reviewing the proposal and trying it out.
Future Work:
- Check if we can leverage some of the image_transport utilities to compress LidarScan message during transport.
- Address the copying when exposing the `LidarScanToPointCloud` method to Python.
Definitely agree we should probably move to something like a per pixel beam angle to support more sensor types.
I'll note that for applications such as raytracing, and for several optimized methods of going from a range image to point clouds, each pixel ends up being represented as a "ray" with an origin and direction vector. Does it make sense to maybe represent the beam angles (and range offset) in such a form rather than as angles?
This would often save work on the application side, get rid of some assumptions in the LidarScanToPointCloud method about how to apply the range offset and allow for more exotic sensors such as dome lidars.
The only downside I can think of is that it's a pretty non-compact representation of this data for typical spinners.
This is the direction I was moving towards... It would simplify the LidarScanToPointCloud implementation across various sensor models and configurations.
I've only just read this thread but it's a topic I find interesting.
In our work we are not using ROS but we are using Ouster and other lidar models, and have our own class that functions as described above. It's what we use for all our calculations internally, and it can serialise/deserialise itself.
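To make the ray idea concrete, here is a minimal sketch of how a consumer might turn ranges into points when every pixel carries an origin and a unit direction. The `Ray` type and `rays_to_points` function are hypothetical names, not part of the proposed messages, and the sketch assumes the range is measured from the ray origin along the direction, with a range of zero or less meaning "no return".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Ray:
    """Hypothetical per-pixel beam description (not the proposed message)."""
    origin: Vec3     # beam origin in the sensor frame, meters
    direction: Vec3  # unit direction vector in the sensor frame


def rays_to_points(ranges_m: List[float], rays: List[Ray]) -> List[Optional[Vec3]]:
    """Return one XYZ point per pixel, or None where the range is not positive."""
    points: List[Optional[Vec3]] = []
    for r, ray in zip(ranges_m, rays):
        if r <= 0.0:
            points.append(None)  # assumption: non-positive range means no return
            continue
        ox, oy, oz = ray.origin
        dx, dy, dz = ray.direction
        # point = origin + range * direction, no sensor-specific trigonometry needed
        points.append((ox + r * dx, oy + r * dy, oz + r * dz))
    return points


rays = [
    Ray(origin=(0.0, 0.0, 0.1), direction=(1.0, 0.0, 0.0)),
    Ray(origin=(0.0, 0.0, 0.1), direction=(0.0, 1.0, 0.0)),
]
points = rays_to_points([2.0, 0.0], rays)
print(points)
```

The conversion is the same multiply-add for every sensor model, which is the simplification mentioned above; the cost is six floats of metadata per pixel.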
There are a few ideas in this thread I will go away and think about, but to contribute some of my own:
- Since we use this as our internal representation within a node as well as between nodes, we make heavy use of our channels. Doing some sort of segmentation? Add a channel. Flagging points that could be dust/rain/whatever? Add a channel. Transform the points into world coordinates in 3D? Add a channel. And then just serialise whichever channels are needed to go out to the next stage.
- Regarding the inclusion of structural information, we landed on the side of just including it every time, as it is pretty trivial compared to the size of the data channels, and it means any packet can be consumed on its own with no sync/context required. But this wouldn't work for all models. To break it out:
  - Simple structures (like Ouster, where the structure can be defined by two small arrays of vertical and horizontal offsets)
    - Just include it in every message, whether it changes each frame or not
  - Complex structures that could change every frame (e.g. Livox)
    - Shape is Nx1
    - Include the relevant angular information as a channel/s (and consumers know if that channel exists to use it)
  - Semi-complex structures that could change every frame (e.g. raw Velodyne, where every column has a different horizontal spacing but the vertical offset is consistent)
    - Could consider having "per row" or "per column" channels rather than duplicating the data per-pixel, or resampling to a simple structure
  - Semi or complex structures that do NOT change every frame
    - This is the tricky one. If you had per-pixel angles that never change, it would be nice to only store that LUT once. Don't have a good solution here.
- For what it's worth, we do the multiplier thing and store our ranges as uint16 because the loss of precision/range is acceptable in our application and the size improvement is valuable to us. It might be worth supporting both approaches?
- We may end up implementing some sort of split-by-channel-and-resync eventually to give our systems more flexibility, which changes the pros/cons of the above options (probably makes sense to just always have a latched/low-freq "pattern/structure" channel).
As I said, we're not using ROS so take all that with a grain of salt, but it seemed relevant and I'm keen to see how others are approaching this problem.
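The uint16 "multiplier" approach mentioned in the list above can be sketched roughly as follows. The 4 mm resolution and the function names are assumptions chosen for illustration; the sketch also assumes out-of-range values saturate rather than wrap.

```python
def encode_range_u16(range_m: float, resolution_m: float) -> int:
    """Quantize a range in meters to an unsigned 16-bit tick count."""
    ticks = round(range_m / resolution_m)
    # Saturate instead of wrapping so that far returns do not alias to near ones.
    return max(0, min(ticks, 0xFFFF))


def decode_range_u16(ticks: int, resolution_m: float) -> float:
    """Recover the range in meters from the stored tick count."""
    return ticks * resolution_m


resolution_m = 0.004  # hypothetical multiplier: 4 mm per tick
max_range_m = 0xFFFF * resolution_m  # largest representable range, about 262 m

stored = encode_range_u16(10.0, resolution_m)
restored_m = decode_range_u16(stored, resolution_m)
print(stored, restored_m, max_range_m)
```

The trade-off is visible in `max_range_m`: a finer resolution shortens the maximum range that fits in 16 bits, which is why supporting both a float and a quantized representation may be worth considering.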
Is that really true? A simple XYZI scan data point is 3x32 + 1x16 bits. Ray offset and direction is 6x32 bits, so actually larger than the scan data itself.
The LidarInfo proposed by @Samahu is the information I'm referring to. For us a "full" Ouster packet for 2048x64 would be:
- float32[64] - Vertical angles
- float32[64] - Azimuth offsets
- uint16[131072] - Range data (yes it's really 19-bit but we squeeze it in)
- uint16[131072] - Intensity (upped from 8 bit for compatibility with other lidars)
- uint16[131072] - Reflectivity
- uint16[131072] - Noise
That's enough to reconstruct the full data in 3D. Which for us involves adding a PointF[131072] for data in laser coordinates and then another one for world coordinates.
If the azimuth of every column is different but static then you're wasting space. And if it's different every scan (e.g. Velodyne) you need to send it anyway.
Edit:
To save space we don't usually serialise the XYZ data as it can always be reconstructed from the range and other info, and would cost 6x as much per point. Only when debugging our pipeline.
This gives you the direction. But it doesn't give you the spatial offset of the origin of the ray from the sensor frame (yes, it's a few cm, but it might be important for some applications).
Hmm, I'm not sure exactly what you mean. Is that not something that can be extrapolated from those values (plus a couple of other individual constants)?
I'm talking about the offset called range_to_beam_origin_mm (or denoted n in the graphics) here: Sensor Data - Ouster Sensor Docs documentation.
This is a value that is a property of the particular lidar type. Yes, you can read it from the datasheet. However, I thought the LidarScan message should be interpretable without looking anywhere else.
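The sizes above are easy to check with a few lines of arithmetic. This sketch assumes `PointF` is three float32 values (12 bytes), which is not stated in the post.

```python
WIDTH, HEIGHT = 2048, 64
PIXELS = WIDTH * HEIGHT  # 131072 pixels per scan

FLOAT32_BYTES = 4
UINT16_BYTES = 2

# Two float32[64] arrays: vertical angles and azimuth offsets.
structure_bytes = 2 * HEIGHT * FLOAT32_BYTES
# Four uint16 channels: range, intensity, reflectivity, noise.
channel_bytes = 4 * PIXELS * UINT16_BYTES
# One XYZ point per pixel, assuming PointF is three float32 values.
xyz_bytes = PIXELS * 3 * FLOAT32_BYTES
# A single uint16 range channel, for comparison with XYZ.
range_bytes = PIXELS * UINT16_BYTES

print(structure_bytes, channel_bytes, xyz_bytes, xyz_bytes // range_bytes)
```

Under these assumptions the structural arrays are 512 bytes against 1 MiB of channel data, and serialised XYZ costs six times the uint16 range channel, consistent with the "6x" figure quoted above.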
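To illustrate why the beam origin offset matters, here is a sketch under a simplified model: the beam starts a distance `n_mm` from the sensor origin, displaced horizontally along the encoder direction, and the reported range includes that distance. The function name, the 15 mm value and the model itself are assumptions made for illustration; the vendor's exact formula is in the documentation linked above and may differ.

```python
import math


def range_to_xyz_with_origin_offset(range_mm, encoder_angle_rad,
                                    azimuth_offset_rad, altitude_rad, n_mm):
    """Simplified model (assumption): the beam leaves from a point n_mm away
    from the sensor origin along the encoder direction, and range_mm is
    measured so that the beam itself covers range_mm - n_mm."""
    beam_len = range_mm - n_mm
    heading = encoder_angle_rad + azimuth_offset_rad
    x = beam_len * math.cos(heading) * math.cos(altitude_rad) + n_mm * math.cos(encoder_angle_rad)
    y = beam_len * math.sin(heading) * math.cos(altitude_rad) + n_mm * math.sin(encoder_angle_rad)
    z = beam_len * math.sin(altitude_rad)
    return x, y, z


altitude = math.radians(20.0)
with_offset = range_to_xyz_with_origin_offset(1000.0, 0.0, 0.0, altitude, 15.0)
without_offset = range_to_xyz_with_origin_offset(1000.0, 0.0, 0.0, altitude, 0.0)

# Height error made by ignoring a hypothetical 15 mm origin offset.
z_error_mm = without_offset[2] - with_offset[2]
print(z_error_mm)
```

In this model the error from dropping the offset is a few millimeters at 1 m and does not shrink with distance, so a message that wants to be interpretable on its own needs either this constant or per-ray origins.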
In the last call I presented an idea about an alternative implementation of the scan metadata that would allow us to substantially shrink the size of the metadata messages when there are regularities in the metadata. I'll try to describe it here.
There would still be the "full" LidarInfo message containing the beam intrinsics for each beam (pixel) (azimuth, elevation, range offset, time offset). This message is quite bulky, roughly 16 bytes per pixel or 1.6 MB for a 100k scan.
We could utilize a mechanism similar to image_transport to support multiple types of simplified lidar info messages.
As an example, let's take a completely regular scan. It could be represented by type RegularLidarInfo:
uint32 width
uint32 height
float32 vertical_fov_min
float32 vertical_fov_max
float32 vertical_time_increment
float32 horizontal_fov_min
float32 horizontal_fov_max
float32 horizontal_time_increment
float32 range_offset
This message type would be published to topic scan/info/regular and it would be accompanied by a transport-type library that can convert RegularLidarInfo to LidarInfo.
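A rough sketch of what the RegularLidarInfo to LidarInfo conversion could look like is shown below. The field names follow the message above, but the expansion rules are assumptions: angles are taken as evenly spaced with both ends of the field of view included, the time offset is the sum of the row and column increments, and the output is a row-major list of per-pixel tuples rather than a real LidarInfo message.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RegularLidarInfo:
    width: int
    height: int
    vertical_fov_min: float
    vertical_fov_max: float
    vertical_time_increment: float
    horizontal_fov_min: float
    horizontal_fov_max: float
    horizontal_time_increment: float
    range_offset: float


def linspace(lo: float, hi: float, n: int) -> List[float]:
    """n evenly spaced values from lo to hi inclusive (assumed spacing rule)."""
    if n == 1:
        return [lo]
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def expand(info: RegularLidarInfo) -> List[Tuple[float, float, float, float]]:
    """Row-major (azimuth, elevation, range_offset, time_offset) per pixel."""
    elevations = linspace(info.vertical_fov_min, info.vertical_fov_max, info.height)
    azimuths = linspace(info.horizontal_fov_min, info.horizontal_fov_max, info.width)
    beams = []
    for row, elevation in enumerate(elevations):
        for col, azimuth in enumerate(azimuths):
            time_offset = (row * info.vertical_time_increment
                           + col * info.horizontal_time_increment)
            beams.append((azimuth, elevation, info.range_offset, time_offset))
    return beams


info = RegularLidarInfo(width=4, height=2,
                        vertical_fov_min=-0.1, vertical_fov_max=0.1,
                        vertical_time_increment=0.0,
                        horizontal_fov_min=0.0, horizontal_fov_max=3.0,
                        horizontal_time_increment=0.001,
                        range_offset=0.0)
beams = expand(info)
print(len(beams), beams[0], beams[-1])
```

Nine scalars expand into `width * height` per-pixel entries, which is the size saving the specialized type is meant to capture.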
Downstream code would then use a specialized LidarInfoSubscriber instead of a normal Subscriber, which would act like a transport node, i.e. search for plugins, subscribe to the topics the plugins specify, and receive and convert the specialized types to LidarInfo.
Unlike in image_transport, I wouldn't suggest adding the other direction, i.e. translating LidarInfo to the specialized messages, as it could be complicated or impossible. I'd also suggest that the LidarInfoSubscriber actually subscribes to all known topics because, in any case, only one of them will have an actual publisher.
Regarding the specialized types, we should try to provide as many of them as possible in the general library. But manufacturers would be allowed to specify their own types if none of the available ones fit.
The benefits of the proposed system are:
- the transport-like part is separated from the core `LidarInfo` message, so we could first standardize the full `LidarInfo` and only then concentrate on finding good specialized types
- users can decide at recording time whether they prefer compatibility (record the full type) or space efficiency (record the specialized type).
- the conversion from specialized to full can be done later in batch mode
- the full message does not need to get published at any time (unless the user wants to record it). It would be internal to the node that processes the metadata.
The cons I see are:
- if manufacturers are too wild with defining their own metadata types, we'd be in a similar state as we are now, just not with the "content" (ranges), but with the metadata. However, there would be a way to do a one-time conversion for bag files that will produce the full, universal type.
- a little bit more complicated for post-processing (I still suggest using the full type for processing)
- the conversion adds latency... however, the converter can internally cache the messages, so if they do not change, the latency hit should be minimal
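The caching mentioned in the last point could be as simple as remembering the most recent input. This is a sketch only: `CachingConverter` is a hypothetical helper, the specialized message is represented by its serialized bytes, and the stand-in `expensive_convert` function merely counts how often it runs.

```python
from typing import Callable, Optional


class CachingConverter:
    """Re-run the conversion only when the incoming metadata bytes change."""

    def __init__(self, convert: Callable[[bytes], object]):
        self._convert = convert
        self._last_input: Optional[bytes] = None
        self._last_output: object = None

    def __call__(self, serialized_info: bytes) -> object:
        if serialized_info != self._last_input:
            self._last_output = self._convert(serialized_info)
            self._last_input = serialized_info
        return self._last_output


calls = []


def expensive_convert(serialized_info: bytes) -> list:
    """Stand-in for the specialized-to-full expansion; records each real run."""
    calls.append(serialized_info)
    return list(serialized_info)


converter = CachingConverter(expensive_convert)
converter(b"\x01\x02")
converter(b"\x01\x02")  # unchanged metadata: served from the cache
converter(b"\x01\x03")  # changed metadata: converted again
print(len(calls))
```

With static metadata the conversion runs once, so the added latency is limited to a byte comparison per message.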
One unrelated proposal:
Let's add a checksum field to the LidarInfo that would contain something like CRC32 of everything but the header. This way, it would be super-easy to verify if the metadata changed. Also, this checksum could be used as the identifier connecting a scan and its metadata (so far, we were proposing timestamp, but maybe the checksum would be more universal and meaningful).
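A minimal sketch of such a checksum, using Python's `zlib.crc32` over a fixed little-endian packing of the non-header fields. The choice of fields, their order and the packing are assumptions; a real implementation would need to pin these down so that every producer and consumer computes the same value.

```python
import struct
import zlib
from typing import Sequence


def lidar_info_checksum(width: int, height: int,
                        azimuths: Sequence[float],
                        elevations: Sequence[float]) -> int:
    """CRC32 over everything except the header (assumed field set and layout)."""
    payload = struct.pack("<II", width, height)
    payload += struct.pack(f"<{len(azimuths)}f", *azimuths)
    payload += struct.pack(f"<{len(elevations)}f", *elevations)
    return zlib.crc32(payload)


azimuths = [0.0, 0.5, 1.0, 1.5]
elevations = [-0.1, 0.1]

checksum_a = lidar_info_checksum(4, 2, azimuths, elevations)
checksum_b = lidar_info_checksum(4, 2, azimuths, elevations)
checksum_c = lidar_info_checksum(4, 2, azimuths, [-0.1, 0.2])

print(checksum_a == checksum_b, checksum_a == checksum_c)
```

A scan could then carry the checksum of the metadata it was captured with, and a consumer would compare that value against the latest LidarInfo it holds. Note that `zlib.crc32` returns an unsigned 32-bit value, so an unsigned field would hold it without conversion.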
Another proposal:
I see the LidarInfo proposed by @Samahu as too Ouster-specific. I suggest changing LidarInfo to a more general format that could encompass even non-repetitive lidars and such:
std_msgs/Header header
uint32 width
uint32 height
float32 min_range
float32 max_range
geometry_msgs/Vector3 beam_origins # in header.frame_id coord system
geometry_msgs/Vector3 beam_directions # in header.frame_id coord system
int32[] beam_time_offsets_ns # offsets from header.stamp
int32 checksum # previous proposal
I'm not sure about the usage of geometry_msgs/Vector3 compared to three float32[] fields. But that's an implementation detail which would need to be decided based on actual processing speed measurements.
And last proposal:
To better support non-repetitive scanners which may have long scan patterns, LidarScan could contain an offset into LidarInfo. That would make it possible to reuse the same LidarInfo message for multiple scans even if they use different parts of the pattern.
However, my knowledge of non-repetitive lidars is too vague. Livox stubbornly keeps saying their pattern is non-repetitive, while both my intuition and Gemini say it has to repeat. But I have no guess at how long the period is. Gemini thinks hundreds of hours, which would render this proposal useless.
I also don't know how such a large message would fit in the general ROS 2 framework. I think ROS 2 is (currently) not suitable for sending 100 MB messages or similar (just an example, I don't know the actual size). So it might be more practical to periodically send many smaller messages.
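For reference, the offset idea could work roughly like this on the consumer side. The sketch assumes the pattern repeats, so that indices wrap around at the end of the stored pattern; `beams_for_scan` is a hypothetical helper and the pattern entries are plain integers standing in for per-beam metadata.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def beams_for_scan(pattern: Sequence[T], offset: int, count: int) -> List[T]:
    """Select the `count` pattern entries a scan uses, starting at `offset`.

    Assumption: the pattern is periodic, so indexing wraps around its end.
    """
    n = len(pattern)
    return [pattern[(offset + i) % n] for i in range(count)]


pattern = list(range(10))  # stand-in for per-beam directions and time offsets
scan_beams = beams_for_scan(pattern, offset=8, count=4)
print(scan_beams)
```

Each scan would then only need to carry an integer offset, while the (possibly large) pattern is sent once or at a low rate.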
Sure, that's what I meant by "plus a couple of other individual constants", but perhaps I wasn't clear that I was fine with those being in the message (and in every message, because it's so small).
The part I'm still confused about is:
As far as I can tell, for most use cases the ray offset and direction do not need to be stored per-point, but the full LUT can be computed based on a few smaller arrays and constants. So it's easier to just bundle them in the message.
I do concede that if you need those various offsets (especially the time offsets) to be different per-ray (or even per-column) but NOT per-frame then the "latched" topic makes sense.
Great proposal. Reducing the transport overhead of PointCloud2 is crucial for avoiding "network congestion" that eventually poisons the entire control loop.
In my tests with ros2_kinematic_guard, I've noticed that when DDS struggles with high-bandwidth sensors (like your Lidar case), the first victim is often the timing consistency of /cmd_vel. Even if we move to a leaner LidarScan format, transient wireless jitter will still exist.
My project handles the "physical" consequence of the issues you described: when those 2D grid packets eventually burst or arrive stale, the Guard ensures they don't turn into dangerous robot motions. Looking forward to seeing this new message type; it would certainly make the "Command-vs-Odom" consistency check much more predictable by reducing RMW-level overhead.
I think this is a great proposal, I like that you are considering sensors other than spinning LiDARs. As far as I see, everything you would need for perception and odometry algorithms is present in this message, channel data, per-point timestamps, beam directions etc.
One detail I would suggest adding: the range measurement quantization interval, i.e. the resolution (in meters). Since the time-of-flight measurements are quantized integer values inside the UDP packets, this information is lost when converting them to meters. This value is vendor-specific, e.g. 4 mm or 1 mm.
Adding this information enables lossless point cloud compression, while keeping the range measurement itself as a floating point value in meters.
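A small sketch of why knowing the quantization interval makes the float representation lossless in practice: the original integer ticks can be recovered by dividing by the interval and rounding. The 4 mm interval and the tick values are assumptions for illustration, and the float32 round trip is simulated with `struct` since Python floats are 64-bit.

```python
import struct
from typing import List


def to_float32(value: float) -> float:
    """Round a Python float to the nearest float32, as a float32 field would."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def ticks_to_meters(ticks: int, interval_m: float) -> float:
    """What a driver would publish: the range in meters as float32."""
    return to_float32(ticks * interval_m)


def meters_to_ticks(range_m: float, interval_m: float) -> int:
    """What a compressor could do if the message carried the interval."""
    return round(range_m / interval_m)


interval_m = 0.004  # hypothetical vendor quantization interval: 4 mm
raw_ticks: List[int] = [0, 1, 2500, 65535, 131071]

published = [ticks_to_meters(t, interval_m) for t in raw_ticks]
recovered = [meters_to_ticks(r, interval_m) for r in published]
print(recovered == raw_ticks)
```

The recovery works as long as the float32 rounding error stays well below half a tick, which holds for tick counts far below 2^23; the integers can then be fed to any lossless integer codec.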
Another thing I would consider is whether it makes sense to try to unify spinning and non-spinning LiDAR-sensors in a single message: Many algorithms (like image-based segmentation, detection) can only work with spinning LiDARs because they need to project the data to an image, i.e. 2D array.
Maybe consider instead a separate SpinningLiDARScan message, or at least a way to indicate what kind of scan pattern the sensor uses (string field), so that the algorithm can report an error.
Again, great proposal, thanks for that.
I think something like the is_dense field from PointCloud2 would be enough? Something like is_regular?
I was thinking more of a string field indicating the scan pattern, i.e. "spinning", "non_uniform" etc., in case the message specifies an array of beam angles/directions.
But I now see that the message also specifies "width" and "height" fields, so it already implies image-like data.
So if you generalize this to an array of arbitrary beam directions (to support sensors like the Livox Avia), at least these two fields kind of lose their meaning.
The usual interpretation is that regular scans have height != 1 and irregular scans have height == 1.
Yes, you are right about that
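With that convention, an image-based algorithm could reject unsuitable scans up front. A minimal sketch, with hypothetical helper names and the `height == 1` convention taken from the discussion above:

```python
def is_regular(height: int) -> bool:
    """Convention discussed above: irregular (unorganized) scans use height == 1."""
    return height != 1


def require_image_like(width: int, height: int) -> None:
    """Raise early if a 2D (image-based) algorithm receives an irregular scan."""
    if not is_regular(height):
        raise ValueError(
            f"scan is {width}x{height}: image-based processing needs a regular 2D scan"
        )


require_image_like(2048, 64)  # regular spinning scan: accepted

try:
    require_image_like(96000, 1)  # e.g. a non-repetitive pattern stored as Nx1
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

One caveat of relying on the shape alone is that a sensor with a single row would be indistinguishable from an irregular scan, which may be an argument for an explicit flag.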
I'd just chime in to note a slight UX thing. The message structure is all well and good and would certainly be useful for the rapidly incoming wave of cheap 3D lidars, but semantically I think the name LidarScan makes no sense at all.
Lidars can be 2D, they can be 3D, they can be 1D. There's nothing really telling you what this is. LaserScan, LidarScan, what's the difference? Given that it's essentially just a LaserScan with another set of variables for the vertical dimension (especially the float32 array version), why not call it LaserScan3D or something a bit more descriptive along those lines?
Most systems designed to receive 2D scan input will continue to use the original version no doubt, so this would live beside those as an extra input, unless you plan to somehow displace almost two decades of backwards compatibility (yes, the LaserScan message is around 18 years old by now).