A proposal for a LidarScan sensor message

Hello ROS community

When working with lidar data in ROS, users are usually directed to PointCloud2 messages, which represent the lidar data as a list of 3D points with additional attributes. While this nicely mirrors the PCL representation and fits the majority of applications working with 3D point cloud data, it isn’t how modern lidar sensors natively represent their data.

Problem Statement

Relying solely on PointCloud2 has several drawbacks, including the following (not a comprehensive list):

  • With the rapid increase in lidar resolution, PointCloud2 messages can be hefty to transport. To date, many DDS implementations struggle to keep up with the actual sensor frame rate when transporting a high-resolution PointCloud2 on low- to medium-compute nodes.

  • An option to reduce the bandwidth requirement would be to use dense pointclouds, i.e., transport only valid points. However, by doing so we lose the structured nature of the data for devices that natively generate it as a structured 2D grid.

    • Many image processing operations benefit from adjacency information, which allows quick lookup of neighboring pixels. For example, ground plane removal can be more efficient to implement directly on 2D range data than using a 3D representation.

    • One could also directly apply existing 2D neural networks such as YOLO to lidar data in its 2D representation.

  • A potential critique of this suggestion might be that we don’t need a new message to do this, as we already have sensor_msgs::Image, which can fulfill this role for users who need it. And in fact, the ouster_ros driver optionally does publish the range data and other byproducts of the sensor as sensor_msgs::Image on separate topics. I am aware of many users who do indeed utilize these topics instead of the 3D pointcloud data.

    • While this works fine if you are only interested in processing each channel individually, it breaks down if you need to access and use more than one channel in the same operation, which is often the case.

      • A simple example: a user may want to filter certain returns (range data) based on reflectivity values and adjacency data simultaneously.
    • A common approach to this problem in ROS would be to use the `ApproximateTime` message filter. Doing so, however, adds latency and CPU overhead to synchronize data channels that were originally already synchronized.

    • A LidarScan message acts here as a multi-spectral image with all channels memory-aligned and 100% data correlation ensured by the sensor, without software synchronization overhead.

The proposal

We are proposing the addition of a new ROS sensor message that mirrors the native format of the majority of lidar sensors (whether spinning or solid-state). With this proposal we would like to invite other lidar vendors to contribute as well, to make sure that the format encompasses the entire spectrum of lidar sensors.

A quick draft of a LidarScan message could look like this:

std_msgs/Header header

# Dimensions of the scan (e.g., 128 channels x 2048 columns)
uint32 height
uint32 width

# --- Geometry Metadata ---
# Horizontal and Vertical FOV/Resolution info to allow projection to 3D 
# without needing a full PointCloud2 blob.
float32 vertical_fov_min
float32 vertical_fov_max
float32 horizontal_fov_min
float32 horizontal_fov_max

# --- Channel Data (The "Image" approach) ---
# Each channel (Range, Intensity, Reflectivity, etc.) is stored in this list.
# This mirrors the 'PointField' logic but at a 2d-grid level.
LidarChannel[] channels

# The actual raw buffer containing all interleaved or planar channel data.
# Using uint8[] allows for Zero-Copy compatibility.
uint8[] data

# --- Scaling and Metrics ---
# Different vendors have different units:
# Ouster (mm) vs. Velodyne (m) vs. Hesai (cm) problem.
# Range = (raw_value * multiplier) + offset
float64 range_multiplier
float64 range_offset

And the definition of LidarChannel:

string name        # "range", "intensity", "reflectivity", "ambient", "near_ir"
uint32 offset      # Offset from start of data row
uint8  datatype    # uint8, uint16, uint32, float32, etc.
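To make the channel layout concrete, here is a minimal Python sketch of reading one channel value out of the raw buffer and applying the draft's scaling formula. It assumes a pixel-interleaved, row-major layout with a fixed per-pixel stride and PointField-style datatype codes; none of these details are fixed by the draft yet, so treat the names and constants as placeholders.

```python
import struct

# Hypothetical datatype codes, mirroring the sensor_msgs/PointField constants
UINT8, UINT16, UINT32, FLOAT32 = 2, 4, 6, 7
_FMT = {UINT8: 'B', UINT16: 'H', UINT32: 'I', FLOAT32: 'f'}

def read_channel(data, width, pixel_step, channel_offset, datatype, row, col):
    """Read one channel value at (row, col) from the raw buffer.
    Assumes a pixel-interleaved, row-major layout; the draft does not
    pin the layout down yet, so this is just one plausible choice."""
    idx = (row * width + col) * pixel_step + channel_offset
    return struct.unpack_from('<' + _FMT[datatype], data, idx)[0]

def range_meters(raw, range_multiplier, range_offset):
    """Apply the draft's scaling formula: Range = (raw_value * multiplier) + offset."""
    return raw * range_multiplier + range_offset
```

For a 1x2 scan with a uint16 range channel at offset 0 and a uint8 reflectivity channel at offset 2 (pixel step 3), `read_channel(data, 2, 3, 0, UINT16, 0, 1)` returns the range of the second column, and `range_meters` converts it to meters given a millimeter multiplier of 0.001.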

While this works for sensors with a uniform distribution of laser beams, not all vendors have that formation, including Ouster sensors, which makes the Geometry Metadata section insufficient:

float32 vertical_fov_min
float32 vertical_fov_max
float32 horizontal_fov_min
float32 horizontal_fov_max

The vertical beams of Ouster spinning sensors don’t have a uniform distribution due to the calibration process, which means we need to extend the previous definition to include the beam angles in the LidarScan message body:

# --- Non-Uniform Geometry Metadata ---
# These arrays allow the receiver to project Range -> 3D.
# vertical_angles[height]: The elevation angle for each ring (in radians).
float32[] vertical_angles

# other attributes might also be needed
# horizontal_angles[width]: [optional] The azimuth angle for each column (in radians).
# int32[] beam_time_offset: [optional] To handle "staggered" firing patterns within a single column.
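As a sketch of how a receiver could use these arrays, the following Python snippet projects a range image to 3D points using per-ring vertical angles and per-column horizontal angles. It assumes ranges already in meters, angles in radians, and a sensor at the origin with no per-beam origin offsets (real drivers also apply those); the function names are illustrative only.

```python
import math

def beam_to_xyz(r, azimuth, elevation):
    """Project one range return to Cartesian coordinates: range r along
    the beam, azimuth about +Z, elevation from the XY plane (radians)."""
    x = r * math.cos(elevation) * math.cos(azimuth)
    y = r * math.cos(elevation) * math.sin(azimuth)
    z = r * math.sin(elevation)
    return (x, y, z)

def scan_to_points(ranges, vertical_angles, horizontal_angles):
    """Project a height x width range image using the per-ring
    vertical_angles and per-column horizontal_angles arrays."""
    return [
        [beam_to_xyz(ranges[i][j], horizontal_angles[j], vertical_angles[i])
         for j in range(len(horizontal_angles))]
        for i in range(len(vertical_angles))
    ]
```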

This solves the problem and allows users to project the range data into 3D, but adds some overhead by increasing the message size. These arrays essentially define the intrinsics of the lidar sensor; transporting them with every LidarScan message, however, reduces or eliminates most of the gains attained by transporting raw range data instead of projected XYZ points. A better approach would be to split the beam information and the lidar data into two separate messages, with the lidar sensor info transported only once, earlier during the connection phase. This is not a new pattern in ROS, which already has `sensor_msgs/CameraInfo` to describe the intrinsics of a camera (see the sensor_msgs/CameraInfo documentation).

By moving these intrinsic fields into a separate message we can retain the same gains and keep the LidarScan message lean. The definition of a sensor_msgs::LidarInfo message would be something like:

std_msgs/Header header

float32[] vertical_angles
float32[] horizontal_angles
int32[] beam_time_offsets

# --- Scaling and Metrics ---
float64 range_multiplier
float64 range_offset

# Plus other static factory data (intrinsic/extrinsic)

And the revised LidarScan message becomes:

std_msgs/Header header
uint32 height
uint32 width
LidarChannel[] channels
uint8[] data

NOTES:

  • This format is better suited for filtering and perception stacks.

  • It is important to note that this proposal does not suggest that lidar vendors or users should stop using PointCloud2. It mainly suggests the addition of a new message type that mirrors the native format of the majority of lidar sensors, reducing overhead and providing better synchrony.

  • The idea here is to come up with standard sensor_msgs::LidarScan and sensor_msgs::LidarInfo messages, and to fully abstract away the process that converts this native lidar sensor format into a 3D pointcloud for any sensor.

  • Once we get initial feedback from the community, the idea is for Ouster and others interested in this concept to build a PoC of the proposal and make sure we cover all the basic necessities for this to work before committing to the final interface.

  • I am also aware of the other proposals around Native Buffers (rcl::Buffer) that are already in flight, and we plan to support them from the get-go, as there is a large intersection between the motivation behind Native Buffers and the use of LidarScan for perception-type tasks and other workloads.

4 Likes

I very much agree with the need you described! I even did some work on a very similar proposal several years ago: MultiLayerLaserScan · Issue #150 · ros/common_msgs · GitHub (proposal), peci1/multilayer_laser_scan · GitHub (implementation).

My approach was similar in many ways, but it has some different ideas. Please, have a look at it :slight_smile: (most importantly, it reuses the existing PointField types and it offers simple ways to describe regular scans).

The side-channel with metadata/scan layout is also something I wanted to have. However, there is a difference to CameraInfo: CameraInfo should be sent with every message, which is exactly what you’d want to avoid with lidars. So it’s not true that this approach has a precedent in ROS. The closest to it would be static TFs, but that’s not really a good analogy. The problem with sending the metadata via a side-channel on a latched topic is that you never know whether you have valid and up-to-date metadata without waiting for an indeterminate amount of time. That was one of the objections to my earlier proposal. Thinking about it now, I think there might be a solution, e.g. sending a hash or UUID of the scan layout in the scan message. But I don’t have the details of this approach thought through, and it would be a novel concept in ROS.

Apart from that, I’m strongly against the multiplier. Just convert everything to meters as per REP 103.

3 Likes

To offer some counter arguments:

  • pointclouds are getting big to transport: use a shared memory RMW. For external subscribers use cloudini

  • adjacency information: doesn’t structured PointCloud2 messages already solve this problem? If you have a width and height you can calculate adjacent points

  • scaling: ROS2 encourages using SI units, so it discourages any scaling factor

For “raw” sensor messages your format is probably better suited, but for, say, a SLAM package, you’d want the projected points in 3D.

Not really. If you look at the ray offsets in Ouster lidars, ray R in column C is not adjacent to ray R+1 in column C. The angular difference between them is ~5-10 degrees. But yeah, you could add another field to the pointcloud with the firing angles. But I haven’t seen any lidar that would do so. It would probably make the pointclouds unnecessarily large.

I think the general idea @Samahu and I have is to have something like LaserScan+PointCloud2, but for 3D lidars. With 2D lidars, all the vendors stuck to publishing the standardized LaserScan message, so that nobody even knows these lidars have some internal data format. Nobody ever had to see it, because there’s LaserScan and the laser_geometry package.

Thanks @peci1 for the feedback. I think I came across MultiLayerLaserScan some time ago (probably through one of your GitHub issues) but had forgotten about it; there are probably some good lessons to learn from there, so I will take a closer look.

With regards to the use of SI units for range values (also noted by @Rayman), that is definitely worth considering. I opted for raw mm measurements for two reasons:

  • precision loss: I was worried that the round trip (uint32 millimeters → float32 meters → uint32) would be lossy. However, after a bit of research, this only becomes a concern once ranges exceed roughly 16 km represented as uint32 millimeters; beyond that, the round trip between mm and meters loses precision. I don’t see lidars crossing this threshold anytime soon :face_with_raised_eyebrow: so we can ignore this factor.

  • The other factor I had in mind is that integer math can run faster. That varies per operation, but generally speaking, if your operation on range values can be done in uint32, it typically runs faster than using floating-point math.
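The precision-loss threshold mentioned in the first point can be sanity-checked with a short Python sketch that forces the value through an IEEE-754 binary32 (float32) representation:

```python
import struct

def round_trip_mm(mm):
    """uint32 millimeters -> float32 meters -> back to uint32 millimeters.
    The pack/unpack pair rounds the value to the nearest float32."""
    meters_f32 = struct.unpack('f', struct.pack('f', mm / 1000.0))[0]
    return round(meters_f32 * 1000.0)
```

Below roughly 16 km the float32 spacing at that magnitude stays under 1 mm, so rounding recovers the exact integer (e.g. `round_trip_mm(12_345_678)` gives back `12_345_678`); past that, the 24-bit mantissa can no longer resolve individual millimeters and the round trip starts to drift.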

That being said, if the standard requires using SI units then we should follow it.

pointclouds are getting big to transport: use a shared memory RMW. For external subscribers use cloudini

I would like to support cloudini at some point but I expect using it would add significant delay during encoding and decoding.

Many image processing operations expect the measurements to be memory-aligned in the buffer (for example, feeding YOLO for simple inference). That’s not the case with PointCloud2: you have to skip the other fields when iterating over the points array. You can rearrange it so that it is memory-aligned, but that’s an additional step you have to carry out.

I think the general idea @Samahu and I have is to have something like LaserScan+PointCloud2, but for 3D lidars.

Yeah, I am not suggesting that lidar sensor drivers should drop PointCloud2, but rather have the option to publish either or both. I will update the original post to highlight this fact.

We are just giving an option to users who don’t need the 3D pointcloud (at least not right away), and possibly allowing users more flexibility to pick and choose which fields they want through the LidarScanToPointCloud method.

I agree with your assessment. Ouster sensors can change configuration at runtime, and not having the right sensor config when decoding the PointCloud object would be problematic. We could definitely add some form of ID to the LidarScan message to associate it with a specific LidarInfo message. I will update the proposal (and highlight the necessity of this change).

One of the nice examples when LidarScan can do more than PointCloud2 is raytracing. PointCloud2 doesn’t carry all the necessary information to do it (as a workaround, you could add viewpoint fields, then it would be sufficient; but again, I haven’t seen any lidars that would do that).

Other examples of what PointCloud2 in general cannot do is telling you the directions of rays that did not hit anything (again, this could be worked around by adding viewpoint and viewing direction fields).

That is true. By the way, @peci1, you may want to check out my post Projecting a 3D Point into a 2D Range Image :: Ouster Community. I know you have done some work on this front, but it’s worth double-checking whether it aligns with your findings. The problem isn’t as trivial as many of the implementations I found circulating online when I worked on it.

1 Like

A few quick thoughts on this. Overall, I think there are now enough devices in this space that we can likely improve things by iterating on the standard messages. The point cloud representation is more generic, but it is definitely a less compact representation.

I’d like to suggest one other analogy that would be good to compare with. When you step back and look at it, a structured set of depth data like this is actually very much like a depth image (REP 118 -- Depth Images): a 2D array of depth values.

There’s clearly a few more elements related to the non-uniform angular differentials. Potentially reusing part or all of that infrastructure might make sense. And maybe a parallel to CameraInfo or an augmentation might be valuable.

Another benefit we’ve seen of using the Image standard is that you can do things like send it through lossless compression such as PNG, which is quite effective at compressing things in a fully recoverable way; there are also compressed_image_transport options as well as many front-end libraries that can manipulate and render images really fast. We did this way back in the day for the PR2 tilting laser, quite effectively.

Similar to your example, a quick search turns up several examples of converting clouds to images. But if we’re going from structured data into the image, it should be even more effective.

Re: Metric

Please do stick to the metric representation. Integer math is potentially slightly more performant, but until that has been shown to be a bottleneck I’d suggest it’s premature optimization, and the generality/introspection etc. is much more valuable. Also, floating-point acceleration in compilers is pretty good these days.

3 Likes

Thanks @tfoote , I don’t think I have reviewed the REP on the depth images before. Will take a look and see how much it correlates with what I am proposing here.

One problem with image representation is that all image formats I know use the same bit depth for each channel. But that’s not the case in general for lidar scans.

Also, you might want to mix integer and floating point channels, and I don’t know any image format that could do that.

But in some simple cases like XYZ-only clouds, this kind of works. It is definitely good for visualization and getting a quick overview of the scan.

1 Like

Easy, just make another TIFF extension!

2 Likes

An option is using a timestamp in the LidarScan message that refers to the corresponding timestamp in the header of the “LidarInfo” message. That avoids needing to create some kind of new identifier besides ones that already will exist and is close to existing methodologies.

2 Likes

Thanks for the idea, but timestamps are insufficient. You need something that tells you “I don’t need to wait for another LidarInfo message, because the last one I have fits the scan I’m now processing.” Timestamps can do this only if you publish the info message alongside each data message, which is exactly what we want to avoid (because the info message can be large-ish).

Eventually, I can imagine a triple topic system: data, latched sparse info and then a “mapping” topic that would just confirm which timestamp of data and info are meant together.

Perhaps I was unclear. For example the LidarInfo message would be like so:

Header header
# other data here

And the LidarScan message would be like so:

Header header
time lidar_info_time # refers to the timestamp in the header of the corresponding LidarInfo message
# other data here

Then, if you get a LidarScan message, you just check whether you have received a LidarInfo message with a timestamp matching lidar_info_time; if not, you wait until you do.

Nothing ties them to being published at the same rate.
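A minimal sketch of this pairing logic, assuming the proposed lidar_info_time field; the class and method names here are illustrative, not a settled API:

```python
class LidarInfoCache:
    """Pair each LidarScan with the LidarInfo it references via
    lidar_info_time, queuing scans whose info has not arrived yet."""

    def __init__(self):
        self._infos = {}    # header stamp -> LidarInfo payload
        self._pending = []  # (stamp, scan) tuples waiting for their info

    def on_info(self, stamp, info):
        """Store a newly received LidarInfo; return any scans it unblocks."""
        self._infos[stamp] = info
        ready = [(scan, info) for (s, scan) in self._pending if s == stamp]
        self._pending = [(s, scan) for (s, scan) in self._pending if s != stamp]
        return ready

    def on_scan(self, lidar_info_time, scan):
        """Pair a scan with its info, or queue it until the info arrives."""
        info = self._infos.get(lidar_info_time)
        if info is None:
            self._pending.append((lidar_info_time, scan))
            return None
        return (scan, info)
```

Because the lookup is an exact key match on the referenced timestamp, the two topics can be published at completely independent rates.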

2 Likes

Thanks for the clarification. Now it fully makes sense.

great proposal @Samahu

1 Like

An option is using a timestamp in the LidarScan message that refers to the corresponding timestamp in the header of the “LidarInfo” message. That avoids needing to create some kind of new identifier besides ones that already will exist and is close to existing methodologies

Thanks @mattb, this makes sense. We can skip generating a UUID and simply use the timestamp of the relevant LidarInfo message.

1 Like