Physical AI requires DNN inference for learned policies, which in turn requires accelerators. Accelerators have their own memory and compute models that need to be surfaced in ROS 2 under abstractions, similar to how tensors are surfaced in PyTorch (accelerator aware, yet accelerator agnostic). This abstraction would need to be available at all layers of the ROS stack (client libraries, IDL, rmw), be vendor agnostic (CUDA, ROCm, etc.), allow for runtime graphs of heterogeneous accelerators, and enable RMW implementations to handle transport of externally managed memory efficiently. Developers who adopt these concepts in their packages should retain CPU backward compatibility when the specified accelerators are not available at runtime.
We propose forming a working group with other vendors, hosted by the ROS PMC, to introduce the concepts of externally managed memory and asynchronous compute, enabling accelerated graphs, into ROS 2 Lyrical. Tensor semantics and DNN inference standards layered on top of what is proposed here would be designed by the Physical AI SIG.
Our design sketch is a more targeted native buffer type that maps to implementations supplied by the client libraries, such as rclcpp::buffer. This native type represents only a handle to a block of memory that may optionally be managed externally.
#include <memory>
#include <string>

namespace rclcpp {

// Native handle to a block of memory that may be managed externally
// (e.g. by an accelerator runtime) rather than by the middleware.
class buffer {
protected:
  // Vendored implementation backing this buffer (BufferImplBase defined elsewhere).
  std::unique_ptr<BufferImplBase> impl;
  std::string device_type;  // e.g. "cpu", "cuda", "rocm"
};
}  // namespace rclcpp
The client library interface does not expose its underlying buffer directly, but manages all access through vendored interfaces that add support for particular frameworks or hardware architectures. For example, an implementation for Torch could live in a hypothetical torch_support library, as shown in the example below.
This keeps buffer a fundamental type focused on data storage abstraction, while semantics such as tensors or image buffers can be layered on top of it.
# MessageWithTensor.msg
#
# a message containing only a buffer that is to be interpreted as a tensor
buffer tensor
// Sample callback that receives a message containing a buffer,
// interprets it as a tensor, performs an operation on it, and
// publishes a new message with the output, with all operations
// performed in the Torch-chosen accelerator backend.
void topic_callback(const msg::MessageWithTensor & input_msg) {
  torch::Tensor input_tensor =
    torch_support::from_buffer(input_msg.tensor);
  auto result = input_tensor.some_operation();

  auto output_msg = msg::MessageWithTensor();
  output_msg.tensor = torch_support::to_buffer(result);
  publisher_->publish(output_msg);
}
A default implementation for CPU-backed buffers would be provided as part of the base ROS distribution, while system vendors and framework designers would provide implementations for their respective memory types. Every custom implementation would be required to support conversion to and from CPU-backed buffers, so that compatibility across implementations is guaranteed.
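As a hedged illustration of that guarantee, a node with no accelerator available at runtime could still consume the same message by converting to the default CPU-backed implementation; cpu_support and its to_cpu() helper are hypothetical names used only to sketch the intent.

// CPU-only fallback consumer: the guaranteed conversion to a CPU-backed
// buffer lets the same message type be consumed when no accelerator is
// present at runtime. cpu_support::to_cpu() is a hypothetical helper.
void cpu_only_callback(const msg::MessageWithTensor & input_msg) {
  rclcpp::buffer host_buffer = cpu_support::to_cpu(input_msg.tensor);
  // ... operate on host memory, or hand the data to a CPU inference path ...
}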
Relevant tensor type discussion can be found in the other post here: Native rcl::tensor type