Enhancing ROS 2 Network Visibility and Attribution with eBPF Kernel Probes

In many robotic deployments, the network is treated as a trusted zone, or security is delegated to the middleware or network layer (SROS2, VPNs). What is often missing is observability: the ability to attribute a network packet to a specific process or a specific remote identity in real time, without the overhead of full packet capture (e.g. tcpdump).

What I’m Building: I’ve been working on BotGuard-Core, an open-source eBPF-powered sentinel. While standard tools look at the “What” (messages), BotGuard looks at the “Who” (Kernel-level attribution).

Technical Approach:

  1. Internal Instrumentation: Using eBPF Uprobes on rmw_create_node to capture node creation events as they happen and resolve them to PIDs and binary names.

  2. External Attribution: Using eBPF TC (Traffic Control) classifiers to extract Source IP and Source MAC addresses from incoming RTPS discovery traffic. This allows for clear attribution and segmentation between host processes and external network participants.
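To make the external attribution step concrete, here is a minimal user-space sketch of the parsing a TC classifier would perform on each incoming frame. It is not BotGuard's eBPF code: the function name `attribute_frame` is hypothetical, it handles only untagged Ethernet + IPv4 + UDP, and it identifies RTPS solely by the 4-byte "RTPS" magic at the start of the UDP payload.

```python
import socket
import struct

ETH_HEADER_LEN = 14
ETHERTYPE_IPV4 = 0x0800
IPPROTO_UDP = 17
UDP_HEADER_LEN = 8
RTPS_MAGIC = b"RTPS"  # every RTPS message header starts with these 4 bytes


def attribute_frame(frame: bytes):
    """Return (src_mac, src_ip, is_rtps) for an Ethernet/IPv4/UDP frame.

    Returns None for anything else (too short, non-IPv4, non-UDP).
    Hypothetical helper for illustration only.
    """
    if len(frame) < ETH_HEADER_LEN + 20:
        return None

    # Ethernet header: destination MAC, source MAC, EtherType.
    _dst_mac, src_mac, ethertype = struct.unpack("!6s6sH", frame[:ETH_HEADER_LEN])
    if ethertype != ETHERTYPE_IPV4:
        return None

    # IPv4 header length is the low nibble of the first byte, in 32-bit words.
    ihl = (frame[ETH_HEADER_LEN] & 0x0F) * 4
    if ihl < 20:
        return None
    if frame[ETH_HEADER_LEN + 9] != IPPROTO_UDP:
        return None
    src_ip = socket.inet_ntoa(frame[ETH_HEADER_LEN + 12:ETH_HEADER_LEN + 16])

    payload_offset = ETH_HEADER_LEN + ihl + UDP_HEADER_LEN
    if len(frame) < payload_offset:
        return None
    payload = frame[payload_offset:]

    mac_text = ":".join(f"{b:02x}" for b in src_mac)
    return mac_text, src_ip, payload.startswith(RTPS_MAGIC)
```

In a real classifier the same offsets would be read from the socket buffer with bounds checks the eBPF verifier accepts, and the result written to a map rather than returned.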

The “Drawing Board” Discussion: During development, I analyzed how unauthorized access happens in the field:

  • Physical Hard-wiring: Even if WiFi/DHCP is secure, anyone can hard-wire a device in the field.

  • Beyond SROS2: While SROS2 provides encryption and authentication, it doesn’t solve the problem of identifying unauthorized actors who are still flooding the network or attempting to impersonate a legitimate peer.

Questions for the Community:

  • How are you currently handling live attribution of network participants in complex, multi-robot environments?

  • Do you see value in a “Zero Trust” sentinel that operates below the middleware?

  • Would you prefer this data integrated into existing tools like ros2 node list or as a standalone security dashboard?

I’m looking for feedback from anyone working on ROS 2 security and network monitoring!

GitHub: LordGan/BotGuard-Core. BotGuard-Core is an eBPF-powered security monitor designed specifically for ROS 2 ecosystems. It operates within the Linux kernel to provide "Zero Trust" visibility and security by monitoring communication at the system level.
Connect/Follow on LinkedIn: https://www.linkedin.com/in/phesagan-ravi/

This is a really interesting tool! I’ve been looking at network diagnostics lately in order to find bandwidth bottlenecks between the two PCs on my robot. In a large system it’s surprisingly difficult to find out which node is sending or requesting too much data and slowing down the rest of the system. I realize network metrics/benchmarking and tracing is a little different from the security perspective, but it seems the needs overlap.

What kind of CPU/latency overhead do you see with your tool? Is it something you think could run reliably on an average multi-PC system with 100-300 nodes? Would it need to run on each computer that is running ROS 2?

Thank you for the feedback! You’re right - the needs of security and network diagnostics overlap significantly. Both require accurate attribution of traffic to specific actors, which is exactly what eBPF is designed for.

To answer your questions about performance and deployment:

  1. Performance Overhead: We chose eBPF specifically because it is one of the most lightweight ways to get this level of visibility.

Network (TC): Our TC (Traffic Control) classifiers process packets directly in the kernel’s network path using JIT-compiled machine code. Because we don’t have to copy packet data to userspace (as tcpdump and other sniffers do), the added CPU cost and latency per packet stay small, even in high-throughput environments.

Instrumentation (Uprobes): We currently hook ‘control-plane’ events like rmw_create_node. A Uprobe has a small execution cost each time it fires, but these are rare events (once per node). Once your nodes are running and exchanging data, the Uprobes add nothing to the publish/subscribe path, so the frequency and latency of your topic data are unaffected by them.

  2. Scalability (100-300 nodes): A system with 300 nodes is well within the capabilities of this architecture. eBPF state is stored in kernel hash maps. For 300 entries, the kernel-side memory and CPU footprint are extremely small. In our testing, the main ‘cost’ is actually the terminal redrawing of the TUI dashboard, not the kernel-level monitoring itself.

  3. Deployment:

Per-Host Attribution: To link traffic to a specific PID or Binary name, the sentinel must run on each computer. This allows us to resolve local memory addresses and process IDs that aren’t visible over the wire.

Network-Wide View: You can run it on a single ‘Sentinel’ node to see all network participants, but they will appear as identities (IP/MAC). To ‘unmask’ the specific ROS node names on each PC, those PCs would also need to be running the agent.
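The relationship between the two deployment modes can be sketched as a simple join: the network-wide sentinel supplies (IP, MAC) identities, and any host that also runs the agent contributes its local node list. This is only an illustration under assumed data shapes; the function name `unmask` and the report format are hypothetical, not BotGuard's actual interface.

```python
def unmask(participants, agent_reports):
    """Join wire-level identities with per-host agent reports.

    participants:  iterable of (ip, mac) pairs seen by the sentinel.
    agent_reports: dict mapping host ip -> list of (pid, binary, node_name)
                   tuples reported by the agent on that host (assumed shape).
    Hosts without an agent stay visible, but only as an IP/MAC identity.
    """
    view = []
    for ip, mac in participants:
        nodes = agent_reports.get(ip)
        view.append({
            "ip": ip,
            "mac": mac,
            "nodes": nodes if nodes is not None else [],
            "attributed": nodes is not None,
        })
    return view
```

A participant that shows up with `attributed` set to False is either a PC that is not yet running the agent or a device nobody expected on the network, which is the case the sentinel is meant to surface.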

Bandwidth Bottlenecks: Using eBPF to track bandwidth at the node level is a very natural extension. Since we already attribute incoming/outgoing packets to a Node/PID, adding a byte counter to the kernel map would allow for real-time bandwidth profiling with much lower overhead than traditional diagnostic tools.
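As a rough model of that byte counter, the sketch below uses a Python dict as a stand-in for the kernel map: the packet path adds each packet's length to a per-PID total, and a user-space poller periodically converts the totals into bytes per second. The class and function names are hypothetical, and the real map layout may differ.

```python
from collections import defaultdict


class BandwidthTable:
    """User-space stand-in for a per-PID byte counter kept in a kernel hash map."""

    def __init__(self):
        self._bytes = defaultdict(int)

    def record(self, pid, nbytes):
        # What the kernel side would do for every attributed packet.
        self._bytes[pid] += nbytes

    def rates(self, interval_s):
        """Return {pid: bytes per second} for the elapsed interval, then reset."""
        snapshot = dict(self._bytes)
        self._bytes.clear()
        return {pid: total / interval_s for pid, total in snapshot.items()}


def top_talkers(rates, n=3):
    """Return the n (pid, rate) pairs with the highest rate, noisiest first."""
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Polling `rates()` once per dashboard refresh and passing the result to `top_talkers()` would give a direct answer to "which node is sending too much data" without capturing any packet contents.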

I’m curious: in your 300-node system, is the difficulty mainly in the overhead of existing tools, or in finding a way to attribute the ‘noisy’ packets back to the source process?