Proposed changes to how ROS performs discovery of nodes

After spending more time thinking about and working on this problem, I’d like to propose a single solution that addresses the concerns I raised earlier.

I suggest changing the requirement chart to look like this:

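**Same host** (rows are Node A settings, columns are Node B settings)

| Node A static peer | Node A range | B no static peer: Off | B no static peer: Localhost | B no static peer: Subnet | B with static peer: Off | B with static peer: Localhost | B with static peer: Subnet |
|---|---|---|---|---|---|---|---|
| No static peer | Off | :x: | :x: | :x: | :x: | :x: | :x: |
| No static peer | Localhost | :x: | :white_check_mark: | :white_check_mark: | :x: | :white_check_mark: | :white_check_mark: |
| No static peer | Subnet | :x: | :white_check_mark: | :white_check_mark: | :x: | :white_check_mark: | :white_check_mark: |
| With static peer | Off | :x: | :x: | :x: | :x: | :x: | :x: |
| With static peer | Localhost | :x: | :white_check_mark: | :white_check_mark: | :x: | :white_check_mark: | :white_check_mark: |
| With static peer | Subnet | :x: | :white_check_mark: | :white_check_mark: | :x: | :white_check_mark: | :white_check_mark: |

**Different hosts** (rows are Node A settings, columns are Node B settings)

| Node A static peer | Node A range | B no static peer: Off | B no static peer: Localhost | B no static peer: Subnet | B with static peer: Off | B with static peer: Localhost | B with static peer: Subnet |
|---|---|---|---|---|---|---|---|
| No static peer | Off | :x: | :x: | :x: | :x: | :x: | :x: |
| No static peer | Localhost | :x: | :x: | :x: | :x: | :white_check_mark: | :white_check_mark: |
| No static peer | Subnet | :x: | :x: | :white_check_mark: | :x: | :white_check_mark: | :white_check_mark: |
| With static peer | Off | :x: | :x: | :x: | :x: | :x: | :x: |
| With static peer | Localhost | :x: | :white_check_mark: | :white_check_mark: | :x: | :white_check_mark: | :white_check_mark: |
| With static peer | Subnet | :x: | :white_check_mark: | :white_check_mark: | :x: | :white_check_mark: | :white_check_mark: |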
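
For anyone who finds the chart easier to read as logic, here is a minimal Python sketch of the rule I believe it encodes. The names `Range` and `can_discover` are made up for illustration and are not part of any ROS 2 or RMW API, and "has static peer" is assumed to mean that the node lists the other node's host as a static peer.

```python
from enum import Enum


class Range(Enum):
    """Hypothetical stand-in for the discovery range setting."""
    OFF = "off"
    LOCALHOST = "localhost"
    SUBNET = "subnet"


def can_discover(a_range, a_has_peer, b_range, b_has_peer, same_host):
    """Return True if nodes A and B should discover each other.

    a_has_peer / b_has_peer: the node lists the other node's host
    as a static peer (an assumption about what the chart means).
    """
    # OFF always wins: no discovery, regardless of anything else.
    if Range.OFF in (a_range, b_range):
        return False
    # On the same host, LOCALHOST and SUBNET nodes all find each other.
    if same_host:
        return True
    # Across hosts, a static peer on either side is enough.
    if a_has_peer or b_has_peer:
        return True
    # Otherwise both sides need SUBNET (multicast) discovery.
    return a_range is Range.SUBNET and b_range is Range.SUBNET
```

For example, `can_discover(Range.LOCALHOST, False, Range.SUBNET, True, same_host=False)` is true because B names A's host as a static peer, while the same call with `b_has_peer=False` is false.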

In summary: Nodes with the OFF discovery range will simply not discover endpoints in other processes, no matter where those other endpoints are hosted, what the other endpoint’s configuration is, or what the static peer settings are.

I have two motivations for making this suggestion:

1. Use Cases

The only use case I can think of for completely turning off automatic discovery is to run isolated unit tests within single processes. It seems very unlikely to me that anyone using the OFF setting would actually want an outside process, whether on localhost or a remote host, to discover their isolated process by simply including its host as a static peer.

I can imagine a possible use case where test nodes are running in the OFF setting and a user wants a test-observer node to be able to tap into specific individual test-runner nodes, but we’re not offering that level of granularity with the two environment variables.

Given only these two parameters of ranges and static peers, I think the most likely desired outcome is that a node with OFF does not want to be discovered at all.

2. DDS Discovery Protocol

What I know of the DDS standard is that it is a discovery-hungry protocol. It wants to discover participants, and you need to put up barriers to block that when you don’t want it to happen.

I don’t think there are standard DDS mechanisms for achieving the original requirements matrix. Instead, DDS implementations would need to provide a mechanism for the RMW to arbitrarily reject connections based on participant info. While this feature can certainly be implemented upstream by the DDS vendors, it might not be reasonable to demand it, especially if we’d like the feature to arrive in Iron Irwini.

Instead, the table I’ve proposed can be achieved without injecting any custom logic into the DDS discovery protocol. We can get it by toggling standard aspects of the protocol:

  • SUBNET: Usual settings, all DDS discovery mechanisms active.
  • LOCALHOST: Turn off all multicast discovery. Turn on unicast discovery for the loopback address plus unicast discovery for any static peers.
  • OFF: Turn off all multicast discovery. Give the participant a unique domain tag based on its process ID, so that participants in other processes (which carry a different tag) will not match it. Ignore static peers.

These are all standard DDS mechanisms, so I expect (almost) all DDS vendors to be capable of supporting them.
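
To make the three configurations above concrete, here is a rough Python sketch of how a range could translate into discovery settings. Everything in it is hypothetical: `DiscoverySettings`, `settings_for`, the field names, and the `pid-<n>` tag format are placeholders of mine, not vendor or RMW APIs. Adding static peers as unicast targets under SUBNET is my reading of the chart rather than something the list spells out.

```python
import os
from dataclasses import dataclass, field
from enum import Enum


class Range(Enum):
    """Hypothetical stand-in for the discovery range setting."""
    OFF = "off"
    LOCALHOST = "localhost"
    SUBNET = "subnet"


@dataclass
class DiscoverySettings:
    """Illustrative bundle of knobs; field names are made up."""
    multicast: bool
    unicast_peers: list = field(default_factory=list)
    domain_tag: str = ""  # empty string means "no special tag"


def settings_for(range_, static_peers, pid=None):
    """Map a discovery range and static peer list to settings."""
    if range_ is Range.SUBNET:
        # Usual settings; static peers added as extra unicast targets
        # (assumption inferred from the chart).
        return DiscoverySettings(multicast=True,
                                 unicast_peers=list(static_peers))
    if range_ is Range.LOCALHOST:
        # No multicast; unicast to loopback plus any static peers.
        return DiscoverySettings(multicast=False,
                                 unicast_peers=["127.0.0.1", *static_peers])
    # OFF: no multicast, static peers ignored, per-process domain tag.
    if pid is None:
        pid = os.getpid()
    return DiscoverySettings(multicast=False, domain_tag=f"pid-{pid}")
```

The point of the sketch is that each branch only flips settings the protocol already has; none of them needs a callback that inspects and rejects remote participants.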

I don’t generally think that ROS 2 should be constrained by what’s available in the DDSI-RTPS specification, but combined with the use case concerns and the timeline that we want for these features, I think revising the requirements like this would make sense.
