Posted by @jkeshav-bvignesh:
Hi Team,
Continuing from the discussion at https://github.com/open-rmf/rmf/discussions/147, I have been observing some strange behavior in the RMF Traffic module. When multiple robots are instructed to go to the same target destination at the same time, RMF often manages to negotiate the resulting conflicts and complete all the allocated tasks. Robots do pause or slow down and move in and out of the target destination.
But occasionally, these negotiations don’t happen as expected. Below is one such scenario where 6 robots (5 delivery robots and 1 tiny robot) were tasked to go to the same waypoint using go_to_place from task_api_requests. RMF has been able to successfully handle this exact scenario multiple times in simulation. This video showcases a series of failures that occurred during one particular run, but I am not sure whether these failures are related.
(The video has been sped up to fit the size restrictions. Please decrease the playback speed, if required)
The major issues that can be observed are:
- The Green/Yellow marker (Expected location) and Purple marker (Actual robot location) go out of sync quite early
- Two delivery bots collide with each other. Not sure if this is due to the marker mismatch
- The Green/Yellow markers completely go out of sync and disappear towards the end of the video. (I believe the full control node dies at this point)
- One robot can be observed to move out of the grid while navigating (at 00:06)
While this is quite rare in simulation, we often observe marker mismatches when using real robots. @dennis-thevara’s discussion https://github.com/open-rmf/rmf/discussions/159 is related to that. It could still be the case that these are two different issues altogether. But coordination between robots ends up failing in both cases. The simulation only uses the components that are part of the RMF Core in a custom environment.
Based on this,
- Is our understanding regarding the Markers correct or do they also signify something else?
- Is this a cascading failure scenario or are they unrelated?
- What could be causing this failure?
On a related note, what does a MirrorManager do?
I also often get these warnings from various nodes. Is it related to this?
1655453323.0849385 [full_control-19] [WARN] [1655453323.084637004] [cleanerBotA_fleet_adapter]: Failed to update using patch for DB version 1538; requesting new update
1655453323.0855885 [full_control-15] [WARN] [1655453323.084657292] [tinyRobot_fleet_adapter]: Failed to update using patch for DB version 1538; requesting new update
1655453323.0860090 [schedule_visualizer-5] [WARN] [1655453323.084798741] [rmf_visualization_schedule_data_node]: Failed to update using patch for DB version 1538; requesting new update
1655453323.0863543 [rmf_traffic_schedule_monitor-2] [WARN] [1655453323.084986126] [rmf_traffic_schedule_backup]: Failed to update using patch for DB version 1538; requesting new update
RMF was built from source using the latest packages pulled on May 25th using vcs import.
Here is the associated log file: RMFTrafficFailure.log
Chosen answer
Answer chosen by @jkeshav-bvignesh at 2022-06-24T06:48:45Z.
Answered by @mxgrey:
As of right now, RMF is not expected to be able to correctly or successfully negotiate traffic for multiple robots that want to reach the same destination at the same time. When multiple robots want to reach the same destination simultaneously, it is no longer a “traffic” issue so much as a logical inconsistency issue because the overall desired end state for the traffic planning problem is invalid. Sometimes the negotiation system does manage to sort things out, but that is mostly a matter of luck and should not be expected or relied on.
Of course there are bound to be cases where the immediate goal of multiple robots requires them to reach the same destination as each other, for example if they need to use the same lift or they need to pick up an item at the same pick up point. In our view, this is not a traffic problem as much as it is a resource allocation problem where the “resource” is the right to access the physical location.
We would like to develop a reservation system that ensures only one robot at a time will try to approach any location on the map as its goal. An issue ticket was opened for this here some time ago. While that ticket only talks about reserving parking spots, we would use the same reservation system to decide the order in which robots get access to a location. We have a slide on the idea here, and an adjacent idea for a queuing system here.
In the general case, every time a robot has a destination that it is trying to reach…
- It will send a request to the reservation system asking for the right to go to the location.
- The reservation system will check if any other agents have that location reserved (or will have it reserved by the time the requester reaches it) and then let the requester know if it is available, or provide an estimate of when it should be available if it is not currently.
- If the location is not yet available, the robot would place an additional request for a nearby parking spot from the reservation system. This request will include a list of parking spots that the robot could use, with each item in the list ranked according to preferability (e.g. how close the parking spot is to the real destination).
- Once a parking spot is assigned, the robot will move to that parking spot to wait until its real destination is available. Its request to reserve its real destination remains in the memory of the reservation system, queued up alongside requests from other robots to use that same space.
- When the space is available for the robot, the reservation system will issue a signal to indicate that it is now reserved for the robot.
- The robot moves from its parking spot (or wherever it happens to be) towards its real destination. At the same time it releases the parking spot that it had reserved for the sake of waiting.
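The workflow above could be sketched roughly as follows. This is only a toy, in-memory illustration of the idea, not an existing RMF API: the `ReservationSystem` class, its method names, and the robot and location names are all hypothetical, and the sketch leaves out the availability estimate and any notion of time.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ReservationSystem:
    """Hypothetical in-memory reservation system (not part of RMF)."""
    holders: dict = field(default_factory=dict)  # location -> robot holding it
    queues: dict = field(default_factory=dict)   # location -> deque of waiting robots

    def request(self, robot, location):
        """Grant the location if it is free, otherwise queue the request."""
        holder = self.holders.get(location)
        if holder is None:
            self.holders[location] = robot
            return True
        if holder == robot:
            return True
        queue = self.queues.setdefault(location, deque())
        if robot not in queue:
            queue.append(robot)
        return False

    def request_parking(self, robot, ranked_spots):
        """Reserve the first free spot from a list ranked by preference."""
        for spot in ranked_spots:
            if spot not in self.holders:
                self.holders[spot] = robot
                return spot
        return None

    def release(self, robot, location):
        """Release a location and hand it to the next queued robot, if any."""
        if self.holders.get(location) != robot:
            return None
        del self.holders[location]
        queue = self.queues.get(location)
        if queue:
            next_robot = queue.popleft()
            self.holders[location] = next_robot
            return next_robot
        return None


system = ReservationSystem()
granted_a = system.request("robot_a", "pickup")   # location is free
granted_b = system.request("robot_b", "pickup")   # taken, so robot_b is queued
spot_b = system.request_parking("robot_b", ["park_1", "park_2"])
next_holder = system.release("robot_a", "pickup")  # pickup passes to robot_b
system.release("robot_b", spot_b)                  # robot_b gives up its parking spot
```

In this sketch the return value of `release` stands in for the signal that tells the next robot its destination is now reserved for it; a real system would presumably deliver that asynchronously.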
For specific cases where a system integrator can anticipate a bottleneck, with many robots wanting to access the same resource at the same time (e.g. a door, lift, or workcell), the integrator can define a queuing area for the robots to wait in for that specific resource. The above workflow would change at the third step: the robot would reserve a spot in the queuing area instead of reserving a generic parking spot. If the queuing area is full, the robot would fall back on the general strategy of reserving any nearby parking spot.
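The queuing-area fallback might look something like the sketch below. The function name `choose_waiting_spot`, its parameters, and the spot names are hypothetical, chosen only to illustrate the "queue first, then generic parking" order described above.

```python
def choose_waiting_spot(queue_spots, ranked_parking_spots, occupied):
    """Pick a waiting spot: prefer the queuing area, else fall back to parking.

    All names here are illustrative assumptions, not an RMF interface.
    """
    # First try the queuing area defined for this specific resource.
    for spot in queue_spots:
        if spot not in occupied:
            return spot
    # Queuing area is full: fall back to any nearby parking spot, by rank.
    for spot in ranked_parking_spots:
        if spot not in occupied:
            return spot
    # Nothing is free right now.
    return None


in_queue = choose_waiting_spot(["q1", "q2"], ["p1"], {"q1"})
fallback = choose_waiting_spot(["q1", "q2"], ["p1"], {"q1", "q2"})
nothing = choose_waiting_spot(["q1", "q2"], ["p1"], {"q1", "q2", "p1"})
```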
While we’ve put a lot of thought into the design of these systems, they are not currently being funded, so unfortunately they are not being actively developed at the moment.