Unpredictable behavior in simulations (#491)

Posted by @AlessandroCapobianco:

Hi everyone!

I noticed unpredictable behavior while carrying out some simulations

In the first video there is the behavior I would like to have, that is, one of the two robots stops to wait for the other to cross the narrow passage.

In the second video there is a different behavior, i.e. the robot approaches the bottleneck but then goes back to let the other one pass.

The two simulations are exactly identical to each other (in the map and in the tasks) so I would like to understand where this difference in behavior comes from and how to correct it. Is it perhaps a planning problem in the A* algorithm?


Edited by @AlessandroCapobianco at 2024-07-10T09:31:51Z

Posted by @mxgrey:

I suspect the problem is that the cost of the plan used by A* is purely based on time, and there’s no difference in time spent by the robot if it drives forward and then retreats versus waiting in place. I’ve put some post processing into the planner to catch most of these situations, but it doesn’t work 100% of the time.

The best way to fix this will be to add in distance traveled as part of the cost function, but trying to retrofit that into the planner of rmf_traffic could be risky because there are various parts of the implementation that assume time is the only cost.

Luckily I’ve already implemented a distance + time cost function in the mapf library which will be used in the next generation RMF fleet adapter implementation. My tentative goal is to start prototyping next generation adapters in November, after we’ve finished a few other core next generation developments.

Posted by @AlessandroCapobianco:

I noticed that by placing the delta_t parameter to zero in the adjust_times function in the DifferentialDrivePlanner.cpp file this problem is solved both by using ShortestPathHeuristic and by using MinimalTravelHeuristic.

What is this due to? Does changing this parameter have any impact on the heuristics?

Posted by @mxgrey:

Ooofff, that’s playing around with internals of the planner in a very dangerous way.

The adjust_times that you’re modifying is meant to set the trajectory back to its original timing which is saved here. The trajectory incrementally moves backwards through time with every iteration of the previous loop here. We do this to check for any possible traffic conflicts that may occur if other robots may get schedule delays during runtime. It’s a brutal approach, but it works well enough until we can move to the next gen planner.

What happens when you change that to zero is it leaves the trajectory back in time, which effectively corrupts the meaning of the plan. If you’re seeing an improvement, it’s probably just a happy accident that occurs because you’ve closed that “free time” gap by telling the robot that it’s just going to plow forward by N seconds, so it doesn’t actually have any free time to move around.

This might coincidentally work out if there are only two robots, but I expect it would lead to broken plans pretty easily if there are 3 or more robots involved.


Edited by @mxgrey at 2024-07-19T13:19:09Z

Posted by @mxgrey:

If you want to try to fix the problem in a sound way, I suggest investigating the cruft map to figure out why this situation is not being detected and cut out. The purpose of that map, along with the squash function is to identify portions of the plan where the robot needlessly loops back (like what’s happening in your video) and then cut that section out of the plan entirely. If this were working correctly, we shouldn’t see the behavior that you’re reporting, so I’m guessing there’s something about your particular case that is causing the cruft map to not catch it.