Posted by @mxgrey:
Are you noticing a problem with reporting the cancelled status? I would expect it to work fine by virtue of this line which changes the status as soon as the cancellation happens. Then we protect the cancelled status here so it should remain in effect for the rest of the task’s lifetime.
If we do what you’re suggesting we would lose the distinction between cancelled and killed status, which isn’t a big deal but I’d rather keep the distinction if we can.
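Roughly speaking, the protection amounts to something like this (just a sketch with illustrative names, not the actual source):
// Sketch only: once _cancelled is set, later status updates are ignored so
// the event keeps reporting Canceled for the rest of the task's lifetime.
// The _cancelled and _state names mirror the snippets discussed below.
void update_status_guarded(rmf_task::Event::Status new_status)
{
  if (_cancelled)
  {
    // The cancellation already set the status; don't overwrite it.
    return;
  }
  _state->update_status(new_status);
}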
Posted by @asrafhamid:
Thanks Grey for clarifying.
I suspect the issue occurs because the status is being updated to Standby (line) even after the event has already been canceled. This causes the status to get stuck in Standby instead of remaining in Canceled.
To resolve this, I suggest moving the cancellation check earlier in the on_finish block and skipping the Standby status update when the event is canceled:
if (me->_cancelled)
{
me->_finished();
return; // Skip the Standby status update
}
me->_state->update_status(rmf_task::Event::Status::Standby);
me->_publish_update();
Although this is only a local fix, I hope it addresses your concerns and aligns with the intended behavior.
Posted by @mxgrey:
Ahhhh I didn’t notice that stray status change. You’re right, it shouldn’t be doing that specific status update when it’s cancelled.
I think we should still do everything else before we reach the me->_finished() line, so I’d tweak your suggestion to this:
if (!me->_cancelled)
{
me->_state->update_status(rmf_task::Event::Status::Standby);
}
Then everything else can remain the same.
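Putting the two fragments together, the adjusted on_finish block would look roughly like this (a sketch based on the snippets above; the surrounding code may differ):
// Sketch of the adjusted on_finish handling, combining the fragments above:
// skip only the Standby status update when the event was cancelled, but still
// run the rest of the cleanup before finishing.
if (!me->_cancelled)
{
  me->_state->update_status(rmf_task::Event::Status::Standby);
}
me->_publish_update();
me->_finished();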
Edited by @mxgrey at 2025-04-10T07:19:44Z
Posted by @mxgrey:
I believe 22d407f should fix the status updates.
I’ve also added aa3a86a which introduces the stubbornness feature we discussed, so you can set a time period during which the robot will report to the traffic schedule that it is remaining in its current position. This time period is refreshed every second.
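Conceptually the refresh loop is just a one-second timer that keeps re-reporting the position until the configured period runs out. A sketch of the idea (not the actual rmf_fleet_adapter code; report_holding_position is a stand-in for whatever publishes the holding itinerary):
#include <chrono>
#include <functional>
#include <memory>
#include <rclcpp/rclcpp.hpp>

// Conceptual sketch only: every second, until the stubborn period elapses,
// re-report the robot's current position to the traffic schedule.
rclcpp::TimerBase::SharedPtr make_stubborn_timer(
  const rclcpp::Node::SharedPtr& node,
  double stubborn_period_s,
  std::function<void()> report_holding_position)
{
  const auto deadline =
    node->now() + rclcpp::Duration::from_seconds(stubborn_period_s);

  // The timer handle is shared into its own callback so it can cancel itself
  // once the deadline passes.
  auto timer = std::make_shared<rclcpp::TimerBase::SharedPtr>();
  *timer = node->create_wall_timer(
    std::chrono::seconds(1),
    [node, deadline, timer, report_holding_position]()
    {
      if (node->now() >= deadline)
      {
        (*timer)->cancel();  // the stubborn period is over
        return;
      }
      report_holding_position();  // refresh the "holding here" report
    });

  return *timer;
}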
I believe with those two changes we’ve addressed all the concerns that we discussed. Let me know if I’ve forgotten anything or if you encounter a new issue.
Edited by @mxgrey at 2025-04-15T08:18:24Z
Posted by @asrafhamid:
Sorry, is there an updated version of DynamicEvent.action? It seems to be missing a stubborn_period field.
Posted by @asrafhamid:
Thanks Grey for the status fix and stubbornness feature, will get to testing asap.
On a separate note, I noticed that out of the 3 events (next, cancel, end), only the ‘next’ event returns a succeeded goal result, but the others do not. Curious if this is intended behavior, since it seems more consistent for all events to return succeeded (with a meaningful result) when they complete their respective actions.
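(For context: with rclcpp_action the server has to close out each accepted goal explicitly, along the lines of the sketch below; MyAction and the result fields are illustrative, not the DynamicEvent definitions.)
// Generic rclcpp_action sketch (not the DynamicEvent implementation): each
// goal that completes its work should be terminated with succeed() and a
// meaningful result, otherwise the client never sees a terminal succeeded
// state for that goal.
void finish_goal(
  const std::shared_ptr<rclcpp_action::ServerGoalHandle<MyAction>>& goal_handle)
{
  auto result = std::make_shared<MyAction::Result>();
  // ... fill in fields describing what the event actually did ...
  goal_handle->succeed(result);
}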
Posted by @mxgrey:
Only the ‘next’ event returns a succeeded goal result, but the others do not. Curious if this is intended behavior
Whoops, that’s definitely not intentional! I’ll fix that up right now.
Posted by @mxgrey:
End Event goal response : Segmentation Fault, crashes fleet adapter
Whoops, that was a copy/paste error. Should be fixed by this: 6d69c47
Despite adding a stubborn_period (10s - 60s) to a next event, there’s no change in behaviour upon reaching the destination. Currently tinkering to see how it works
That’s surprising... Be sure to update the rmf_internal_msgs branch that you’re using, if you haven’t done that already.
I opened this PR that offers a --stubbornness argument so that the feature can be tested through the command line. When you assign some value for --stubbornness you should see the robot maintain its position on the traffic schedule after arriving at its destination, or even after its go_to_place gets cancelled. If you’re looking at rviz, that will show up as a large yellow dot around the robot’s purple dot.
I would suggest trying out the rmf_demos_tasks script on your machine to see whether the stubbornness works through that. If it does, then I would guess something is going wrong in the message translation.
Posted by @asrafhamid:
apr_21_issue.zip
Regarding the issue observed on Monday (April 21), I’ve attached the fleet adapter log and ROS bag for reference.
Summary (office world):
- Send tinyRobot1 to patrol_D1 (stubborn_period: 20) and wait until it arrives
- Send tinyRobot2 to patrol_A1 (stubborn_period:30)
- tinyRobot2 will get blocked by tinyRobot1
- after 30s, tinyRobot2 will attempt to replan (it might rotate in place)
- tinyRobot2’s yellow marker will cross over tinyRobot1 until it reaches patrol_A1
Relevant Log Excerpt (Replanning Attempt):
[INFO] [1745551317.451841260] [tinyRobot_fleet_adapter]: [tinyRobot/tinyRobot2] waiting for traffic
[INFO] [1745551347.452872743] [tinyRobot_fleet_adapter]: Replanning for [tinyRobot/tinyRobot2] because a traffic dependency is excessively delayed
Posted by @mxgrey:
Thanks for documenting this problem. I did manage to recreate it on my own machine.
This was quite a vexing problem to unravel because I couldn’t find any logical reason that one robot would be failing to update its schedule but not the other, especially since both are managed by the same fleet adapter.
Ultimately what I discovered was that the rclcpp timer for updating the schedule delay stopped firing for one of the robots, even though it should have kept firing every second.
I created this debug_vanishing_timer branch to narrow the problem down to this. If you run with that branch you’ll see:
[fleet_adapter-16] timer: 0x7fb0901a90c0 is_cancelled: 0 is_ready: 0 time_until_trigger: 0.775857
[fleet_adapter-16] timer: 0x7fb0901a90c0 is_cancelled: 0 is_ready: 1 time_until_trigger: -0.223956
[fleet_adapter-16] timer: 0x7fb0901a90c0 is_cancelled: 0 is_ready: 1 time_until_trigger: -1.22419
[fleet_adapter-16] timer: 0x7fb0901a90c0 is_cancelled: 0 is_ready: 1 time_until_trigger: -2.22389
[fleet_adapter-16] timer: 0x7fb0901a90c0 is_cancelled: 0 is_ready: 1 time_until_trigger: -3.22397
[fleet_adapter-16] timer: 0x7fb0901a90c0 is_cancelled: 0 is_ready: 1 time_until_trigger: -4.22382
[fleet_adapter-16] timer: 0x7fb0901a90c0 is_cancelled: 0 is_ready: 1 time_until_trigger: -5.224
[fleet_adapter-16] timer: 0x7fb0901a90c0 is_cancelled: 0 is_ready: 1 time_until_trigger: -6.22379
[fleet_adapter-16] timer: 0x7fb0901a90c0 is_cancelled: 0 is_ready: 1 time_until_trigger: -7.22402
[fleet_adapter-16] timer: 0x7fb0901a90c0 is_cancelled: 0 is_ready: 1 time_until_trigger: -8.2239
[fleet_adapter-16] timer: 0x7fb0901a90c0 is_cancelled: 0 is_ready: 1 time_until_trigger: -9.2239
and the time_until_trigger just keeps getting more and more negative forever, implying that rclcpp should know that it hasn’t fired the timer in a long time despite it being ready.
So the problem resides somewhere in rclcpp, and I don’t think there’s anything that can be done in Open-RMF to resolve this.
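If you want to reproduce that diagnostic output yourself, it can be generated with rclcpp’s public timer introspection methods, roughly like this (a sketch; the debug_vanishing_timer branch may differ in detail):
#include <chrono>
#include <iostream>
#include <rclcpp/rclcpp.hpp>

// Dump a timer's state once per call, matching the fields printed above.
// A healthy timer should not stay at is_ready == 1 while time_until_trigger
// keeps getting more negative, because the executor should be firing it.
void print_timer_state(const rclcpp::TimerBase::SharedPtr& timer)
{
  const double time_until_trigger =
    std::chrono::duration<double>(timer->time_until_trigger()).count();

  std::cout << "timer: " << timer.get()
            << " is_cancelled: " << timer->is_canceled()
            << " is_ready: " << timer->is_ready()
            << " time_until_trigger: " << time_until_trigger
            << std::endl;
}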
The good news is, I just tested this scenario on the latest rolling version of ROS 2 and the problem went away. Presumably whatever was causing this bug has been fixed in the latest rclcpp.
If you’re interested in testing this out with rolling, you’ll need the recreate_vel_cmds branch of rmf_simulation in order to be compatible with the latest gz-sim.
Edited by @mxgrey at 2025-04-25T12:57:12Z