Generalized Task Composition (#169)

Posted by @mxgrey:

With the introduction of Flexible task definitions by mxgrey · Pull Request #39 · open-rmf/rmf_task · GitHub which followed this discussion, RMF allows users to compose tasks that can be nested sequences of activities. This added a great deal of customizability to RMF’s task system, but we recognize that real world needs will not always fit into this narrow task description structure.

While the last revision of the task system was a big step forward, it was never meant to be the final design. Its limitations were known at the time that it was designed, and those limitations were strategically chosen to allow us to cover the most broadly critical needs in the most reasonable amount of time.

This discussion will try to flesh out a design for the next revision of RMF’s task system. My hope is to design a task framework that’s general enough that we will not need another major revision after this (but feature additions and implementation improvements are always expected). Specifically, I’d like to address these requirements which were intentionally excluded from that last revision:

  • Tasks are defined by event graphs (similar to a behavior diagram/tree) that users can draw in a GUI
  • Tasks can have loops, branches, and other forms of conditional logic
  • Task descriptions can involve multiple robots operating simultaneously
  • Task descriptions can involve one robot/device performing multiple actions at once where possible+relevant
  • Multiple nodes/pathways in the event graph can be active simultaneously
  • Event graphs are robust to how users try to design them, e.g. fallback behavior is always reasonable and tasks can never be permanently locked up due to the graph being badly designed

I’ll start with defining some of the elements that I think would be useful for an event graph framework. The following picture was exported from this Figma project:

Additionally we’ll want a way to encapsulate these graphs for the sake of reusability, hierarchy (nesting event graphs inside of event graphs), and clean design. We’ll refer to an encapsulated event graph as an “Activity”, borrowing terminology from the UML standard. Since we want to be able to nest Activities inside of Activities, an Activity can declare the Triggers, Mutators, Queries, Properties, and Events that it exposes to any parent Activity that wants to use it, much like a public API.

Here is example of what a Delivery Activity Diagram might look like. In this diagram, the pick up and drop off behaviors will be retried up to 3 times each if they fail for some reason.

Here is an example of what a multi-robot “Vacuum and then Mop” activity could look like. Additionally it has queries to decide to cancel everything if a deadline is reached.

Building on the previous example, here is an example of a task that runs a list of sub-tasks simultaneously using a for_each block. The for_each block manages multiple instances of a single event graph simultaneously and takes care of synchronizing its inner event graph instances with the parent graph.

Note that I’ve embedded comments in all of the examples. I won’t repeat the content of the comments here, so I recommend reading through those inside Figma itself for a deeper understanding of the ideas.

These examples should not be taken as a finalized design, but rather as the starting point for a design discussion. I believe there will be many many details to flesh out over the coming months, and these drawings are just meant to establish a base line and reference point for discussion.

Besides discussing the ideas that I’ve laid out so far, here’s what I would recommend for next steps:

1. Decide the data types that are supported for messages

For the data types, I would like to take inspiration from Rust and support the following types:

  • primitives: u64, i64, f64 (should we bother supporting lower precision primitives like u8, i8, u32, i32?), bool, string
  • struct: Heterogeneous collection of data with named fields
  • tuple: Heterogeneous collection of data with unnamed fields
  • List<T>: Homogeneous collection of data with unnamed elements
  • variant: Data which can belong to one of a predefined set of possible data types. Pattern matching and dispatching blocks are used to identify which data type is contained at runtime.
  • Option<T>: Specialized variant that can contain Some(T) or None
  • Result<T, E>: Specialized variant that can contain Ok(T) or Err(E). Recommended output for a finished event.
  • Property<T>: Reference to some Property of type T
  • resource: A reference to some resource that can be used by an activity, such as a mobile robot, a planner, a workcell, etc. Resources have “traits” which define methods (sync) and services (async) that activities can call on the resource. The traits that an activity requires for a resource is baked into the type information of the resource declaration. E.g. if Mobile is a trait for objects that can be commanded to move, then MobileAgent: Mobile is a resource type declaration for a resource with the Mobile trait.
  • callback: An object which can be called as if it’s a function

2. Determine what block types should be supported (and how to support custom third-party blocks)

In the examples that I’ve provided I’ve defined these block types:

  • Trigger blocks: Exposes a trigger (e.g. start) to parent event graphs
  • Query blocks: Exposes queries to parent event graphs
  • Property blocks: Exposes properties to parent event graphs
  • Finish event block: Determines the finishing point of the event graph
  • Activity block: Allows an activity to be triggered
  • map: Maps input data (e.g. a trigger and/or queries and/or listeners) into output data (e.g. event or property) via a template substitution
  • cmp: Takes in a tuple with two values of comparable types (e.g. integer or float) and compares their values, providing a less, equal, and greater branch.
  • dispatch: Takes in a variant and outputs an event/property for each alternative type within the variant
  • iterate: Takes in a trigger and a limit. Emits continue until the limit is reached, then emits break.
  • if_match: Triggers then if every input matches their specified patterns. The message for then can use data from the patterns in its data mapping. If any input does not match its pattern, else will be triggered. The message for else can only use the raw input types.
  • for_each: Takes in a List<T> trigger, queries, and a event graph frame. For each T in the list runs its inner event graph by sending T to the inner event graph’s trigger block.
  • signal block: Triggers a signal for this event graph
  • signal handler block: Intercepts signals that are emitted for this event graph

3. Serialization format for all of the above

We’ll need a way to serialize all the event graph data so that it can be transmitted and saved to disc.

While JSON isn’t exactly my favorite serialization format, it does have the benefit of extremely widespread support across the web and virtually all programming languages. It also allows us to use json-schema to describe and validate the structure of the serialization.

4. GUI for drawing the event graphs

For this concept to be useful, a user-friendly GUI is critical. The best possible solution for this would be one that we could use in both a web browser and in the “traffic editor” tool, which would imply implementing it in Rust, which can compile to WASM.

However, I’ve had a difficult time finding Rust tools/frameworks to help with drawing graphs. In general the Rust ecosystem is still somewhat nascent when it comes to GUI development. Meanwhile there are many tools in the JavaScript ecosystem suitable for drawing graphs. To prevent requirement this from being a blocker for too long we could consider implementing a first version of the event graph drawer in JavaScript using an existing framework while we have a parallel effort to implement a final version in Rust, most likely using Bevy.

With some guidance from the Bevy community, I’ve learned about egui_node_graph which looks like it may be suitable for our initial needs. We should try out that library as a first step, and maybe iterate on it if it has any gaps that we can reasonably fill in.

5. Implementation of event graph execution

Since RMF is migrating to Rust for the sake of code quality and robustness, we’ll need to implement the event graph execution framework in Rust. I expect this is going to be the most challenging undertaking and will require its own deeper technical discussion.

6. Modeling of event graphs for task planners

In order for the task bidding system to work, fleet adapters need to be able to reason about tasks based on models. Since event graphs may involve run time variables that cannot be predicted by a model, we may need a way to embed hints inside of the event graphs to indicate the likely paths. For example, adding some decoration to the graph to indicate which branch is likely to be chosen at a split or how many times an iterator is likely to repeat.

Other topics for discussion

Artwork and UI design

I am not an artist, so it would be great to get feedback and suggestions on the UI elements in the example drawings. Feedback and suggestions may include completely redesigning everything to better fit the 21st century.

Terminology / Vocabulary / Soundness

If there are better words to use to describe any of the concepts that I’m using, please do share. Also please bring up any important concepts or requirements that I may be overlooking.

Also if anyone knows of relevant literature from graph theory and/or event-driven architecture research, please let us know so we can get the most out of existing research in the field.


Edited by @mxgrey at 2022-07-15T05:05:23Z

Posted by @methylDragon:

I’m not sure if I know what I’m talking about, but would introducing a concept like reference-counting or worker pools be useful? Specifically in the context of tasks/functions that can run concurrently, but have limited slots.

E.g. A robot has 3 arms that can do some arm related task <ArmTask>. But there are 4 <ArmTasks> to be done (so it can only handle 3 <ArmTasks> concurrently.) Or a robot that can do task <A> and <B> concurrently, but not multiple <A> tasks.

In those cases the system could wait for all the tasks to be done (how would this be described?), or move on to the next step as each task is done (ala the for_each block’s finish element)

I understand that the current system is sufficient to describe such a scenario (by listing the appropriate properties and mutating that property to help with reference-counting), but this seems like a common enough occurrence that it might be useful to add a block-type just for it?


Edited by @methylDragon at 2022-07-21T08:02:25Z

Posted by @mxgrey:

I completely agree, and I was thinking about this as well. The for_each block was a first attempt at having this functionality, but I admit that it would be better to have a solution that allows jobs to stream in gradually instead of all at once.

Here’s a quick sketch that I just made of what I can imagine a pool block looking like. Let me know if that covers what you have in mind or whether you feel that there’s something missing or something that can be improved.

Posted by @methylDragon:

It’s about there, though I think it’ll need a num_workers property?

Posted by @methylDragon:

I found a couple of resources that look like they’d be relevant? It’s not exactly event-graphs governing stateful activities, but state machines are pretty close (even if they just model the system’s state as opposed to the state of individual activities.)


The rest of my notes here will be a little bit more tentative, since they’re just initial thoughts and I’m not too familiar with this domain.

I was reading the post again, and was wondering if you were thinking of microservice architectures when you mentioned “event-driven architecture” (EDA)? If that’s the case, then the event-graph devised so far could be interpreted as a series of stateful microservices, connected by events.

In which case the following posts might be useful:

When I was reading those resources, I realized that events in the microservice architecture encapsulate state (that is: an event producer ensures that everything that a prospective consumer needs to process the event is captured in the event.) Doing it via the events is an alternative that could be better than having consumers query the state of a producer, and producers directly mutating the state of a consumer, I think (there’d be less back and forth, albeit at the cost of increased event message size?)


Edited by @methylDragon at 2022-07-27T20:03:56Z

Posted by @arjo129:

I’m just leaving notes here but, it might be worth exploring execution and deadlock prevention in the system. Given the type of DSL we are building and the fact that realistic robot deployments will execute slower than OS processes and we likely know the total amount of resources being spent there are numerous algorithms we could apply before hand to ensure that the system does not go into a deadlock.

Posted by @DDDWY:

How is the progress of this project?

Posted by @mxgrey:

There’s been significant progress made on the foundational tools for this, most of it taking place in the bevy_impulse repo (documentation pending). A proof of concept for executing directed acyclic graphs is finished, and I’m close to finishing a major rework which will support generalized workflow diagrams, much like this discussion originally described, although more streamlined.

The other pieces that are currently in flight include

  • rclrs which will be needed to communicate between Rust-implemented nodes
  • site editor which will allow users to create more expressive robot behaviors, and will likely provide a GUI foundation for developing a workflow editor
  • mapf which will enable much more sophisticated multi-agent path finding than what we currently get out of rmf_traffic

Once all these pieces are in place we’ll be well positioned to migrate the fleet adapter implementations over to this new architecture that can accommodate general task workflows. I put together some slides about the vision and motivation for this new architecture here.

Clearly we still have a lot of ground to cover before this capability is usable, but our progress on this has been ramping up rapidly in the last few months, and this transition will soon become our central focus.

Posted by @ItamarEliakim-R:

Any progress on the project? I would love to hear/help if needed.

Posted by @mxgrey:

Thanks for checking! Just now I posted a reply to a similar question in this discussion, so you can check there for more information on the current progress.

We always welcome external contributions. Feel free to contact me or open a discussion if there’s a particular aspect or library you’d be interested in helping with. Our current effort is heavily focused on building up capabilities within the Rust and Bevy ecosystems so that we can benefit from all the advantages those provide.

To comply with the OSRA charter, some time in June we’ll begin holding project management meetings for Open-RMF where we’ll be discussing progress and current technical challenges. These meetings will generally be open to the public to observe, so you can join those meetings as an observer, and that will also provide a chance to discuss opportunities to contribute. When we’ve decided the schedule for those meetings, I’ll be announcing it on the ROS Discourse. I’ll try to remember to leave a follow-up comment here with the relevant information.

Posted by @ItamarEliakim-R:

Thanks @mxgrey! Answered you in the other discussion.
I’ll make sure to follow the ROS Discourse to see where we can help with the design/development.