Could the LifecycleNode API be collapsed into Node itself for ROS 2 L-turtle, making all nodes managed? This would simplify matters by removing the need to mix managed and non-managed nodes in the same application. Making this a first-class concept in ROS 2 would encourage developers to think through the different states their application could be in, and would give them a common pattern for coding defensively around those states. With trivial default transition implementations, and perhaps a default transition manager in the executor itself that takes each node from unconfigured to active on startup and then through shutdown, the migration could be manageable.
However, are there reasons why we cannot or should not make all Nodes LifecycleNodes?
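To make the "trivial defaults plus a default transition manager" idea a bit more concrete, here is a rough sketch in plain Python. It does not use rclpy or rclcpp. ManagedNode, DefaultTransitionManager and the State enum are hypothetical names, and the states only loosely follow the existing managed-node design.

```python
from enum import Enum


class State(Enum):
    UNCONFIGURED = "unconfigured"
    INACTIVE = "inactive"
    ACTIVE = "active"
    FINALIZED = "finalized"


class ManagedNode:
    """Hypothetical base class: every node has a lifecycle with trivial defaults."""

    def __init__(self, name):
        self.name = name
        self.state = State.UNCONFIGURED

    # Trivial default transitions: report success and do nothing else.
    def on_configure(self):
        return True

    def on_activate(self):
        return True

    def on_deactivate(self):
        return True

    def on_cleanup(self):
        return True

    def on_shutdown(self):
        return True


class DefaultTransitionManager:
    """Hypothetical manager an executor could own so plain nodes need no orchestration."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def start(self):
        # unconfigured -> inactive -> active, failing loudly if a node refuses.
        for node in self.nodes:
            if not node.on_configure():
                raise RuntimeError(f"{node.name}: configure failed")
            node.state = State.INACTIVE
            if not node.on_activate():
                raise RuntimeError(f"{node.name}: activate failed")
            node.state = State.ACTIVE

    def stop(self):
        # Best-effort teardown in reverse start order; return values are ignored here.
        for node in reversed(self.nodes):
            if node.state is State.ACTIVE:
                node.on_deactivate()
                node.state = State.INACTIVE
            if node.state is State.INACTIVE:
                node.on_cleanup()
                node.state = State.UNCONFIGURED
            node.on_shutdown()
            node.state = State.FINALIZED


class Talker(ManagedNode):
    def on_activate(self):
        self.publishing = True  # only start publishing once active
        return True


manager = DefaultTransitionManager([Talker("talker"), ManagedNode("listener")])
manager.start()
print([node.state.name for node in manager.nodes])  # ['ACTIVE', 'ACTIVE']
manager.stop()
print([node.state.name for node in manager.nodes])  # ['FINALIZED', 'FINALIZED']
```

Only the node that cares about activation (Talker) overrides anything. The plain node never sees the lifecycle, yet it still ends up in a well-defined state.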
I think having an option in NodeOptions (passed to a Node constructor) to enable managed states would be neat to have. I'm sure there are several complexities involved, but it's something worth discussing here.
I think that all nodes having at least a minimal lifecycle would be a good thing. Appropriate defaults and automatic transitions would allow existing nodes to migrate (relatively) easily. However, there is a risk with making all nodes use one lifecycle state machine: Different applications might prefer different lifecycle state machines, depending on their needs.
There are also several things about the current state machine that I would like to revise, and keeping the freedom to revise it in the future is another reason not to require a complex state machine for all nodes.
Perhaps a good way forward would be a very simple state machine for the default lifecycle, with the ability to override it with more complex custom state machines; we could provide some common ones.
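A pluggable state machine could be as simple as a transition table that the node consults. The sketch below is hypothetical: StateMachine, default_machine and managed_machine are made-up names. The default machine has a single shutdown transition. The opt-in machine loosely mirrors the primary states of today's managed nodes, without the intermediate transition states or the error handling.

```python
class StateMachine:
    """Hypothetical table-driven lifecycle: (state, event) -> next state."""

    def __init__(self, initial, transitions):
        self.state = initial
        self.transitions = dict(transitions)

    def available(self):
        """Events that are legal from the current state, sorted for stable output."""
        return sorted(event for (state, event) in self.transitions if state == self.state)

    def trigger(self, event):
        key = (self.state, event)
        if key not in self.transitions:
            raise ValueError(f"transition {event!r} not allowed from {self.state!r}")
        self.state = self.transitions[key]
        return self.state


def default_machine():
    # Minimal default: a node runs until it is shut down.
    return StateMachine("active", {("active", "shutdown"): "finalized"})


def managed_machine():
    # Opt-in machine loosely following today's managed-node primary states
    # (no intermediate transition states, no error handling).
    transitions = {
        ("unconfigured", "configure"): "inactive",
        ("inactive", "cleanup"): "unconfigured",
        ("inactive", "activate"): "active",
        ("active", "deactivate"): "inactive",
    }
    for state in ("unconfigured", "inactive", "active"):
        transitions[(state, "shutdown")] = "finalized"
    return StateMachine("unconfigured", transitions)


node_machine = managed_machine()
node_machine.trigger("configure")
node_machine.trigger("activate")
print(node_machine.state)        # active
print(node_machine.available())  # ['deactivate', 'shutdown']
```

Because the machine is plain data, a revised or custom state machine becomes a different table rather than a change to every node.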
Another argument against is the complexity it would add to even the simplest nodes written by beginners to ROS. This is something to be very careful about: good defaults might help, but the inherent complexity of having a lifecycle in every node needs to be managed.
I think there are a couple of ways to handle that:
We could make the transition methods optional, so that they don't have to be implemented and users don't have to think about lifecycle if they don't want its benefits. The obvious drawback is that the system then doesn't know which nodes are "really" lifecycle nodes that it can send transition requests to and which are not "actually" lifecycle nodes. We could have the default transition methods store some state indicating whether they've been overridden, but that starts to get a little messy.
We could make them required, but provide an easy way to have them auto-transition to Active so that users don't need to think about orchestrating their system if they don't want to. I recently worked on a new Launch ROS feature that makes this possible with an autostart=True setting for LifecycleNode. That does add a bit more required structure to using ROS 2, which I don't love, even as someone who uses lifecycle nodes for everything.
In either case, users who want to manage the state machine, or orchestrate the bring-up and state configuration of their systems, can still do so.
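For the first option, Python does not need the default transition methods to store any state. It can compare class attributes directly to find out which callbacks a subclass actually replaced. This is a sketch with hypothetical names (BaseNode, overridden_transitions, is_really_lifecycle). rclcpp would likely need a different mechanism, since C++ offers no portable runtime check for whether a virtual function was overridden.

```python
class BaseNode:
    """Hypothetical node base class whose transition callbacks are optional."""

    TRANSITIONS = ("on_configure", "on_activate", "on_deactivate", "on_cleanup")

    # Default callbacks: succeed without doing anything.
    def on_configure(self):
        return True

    def on_activate(self):
        return True

    def on_deactivate(self):
        return True

    def on_cleanup(self):
        return True

    def overridden_transitions(self):
        """Names of the transition callbacks that a subclass replaced."""
        return [
            name for name in self.TRANSITIONS
            if getattr(type(self), name) is not getattr(BaseNode, name)
        ]

    def is_really_lifecycle(self):
        # A node that overrides nothing gains nothing from external transition requests.
        return bool(self.overridden_transitions())


class Talker(BaseNode):
    pass


class CameraDriver(BaseNode):
    def on_activate(self):
        # e.g. start streaming only once activated
        return True


print(Talker().is_really_lifecycle())           # False
print(CameraDriver().overridden_transitions())  # ['on_activate']
```

A tool could use a check like this to decide which nodes to show as "managed", although it still cannot tell whether an override does anything meaningful.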
Another option is to just wipe out all docs and examples that use Node and, starting early in the learning and education pipelines, make everything a LifecycleNode so that folks naturally value and use them. Especially now with the launch autostart feature, someone can basically just treat it as an auto-transitioning node if they want to. Rather than building it into the base functionality, we would steer folks in this direction through demos, tutorials, docs, and best practices more than we do today. In my opinion, that might actually be the best option.
I think it's important that a user can still do ros2 run my_package my_node and have it actually start doing things.
Otherwise even the talker/listener examples will become really complicated…
I believe it's important to minimize the complexity involved in writing a Node, not just for the benefit of beginners but also for maintaining large codebases. Lifecycle-based behavior is often unnecessary and, in many cases, shouldn't be imposed. I support efforts to unify the APIs, such as the approach proposed by @Yadunund using options. However, instead of enabling automatic transitions, I would prefer that Nodes continue to function exactly as they do today when the lifecycle option is not enabled, without exposing any lifecycle-related services.
I would be all for consolidating the class types, since that would make things easier to maintain for libraries that want to provide options. However, we would need to be very careful about how this was implemented. Part of the reason my company stopped using lifecycle nodes where possible is that we encountered flakiness with the RMW and the services needed to configure and activate nodes. Not to mention that there aren't very robust ways to manage lifecycle nodes out of the box. Using launch_ros for this has its own flakiness, and we have had the most success with the nav2_lifecycle_manager, even though its logic and capabilities are fairly simple. All this is to say that there are very valid reasons someone may not want to use lifecycle, and in my opinion any sort of non-lifecycle mode should not introduce additional dependency on the RMW.
Having a built-in automatic transition feature would be nice for users like me who don't really care to use lifecycle but want to use a 3rd party lifecycle node. Automatic transitions driven by in-process function calls would be more robust than adding a separate process to call the transition services.
Ah, finally someone started a thread on this. I guess it's as good a time as any to write down an idea I've had for a while that would finally address shutting down nodes, by doing it through lifecycles.
So the general idea is that all nodes become implicit lifecycle nodes, but the default behavior should be the same as that of non-lifecycle nodes now, meaning:
Each Node automatically activates on constructor completion, similar to what smac suggests
Deactivation and finalization could be triggered in one swoop simply through ros2 node kill, which calls the transition service and would also call the shutdown hook. We would finally have a way to stop and clean up processes properly, and could even kill nodes on machines other than the current one (plus ros2 kill -a could finally clean up the entire system of lingering background nodes, which would be an extremely useful reset)
Most importantly: there should be zero extra configuration verbosity unless custom transitions are actually needed; all of this would have to be implemented implicitly inside Node
If you do need to specify transitions yourself, extend the LifecycleNode instead which would keep the current behavior by overriding Node default functionality
I think this would add a lot of manageability while requiring no changes to 3rd party extensions of either Node or LifecycleNode.
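"Activate on constructor completion" needs the hook to run after the most-derived constructor returns, not at the end of the base class constructor. In Python, a metaclass is one way to get that. Everything below is hypothetical: Node and LifecycleNode are stand-ins, not the rclpy classes, and handle_kill only models what the proposed ros2 node kill request might do on the node side.

```python
class _AutoStart(type):
    """Hypothetical metaclass: runs default transitions after the most-derived __init__."""

    def __call__(cls, *args, **kwargs):
        node = super().__call__(*args, **kwargs)  # runs the full constructor chain
        if cls.autostart:
            node._transition("configure", "inactive")
            node._transition("activate", "active")
        return node


class Node(metaclass=_AutoStart):
    """Stand-in for an implicit-lifecycle Node (not the real rclpy class)."""

    autostart = True

    def __init__(self, name):
        self.name = name
        self.state = "unconfigured"
        self.history = []

    # No-op defaults, so simple nodes never have to think about transitions.
    def on_configure(self):
        pass

    def on_activate(self):
        pass

    def on_deactivate(self):
        pass

    def on_cleanup(self):
        pass

    def on_shutdown(self):
        pass

    def _transition(self, event, target):
        getattr(self, "on_" + event)()
        self.history.append(event)
        self.state = target

    def handle_kill(self):
        """Node-side handling of a kill request: deactivate, clean up and shut down in one go."""
        if self.state == "active":
            self._transition("deactivate", "inactive")
        if self.state == "inactive":
            self._transition("cleanup", "unconfigured")
        if self.state != "finalized":
            self._transition("shutdown", "finalized")


class LifecycleNode(Node):
    """Opts out of autostart so an external manager drives the transitions."""

    autostart = False


# Usage: a plain driver node is active as soon as it is constructed.
class LidarDriver(Node):
    def __init__(self):
        super().__init__("lidar_driver")
        self.port_open = True

    def on_shutdown(self):
        self.port_open = False  # the shutdown hook gets a chance to release resources


driver = LidarDriver()
print(driver.state)    # active
driver.handle_kill()
print(driver.history)  # ['configure', 'activate', 'deactivate', 'cleanup', 'shutdown']
print(LifecycleNode("managed").state)  # unconfigured
```

The simple driver writes no lifecycle code beyond an optional shutdown hook. Subclassing LifecycleNode is the only switch needed to hand control back to an external manager.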
Consider a node that is just a simple lidar driver. It has no dependencies on other nodes and no need to be launched in tandem with anything… yet it will just sit there and do nothing unless you add a lifecycle manager to it, which doesn't trigger it properly half the time. I ended up rewriting it without the lifecycle functionality to get reliable startup behavior.
I would argue that very few nodes even need custom transitions on load, but most could use them for shutdown if it were painless enough. Imho the key is not to provide a billion options that people won't bother to learn and use, but to set up defaults that cover the widest range of use cases and make them mandatory.
As the number of Nodes increases, this could significantly impact discovery traffic and the size of the graph cache, particularly depending on the RMW (ROS Middleware) implementation and the specific use case, such as discovery over WiFi.
Today, parameter services are already active by default on every Node, and declaring their 6 services and 1 publisher per Node introduces a noticeable overhead at launch time, especially in large-scale systems (>100 Nodes) and over wireless networks.
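Here are some rough numbers for that overhead. The estimate assumes a DDS-based RMW in which every service server announces one request reader and one reply writer. It also assumes the lifecycle interface adds 5 services and 1 publisher per node, which is my understanding of rclcpp_lifecycle today. It is an estimate, not a measurement, and it ignores clients and the application's own topics.

```python
def discovery_endpoints(nodes, services_per_node, publishers_per_node):
    """Rough count of endpoints announced through discovery.

    Assumes a DDS-based RMW in which each service server is one request
    reader plus one reply writer, and each publisher is one writer.
    """
    return nodes * (2 * services_per_node + publishers_per_node)


PARAM_SERVICES, PARAM_PUBLISHERS = 6, 1          # parameter services + events publisher, as quoted above
LIFECYCLE_SERVICES, LIFECYCLE_PUBLISHERS = 5, 1  # assumed lifecycle interface per node

for n in (10, 100, 300):
    today = discovery_endpoints(n, PARAM_SERVICES, PARAM_PUBLISHERS)
    managed = discovery_endpoints(
        n, PARAM_SERVICES + LIFECYCLE_SERVICES, PARAM_PUBLISHERS + LIFECYCLE_PUBLISHERS
    )
    print(f"{n} nodes: {today} endpoints today, {managed} if every node were managed")
```

Under these assumptions, 100 nodes go from 1300 to 2400 announced endpoints before a single application topic is counted. This is why exposing the lifecycle interfaces unconditionally would be costly.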
Good point. I think that if we were to merge the interfaces together, it would be nice to have it be completely runtime configurable. It seems like there are 3 possible configurations, and maybe we wouldn't want to offer all of them:
Current LifecycleNode behavior: expose publisher and services, and don't auto transition.
Current Node behavior, but with "auto-transition": don't expose publisher and services, and call any configure and activate methods as if they are part of the constructor, and deactivate and cleanup methods as if they are part of the destructor.
Current LifecycleNode behavior, but with auto transition. You may want the convenience of having the configure and activate transitions on startup and deactivate and cleanup methods on shutdown, but you also want to be able to control the state transitions externally. This could be important because I have seen some LifecycleNodes be written to automatically deactivate themselves on errors, or to fail to configure or activate in the first place depending on conditions.
I think that if you disabled the lifecycle interfaces, then the constructor should probably just fail if the node fails to configure or activate, at a minimum. Maybe we would also want to consider killing the node if it transitions itself out of active, but that might be too implementation-dependent.
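The three configurations could map onto two independent switches: whether the lifecycle interfaces are exposed, and whether transitions run automatically. The sketch below uses hypothetical names (LifecycleMode, lifecycle_policy, ConfigurableNode). It takes the mode as a constructor argument rather than as a ros arg or parameter. It only covers startup, and it includes the "fail the constructor" rule for the case where the interfaces are disabled.

```python
from enum import Enum


class LifecycleMode(Enum):
    # Hypothetical, illustrative names for the three configurations.
    MANAGED = "managed"            # expose interfaces, no auto transition
    AUTO = "auto"                  # hide interfaces, transitions tied to construction
    MANAGED_AUTO = "managed_auto"  # expose interfaces and auto transition


def lifecycle_policy(mode):
    """Return (expose_interfaces, auto_transition) for a mode."""
    return {
        LifecycleMode.MANAGED: (True, False),
        LifecycleMode.AUTO: (False, True),
        LifecycleMode.MANAGED_AUTO: (True, True),
    }[mode]


class ConfigurableNode:
    def __init__(self, name, mode=LifecycleMode.AUTO):
        self.name = name
        self.expose_interfaces, auto = lifecycle_policy(mode)
        self.state = "unconfigured"
        if auto:
            for step, target in (("configure", "inactive"), ("activate", "active")):
                if not getattr(self, "on_" + step)():
                    if not self.expose_interfaces:
                        # Nobody can retry the transition externally, so fail construction.
                        raise RuntimeError(f"{self.name}: {step} failed during construction")
                    break  # leave the node where it is for an external manager
                self.state = target

    def on_configure(self):
        return True

    def on_activate(self):
        return True


class FlakyDriver(ConfigurableNode):
    def on_activate(self):
        return False  # e.g. the device is not connected yet


plain = ConfigurableNode("plain")
print(plain.state, plain.expose_interfaces)  # active False
driver = FlakyDriver("lidar", LifecycleMode.MANAGED_AUTO)
print(driver.state)                          # inactive
```

With MANAGED_AUTO, a failed activation leaves the node inactive and reachable through the exposed interfaces. With AUTO, the same failure raises from the constructor. In this sketch the transitions run inside the base constructor, so a real implementation would need to defer them until the most-derived constructor has finished.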
To expose all 3 of these behaviors, maybe we could expose the following ros arg or special ros parameter (like use_sim_time) with 3 different values.