All Nodes as Lifecycle Nodes

HemalShahNV · May 5, 2025, 5:01pm

Could the LifecycleNode API be collapsed into Node itself for ROS 2 L-turtle, making all nodes managed? This would simplify matters by not having to deal with managed and non-managed nodes together in the same application. Making this a first-class concept in ROS 2 would encourage developers to think through the different states the application could be in and give them a common pattern to code defensively for them. With some default trivial transition implementations, and perhaps a default transition manager in the executor itself to transition from unconfigured to start and then to shutdown, the migration could be manageable.

However, are there reasons why we cannot or should not make all Nodes LifecycleNodes?
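
To make the "default trivial transitions plus a default transition manager" idea concrete, here is a minimal sketch in plain Python. It does not use rclpy or rclcpp; `SketchNode`, `default_bringup` and `default_teardown` are hypothetical names, and the transition table is a simplified reading of the managed-node state machine rather than the actual implementation.

```python
from enum import Enum


class State(Enum):
    UNCONFIGURED = "unconfigured"
    INACTIVE = "inactive"
    ACTIVE = "active"
    FINALIZED = "finalized"


# Simplified transition table: (current state, transition) -> next state.
TRANSITIONS = {
    (State.UNCONFIGURED, "configure"): State.INACTIVE,
    (State.INACTIVE, "activate"): State.ACTIVE,
    (State.ACTIVE, "deactivate"): State.INACTIVE,
    (State.INACTIVE, "cleanup"): State.UNCONFIGURED,
    (State.UNCONFIGURED, "shutdown"): State.FINALIZED,
    (State.INACTIVE, "shutdown"): State.FINALIZED,
    (State.ACTIVE, "shutdown"): State.FINALIZED,
}


class SketchNode:
    """Hypothetical node whose transition callbacks default to no-ops."""

    def __init__(self, name):
        self.name = name
        self.state = State.UNCONFIGURED

    # Default trivial callbacks: report success without doing anything.
    def on_configure(self):
        return True

    def on_activate(self):
        return True

    def on_deactivate(self):
        return True

    def on_cleanup(self):
        return True

    def on_shutdown(self):
        return True

    def trigger(self, transition):
        """Run one transition; the state changes only if the callback succeeds."""
        key = (self.state, transition)
        if key not in TRANSITIONS:
            raise ValueError(f"{transition!r} is not allowed from {self.state.value}")
        if getattr(self, "on_" + transition)():
            self.state = TRANSITIONS[key]
            return True
        return False


def default_bringup(node):
    """Stand-in for a default transition manager: unconfigured -> active."""
    return node.trigger("configure") and node.trigger("activate")


def default_teardown(node):
    """Stand-in for the default shutdown path."""
    return node.trigger("shutdown")
```

A node that overrides nothing would go straight to active under `default_bringup`, which is roughly the migration path described above for existing nodes.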

Yadunund · May 6, 2025, 3:59am

I think having an option in NodeOptions (passed to a Node constructor) to enable managed states would be neat to have. I’m sure there are several complexities involved but it’s something worth discussing here.

gbiggs · May 6, 2025, 11:51pm

I think that all nodes having at least a minimal lifecycle would be a good thing. Appropriate defaults and automatic transitions would allow existing nodes to migrate (relatively) easily. However, there is a risk with making all nodes use one lifecycle state machine: Different applications might prefer different lifecycle state machines, depending on their needs.

There are also several things about the current state machine that I would like to revise, and the ability to revise in the future as well is another reason not to have a complex state machine be required for all nodes.

Perhaps a good way forward would be to have a very simple state machine for the default life cycle, with the ability to override it with more complex custom state machines; we could provide some common ones.

Another argument against is the complexity that it would introduce to even the most simple nodes being produced by beginners to ROS. This is something to be very careful about - good defaults might help, but the inherent complexity of having a lifecycle in every node needs to be managed.

smac · May 7, 2025, 12:02am

I think there are a couple of ways to handle that:

We could make the transition methods optional, so that they don’t have to be implemented and users don’t have to think about lifecycle if they don’t want its benefits. The obvious drawback is that the system then doesn’t know which nodes are ‘really’ lifecycle nodes that it can send transition requests to and which are not ‘actually’ lifecycle nodes. We could store some state in the default transition methods recording whether they’ve been overridden or not, but that starts to get a little messy.
We could make them required, but provide an easy way to have them auto-transition to Active so that users don’t need to think about orchestrating their system if they don’t want to. I actually worked on a new Launch ROS feature for this recently that makes this possible with an autostart=True functionality of LifecycleNode. That does add a bit more required structure to using ROS 2, which I don’t love, even as someone who uses lifecycle nodes for everything.

In either case, it makes it possible for users that want to manage the state machine and/or orchestrate the bring up / current state configuration of their systems to still do so.
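
For the first option, one way to tell whether a node is "really" a lifecycle node is to check whether any default transition method has been overridden. The sketch below is plain Python with hypothetical names (`BaseNode`, `overridden_transitions`, `is_really_lifecycle`); it only illustrates the detection idea, not anything rclpy or rclcpp provides today.

```python
class BaseNode:
    """Hypothetical base class with optional, no-op transition methods."""

    TRANSITION_METHODS = (
        "on_configure",
        "on_activate",
        "on_deactivate",
        "on_cleanup",
        "on_shutdown",
    )

    def on_configure(self):
        return True

    def on_activate(self):
        return True

    def on_deactivate(self):
        return True

    def on_cleanup(self):
        return True

    def on_shutdown(self):
        return True

    @classmethod
    def overridden_transitions(cls):
        """Names of transition methods that a subclass has replaced."""
        return [
            name
            for name in cls.TRANSITION_METHODS
            if getattr(cls, name) is not getattr(BaseNode, name)
        ]

    @classmethod
    def is_really_lifecycle(cls):
        """True if at least one transition does real, user-defined work."""
        return bool(cls.overridden_transitions())


class PlainTalker(BaseNode):
    """Overrides nothing, so it behaves like today's plain Node."""


class CameraDriver(BaseNode):
    def on_configure(self):
        # Pretend to open the device here.
        return True


print(PlainTalker.is_really_lifecycle())      # False
print(CameraDriver.overridden_transitions())  # ['on_configure']
```

This avoids storing extra state by hand, but an orchestrator would still need some way to query the result remotely, which is where it starts to get messy.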

Another option is also to just wipe out all docs and examples that use Node and start early to have everything be a LifecycleNode in the learning and education pipelines so that folks naturally value these and use them. Especially now with the launch autostart feature, someone can basically just treat it as an auto-transitioning node if they want to. Rather than building it into the base functionality, we steer folks in this direction through demos, tutorials, docs, and best practices more so than we do today. That might actually be the best option, in my opinion.

Timple · May 12, 2025, 8:15am

I think it’s important that a user can still do ros2 run my_package my_node and it actually starts doing things.
Otherwise the talker/listener examples will become really complicated already…

nirwester · May 13, 2025, 8:59am

I believe it’s important to minimize the complexity involved in writing a Node—not just for the benefit of beginners, but also for maintaining large codebases. Lifecycle-based behavior is often unnecessary and, in many cases, shouldn’t be imposed. I support efforts to unify the APIs, such as the approach proposed by @Yadunund using options. However, instead of enabling automatic transitions, I would prefer that Nodes continue to function exactly as they do today when the lifecycle option is not enabled—without exposing any lifecycle-related services.

aposhian · May 30, 2025, 11:23am

I would be all for consolidating the class types, as it would make maintenance easier for libraries that want to provide both options. However, we would need to be very careful about how this was implemented. Part of the reason why my company stopped using lifecycle nodes where possible is that we encountered flakiness with the RMW and the services needed to configure and activate nodes. Not to mention that there aren’t very robust ways to manage lifecycle nodes out of the box. Using launch_ros for this has its own flakiness, and we have had the most success using the nav2_lifecycle_manager, even though its logic and capability are fairly simple. All this is to say that there are very valid reasons someone may not want to use lifecycle, and any sort of non-lifecycle mode should not introduce additional dependency on the RMW, in my opinion.

aposhian · May 30, 2025, 11:26am

Having a built-in automatic transition feature would be nice for users like me that don’t really care to use lifecycle, but want to use a 3rd party lifecycle node. Automatic transitions advanced with function calls will be more robust than needing to add in a separate process to call the transition services.

MoffKalast · May 30, 2025, 1:15pm

Ah, finally someone started a thread on this. I guess it’s as good a time as any to write down an idea I’ve had for a while that would finally address shutting down nodes, by doing it through lifecycles.

So the general idea is that all nodes become implicit lifecycle nodes, but the default behavior should be the same as non-lifecycle nodes are now, meaning:

Each Node automatically activates on constructor completion similar to what smac suggests
Deactivation and Finalization could be triggered in one swoop simply through ros2 node kill which calls the transition service, and would also call the shutdown hook so we finally have a way to stop and clean up processes properly, and even let us kill nodes on machines other than the current one (plus ros2 kill -a could finally clean up the entire system of lingering background nodes which would be an extremely useful reset)
Most importantly: there should be zero extra configuration verbosity unless custom transitions are actually needed, all of this would have to be implemented implicitly inside Node
If you do need to specify transitions yourself, extend the LifecycleNode instead which would keep the current behavior by overriding Node default functionality
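
A rough sketch of how "activate on constructor completion" and a one-step kill could look, in plain Python with no ROS dependency. `AutoActivateMeta`, `ImplicitNode` and `LidarDriver` are hypothetical names, `kill()` stands in for whatever the proposed ros2 node kill would call, and configure and activate are collapsed into a single step for brevity.

```python
class AutoActivateMeta(type):
    """Runs the default bring-up after the most-derived __init__ has returned."""

    def __call__(cls, *args, **kwargs):
        node = super().__call__(*args, **kwargs)  # runs __new__ and __init__
        node._auto_activate()
        return node


class ImplicitNode(metaclass=AutoActivateMeta):
    """Hypothetical Node that is implicitly managed, with no-op style defaults."""

    def __init__(self, name):
        self.name = name
        self.state = "unconfigured"
        self.log = []

    def on_activate(self):
        self.log.append("activate")

    def on_deactivate(self):
        self.log.append("deactivate")

    def on_shutdown(self):
        self.log.append("shutdown")

    def _auto_activate(self):
        self.on_activate()
        self.state = "active"

    def kill(self):
        """Deactivate and finalize in one go, running the shutdown hook."""
        if self.state == "finalized":
            return
        if self.state == "active":
            self.on_deactivate()
            self.state = "inactive"
        self.on_shutdown()
        self.state = "finalized"


class LidarDriver(ImplicitNode):
    def __init__(self, name, port):
        super().__init__(name)
        self.port = port
        self.port_open = False

    def on_activate(self):
        super().on_activate()
        # Safe to use self.port: activation runs after __init__ has finished.
        self.port_open = True

    def on_shutdown(self):
        super().on_shutdown()
        self.port_open = False


driver = LidarDriver("lidar", "/dev/ttyUSB0")
print(driver.state)  # active
driver.kill()
print(driver.state)  # finalized
```

The subclass author writes an ordinary constructor and gets activation for free, which is the "zero extra configuration verbosity" requirement.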

I think this would add a lot of manageability while requiring no changes to 3rd party extensions of either Node or LifecycleNode.

One of the more egregious examples of that not happening is probably: GitHub - Myzhar/ldrobot-lidar-ros2: ROS2 package for LDRobot lidar. Based on ROS2 Lifecycle nodes

A node that is just a simple lidar driver. It has no dependencies on other nodes or any need to be launched in tandem with anything… and will just sit there and do nothing unless you add a lifecycle manager to it that doesn’t trigger it properly half the time. I ended up rewriting it to remove the lifecycle functionality from it to get reliable startup behavior.

I would argue that very few nodes even need custom transitions on load, but most could use it for shutdown if it was painless enough. Imho the key is not to provide a billion options that people won’t bother to learn and use, but to set up defaults that cover the widest range of use cases and make them mandatory.

JEnoch · June 2, 2025, 12:19pm

I would like to kindly suggest considering the impact of each Node being a Lifecycle Node by default.
Currently, a Lifecycle Node declares 1 Publisher and 5 Services, which translates to:

For DDS: 6 DataWriters and 5 DataReaders
For Zenoh: 6 Liveliness Tokens and 5 Queryables

As the number of Nodes increases, this could significantly impact discovery traffic and the size of the graph cache, particularly depending on the RMW (ROS Middleware) implementation and the specific use case, such as discovery over WiFi.

Today, with Parameter Services already being active by default for each Node, the declaration of its 6 Services and 1 Publisher per Node introduces a noticeable overhead at launch time, especially in large-scale systems (>100 Nodes) and over wireless networks.
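
As a back-of-the-envelope illustration of how this scales, the snippet below counts node-side DDS endpoints. It assumes one DataWriter per publisher and one request DataReader plus one reply DataWriter per service, which reproduces the 6 and 5 figure above; the function names are hypothetical, and the parameter-service figure is derived under the same assumption rather than measured.

```python
def dds_endpoints(publishers, services):
    """Return (data_writers, data_readers) created on the node side.

    Assumed mapping: each publisher is one DataWriter; each service server
    is one request DataReader plus one reply DataWriter.
    """
    return publishers + services, services


def total_endpoints(node_count, publishers, services):
    """Endpoints announced during discovery across node_count nodes."""
    writers, readers = dds_endpoints(publishers, services)
    return node_count * (writers + readers)


# Lifecycle interfaces: 1 publisher and 5 services per node.
print(dds_endpoints(publishers=1, services=5))   # (6, 5)
print(total_endpoints(100, 1, 5))                # 1100

# Parameter interfaces: 1 publisher and 6 services per node, same assumption.
print(dds_endpoints(publishers=1, services=6))   # (7, 6)
print(total_endpoints(100, 1, 6))                # 1300
```

Under these assumptions, a 100-node system would announce 1300 endpoints for parameters today and another 1100 if every node were a lifecycle node.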

aposhian · June 2, 2025, 12:55pm

Good point. I think that if we were to merge the interfaces together, it would be nice to have it be completely runtime configurable. It seems like there are 3 possible configurations, and maybe we wouldn’t want to offer all of them:

Current LifecycleNode behavior: expose publisher and services, and don’t auto transition.
Current Node behavior, but with “auto-transition”: don’t expose publisher and services, and call any configure and activate methods as if they are part of the constructor, and deactivate and cleanup methods as if they are part of the destructor.
Current LifecycleNode behavior, but with auto transition. You may want the convenience of having the configure and activate transitions on startup and deactivate and cleanup methods on shutdown, but you also want to be able to control the state transitions externally. This could be important because I have seen some LifecycleNodes be written to automatically deactivate themselves on errors, or to fail to configure or activate in the first place depending on conditions.

I think that if you disabled the lifecycle interfaces, then the constructor should probably just fail out if it fails to configure or activate, at a minimum. Maybe we would also want to consider killing the node if it transitions itself out of active, but that might be too implementation dependent.

To expose all 3 of these behaviors, maybe we could expose the following ros arg or special ros parameter (like use_sim_time) with 3 different values.

lifecycle: "manual"
lifecycle: "none"
lifecycle: "auto"
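
A small sketch of how the three values could map onto the two underlying switches (expose the lifecycle interfaces, auto-transition on startup). This is plain Python; `LifecycleMode` and `start_node` are hypothetical names, and the mapping of values to behaviors is my reading of the list above, not an existing ROS 2 parameter.

```python
from enum import Enum


class LifecycleMode(Enum):
    MANUAL = "manual"  # current LifecycleNode behavior
    NONE = "none"      # current Node behavior, transitions run internally
    AUTO = "auto"      # LifecycleNode interfaces plus automatic bring-up

    @property
    def exposes_interfaces(self):
        """Whether the lifecycle publisher and services are created."""
        return self in (LifecycleMode.MANUAL, LifecycleMode.AUTO)

    @property
    def auto_transitions(self):
        """Whether configure and activate run as part of construction."""
        return self in (LifecycleMode.NONE, LifecycleMode.AUTO)


def start_node(mode, on_configure, on_activate):
    """Return the state a node would be in right after construction.

    In "none" mode nobody can retry a transition externally, so a failed
    configure or activate raises instead of leaving the node half started.
    """
    if not mode.auto_transitions:
        return "unconfigured"
    if not on_configure():
        if mode is LifecycleMode.NONE:
            raise RuntimeError("configure failed")
        return "unconfigured"
    if not on_activate():
        if mode is LifecycleMode.NONE:
            raise RuntimeError("activate failed")
        return "inactive"
    return "active"


mode = LifecycleMode("auto")  # e.g. parsed from the lifecycle parameter value
print(mode.exposes_interfaces, mode.auto_transitions)  # True True
print(start_node(mode, lambda: True, lambda: True))    # active
```

In "auto" mode a failed activation leaves the node inactive, so an external manager can still retry it, whereas "none" mode fails loudly because nothing else can.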

aposhian · June 13, 2025, 6:37pm

I have captured my proposal as an issue on rclcpp: Unify LifecycleNode and Node · Issue #2874 · ros2/rclcpp · GitHub

shuhao · September 22, 2025, 3:42am

This is an important point, as lifecycle behaviour is very restrictive and in many cases unnecessary. There are two strong arguments (likely more; I do not have the full picture) against enforcing lifecycle node behaviour for all ROS nodes:

The lifecycle node state machine is rigid: system designers may choose alternative state machine definitions for robot startup depending on their needs. More or fewer states could be needed depending on the complexity of the system being started. Making lifecycle nodes the default would enforce the state machine defined by the ROS library on everyone. You will likely get many requests from users to customize the state machine (this thread is already talking about making state transitions optional), leading to complex and virtually undebuggable problems down the line (especially for an audience that is not as systems-savvy, like early-career roboticists).
The lifecycle communication happens in a peer-to-peer manner via services and topics: As already mentioned in this thread and in the discourse post, a large number of services and topics get created when a lifecycle node is created, which can cause scalability issues. System designers may choose to use an external, global node condition tracking system to establish startup order, such as by leveraging centralized/distributed configuration DBs like etcd+local cache (a topic for a talk/presentation if there is interest one day!). This kind of global condition tracking is more effective, as it acts as a central coordinator/gateway and eliminates (relatively higher bandwidth) p2p communication and bond maintenance. Global systems can also effectively multicast data (in etcd via key watching) to avoid having things like a central lifecycle manager that would restart nodes based on heartbeat failures, which can allow for more nuanced failure-recovery behaviours on a node-by-node basis in more complex execution graphs. If lifecycle nodes become the only option, this would not be possible without a lot of hacks.
