What are you using for containerization in ROS deployments?

Saw this post recently arguing that Docker is not the right tool for robotics. It sparked a good discussion and got me thinking.

I am curious what people are actually using in real-world deployments and for over-the-air updates. Docker, Podman, Snap, Yocto, something else entirely? And does it change between dev and production?

There is a nice blog on this as well: https://blog.robotair.io/best-way-to-ship-your-ros-app-a53927186c35

I'm still forming my own view on this, so I'd really like to know what people are using.


I don’t really have any alternatives besides using something like Ansible to configure your robot computer(s) without any containers/VMs.

I use Docker a lot, but not in a product setting. For developer workflows it’s totally fine, though I’ve been going more into Pixi for those use cases lately. It offers less isolation than containers, but again, that’s fine for dev.

Mostly, I posted because this article is one I point to often (ignoring, of course, the company marketing that’s peppered in): ROS Docker; 6 reasons why they are not a good fit | Ubuntu


Let me demystify Docker for you. It only takes a minute.

Do this:

🞂 unshare -rm
# Congratulations! You've just created your own container!

🞂 whoami
root
# Wait, what? Well, don't get your hopes up. You are only root in
# here, in your container.

🞂 mkdir -p myroot/usr tmp1/usr/{work,merge}
🞂 mount -t overlay overlay -olowerdir=/usr/,upperdir=myroot/usr,workdir=tmp1/usr/work tmp1/usr/merge
# Creates an overlay mount over /usr. This combines everything
# from /usr with everything in myroot/usr, where the latter trumps
# the former. You can see the result in the merge:

🞂 ls tmp1/usr/merge/
bin  games  include  lib  lib32  lib64  libexec  libx32  local  sbin  share  src

# Let's play pretend!
🞂 mount --rbind tmp1/usr/merge/ /usr/
🞂 cd /usr
🞂 touch i_am_not_really_here
# Creates a file in /usr -- wait, how? We are not actually root
# on the host! Right, this is just pretending. We are actually
# creating the file in myroot/usr

🞂 ls /usr/
bin  games  i_am_not_really_here  include  lib  lib32  lib64  libexec  libx32  local  sbin  share  src
# Looks real, feels real! All your programs in this container will
# think it's real!

# Let's wake up!
🞂 exit
🞂 ls /usr/
bin  games  include  lib  lib32  lib64  libexec  libx32  local  sbin  share  src
# OK, we dreamed it -- it's not actually there.

🞂 ls myroot/usr/
i_am_not_really_here
# But here it is. Not lost. Ready to be used again.

This is powerful, because you can “install” software in your container “on top of” your host. This means, e.g., that you can share your apt-installed ROS distribution with your containers and keep the container image absolutely tiny. In particular, you don’t need to copy an entire Ubuntu filetree + ROS distro + whatever else. This is insanely lightweight, and no sudo is required. It’s what we use for sandboxing robot capabilities in Transitive.
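To make that concrete, here is a minimal sketch (not the actual Transitive implementation) of reusing the host's apt-installed ROS inside such a sandbox; the myroot/opt layout and the humble distro name are assumptions for illustration:

🞂 unshare -rm
🞂 mkdir -p myroot/opt tmp1/opt/{work,merge}
🞂 mount -t overlay overlay -olowerdir=/opt/,upperdir=myroot/opt,workdir=tmp1/opt/work tmp1/opt/merge
🞂 mount --rbind tmp1/opt/merge/ /opt/
# The host's ROS install is now visible (and writable-on-top) inside the sandbox:
🞂 source /opt/ros/humble/setup.bash
🞂 ros2 topic list
# Anything you "install" under /opt in here lands in myroot/opt and
# never touches the host.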

Thanks for letting me enlighten you. Now go forth and create your own containers!

(Hint: the next logical step is to read man unshare!)


Thank you for sharing the article! I will go through it; it seems interesting.
Yes, I also use Docker for all my development workflows. Pixi is gaining traction these days.

But it seems, as in this article and the LinkedIn post, that Docker may not be the best fit for a production environment.

This is interesting! It is essentially the working principle behind Docker and containerization.
I can see this working well in development, but I think we would need to build quite a few more modules on top of it for publishing updates and fleet management.

I’m surprised nobody mentioned Apptainer. It is like Docker but without the permissions and networking hell by default. It is exactly what some people pointed out in the LinkedIn discussion: robotics needs environment isolation, not total isolation.

You can still drop any privileges you want once you are sure of your setup. Apptainer feels like running Docker with --privileged, but you’re still running as a non-root user.
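For anyone who hasn't tried it, a rough sketch of the workflow; the image and distro names here are just placeholders, not something from the thread:

$ # Build a SIF image directly from an OCI/Docker image: no daemon, no root
$ apptainer build ros_app.sif docker://ros:humble
$ # By default you stay your own user and share the host network,
$ # so ROS nodes inside and outside the container can see each other
$ apptainer exec ros_app.sif bash -c "source /opt/ros/humble/setup.bash && ros2 topic list"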

One thing Apptainer is bad at is OTA updates. I don’t think there’s any support for that. But I haven’t dug into this, so maybe I just don’t know about the existing solutions.

But regardless of which container tech you choose, you still need to configure the underlying OS (networking, udev, kernel parameters, sysfs and such), which no container can help with. We use Ansible for that, and it is sufficient, but we only manage a few lab robots, not a fleet of products.
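For a feel of what that host-level configuration covers, here is a hedged sketch of the kind of commands such a playbook ends up running; the udev rule and sysctl value are made-up examples, not our actual config:

$ # udev: set permissions and a stable symlink for a USB serial device (example IDs)
$ echo 'SUBSYSTEM=="tty", ATTRS{idVendor}=="0483", SYMLINK+="imu", MODE="0666"' | sudo tee /etc/udev/rules.d/99-robot.rules
$ sudo udevadm control --reload-rules && sudo udevadm trigger
$ # sysctl: larger receive buffers, a value often suggested in DDS tuning guides
$ echo 'net.core.rmem_max=2147483647' | sudo tee /etc/sysctl.d/60-ros.conf
$ sudo sysctl --system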

And that is exactly what Transitive does: Transitive Core Concepts — #1: Full-stack Packages | Transitive Robotics

Jumping on this thread because I think I’m the weird one here: I haven’t touched ROS outside of a container since the Noetic days.

That said, my pain was never the container itself. It was everything around it. My loop looked like this: develop locally, test in sim, push to branch, SSH in, pull, rosdep install, colcon build, run. Best case 10 minutes. Cross-compiling for ARM? Pour a coffee.
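Concretely, the on-robot half of that loop was roughly the following (hostname, workspace path, and launch target are placeholders):

$ ssh user@robot
$ cd ~/ros_ws && git pull
$ rosdep install --from-paths src --ignore-src -y
$ colcon build --symlink-install
$ source install/setup.bash
$ ros2 launch my_bringup robot.launch.py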

I started writing Ansible playbooks to automate the transfer side; that grew, got opinionated, and eventually became forge, a tool I use every day now.

The whole system lives in a single declarative YAML — hosts, components, dependencies, middleware config. Profiles are just different files: forge simulation.yaml launch spins up Gazebo locally, forge robot.yaml launch pushes to hardware.

Here’s a real config example from a personal project: simulation.yaml vs robot.yaml.

The pipeline has four steps:

  • prep builds a base image with your shared packages compiled into /ros_ws_common
  • stage generates per-component Dockerfiles via Jinja2 templates, imports VCS
    repos with vcstool, runs rosdep, and layers each workspace on top of the base
    (/vcs_ws for remote repos, /ros_ws for local source). BuildKit cache mounts
    keep apt, pip, and rosdep warm between runs
  • build compiles by mounting your workspace into the staged container, so
    artifacts stay on disk rather than baked into the image
  • launch rsyncs only changed artifacts to each host and brings up the compose
    stack over SSH via python-on-whales (roughly the manual steps sketched below)
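For readers who haven't seen that pattern, the launch step is roughly equivalent to doing this by hand; hostnames and paths are placeholders, and forge drives Docker through python-on-whales rather than shelling out:

$ # Sync only the changed build artifacts to the robot
$ rsync -az --delete install/ user@robot:/opt/app/install/
$ # Point the local Docker CLI at the robot's daemon over SSH and bring up the stack
$ DOCKER_HOST=ssh://user@robot docker compose -f compose.yaml up -d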

For context, on my setup (M4 Mac building for a Jetson Orin):

  • the base image takes about 4 minutes
  • cold staging of all components takes around 3m30s (but you only do that once)
  • warm-cache staging takes 1m28s for all components; for just one component it’s a few seconds
  • the everyday build + launch loop is under a minute unless you’re touching dependencies