Open Robotics in the Age of Embodied AI

Open Robotics in the Age of Embodied AI: Why the Stack Has to Be Open All the Way Down

For the last twenty years, “open robotics” has mostly meant ROS — a shared middleware that let researchers stop reinventing message-passing and start sharing perception, planning, and control packages. That was a huge win. But the shape of the field has changed. Today the most interesting robotics work happens at the intersection of vision-language models, reinforcement learning, and physical hardware that costs less than a laptop. And in that new landscape, “open middleware” is no longer enough.

To do credible Embodied AI research, you need the whole stack open — from the PCB to the Python API. This post is about why, and what that actually looks like in practice.

I’ve been building a platform called 3we to push on this idea. It’s an AI-First, fully open robot platform: open Apache-licensed Python SDK, open CERN-OHL-P hardware, open ROS2 stack, with the same Python code running identically in simulation and on real hardware. The point of this post isn’t to pitch 3we — it’s to use it as a worked example of what “open all the way down” means and why it matters.


The Three Layers of Openness

When people say “open source robotics,” they usually mean one of three different things, and conflating them causes real problems.

Layer 1 — Open middleware. ROS, ROS2, micro-ROS. The plumbing that lets nodes talk to each other. This layer has been open and healthy for a long time.

Layer 2 — Open algorithms and models. Nav2, SLAM Toolbox, MoveIt, and increasingly the open VLA models coming out of academic labs. The intelligence layer.

Layer 3 — Open hardware. Schematics, PCB layouts, mechanical CAD, BOMs detailed enough that someone with a soldering iron and $500 can reproduce the platform.

Most platforms are open at Layer 1, partially open at Layer 2, and closed at Layer 3. You can write custom nodes for a TurtleBot, but you can’t fab a new motor controller for it. You can fine-tune a policy on a real robot, but if the IMU goes bad, you’re sending the unit back to the vendor.

That’s a problem for a research field that increasingly demands physical experimentation at scale. If a lab wants ten robots to run a multi-agent RL experiment, the math has to work — both financially and logistically. A $1,200 robot times ten is a grant proposal. A $500 robot times ten is a purchase order.


Why Embodied AI Forces the Issue

Three trends in Embodied AI research make full-stack openness non-optional:

1. Sim2Real is the bottleneck, not the algorithm.

The published results on policy learning are extraordinary, but the gap between “works in Isaac Sim” and “works on a real robot in a real hallway” is still where most projects die. Closing that gap requires you to control both ends — sim environment and physical hardware — and align them at the level of sensor noise, motor dynamics, and timing. You can’t do that if the firmware is a black box. You can’t do that if the IMU calibration routine lives in the vendor’s cloud service.

2. Foundation models want to talk to robots.

A modern VLM-controlled robot is a perception-action loop where a 2-second VLM call sits next to a 50Hz control loop. Making that work means the API the model talks to has to be clean, async-first, and decoupled from the realtime layer. That’s a Python ergonomics problem, not a robotics problem. ROS2 by itself isn’t the answer — you need a layer on top that researchers actually want to use. And that layer needs to be open, so the community can shape it.
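A minimal sketch of the decoupling described above, using only the standard library's asyncio and no robot SDK. The names `slow_planner`, `control_loop`, and the shared `goal` dict are hypothetical, and the model latency is shortened from about 2 seconds to 0.2 seconds so the demo finishes quickly. The point it illustrates is that the fixed-rate loop reads the latest goal and never waits on the slow call.

```python
import asyncio

CONTROL_HZ = 50        # control-loop rate taken from the text
VLM_LATENCY_S = 0.2    # shortened stand-in for the ~2 s model call (assumption)


async def slow_planner(goal: dict) -> None:
    """Stand-in for a VLM call: slow, and writes its result only when done."""
    for target in (1.0, 2.0):
        await asyncio.sleep(VLM_LATENCY_S)  # pretend to wait for the model
        goal["x"] = target                  # publish the newest goal


async def control_loop(goal: dict, ticks: int) -> list[float]:
    """Fixed-rate loop that uses the current goal and never awaits the planner."""
    period = 1.0 / CONTROL_HZ
    seen = []
    for _ in range(ticks):
        seen.append(goal["x"])              # act on whatever goal is current
        await asyncio.sleep(period)
    return seen


async def main() -> list[float]:
    goal = {"x": 0.0}
    planner = asyncio.create_task(slow_planner(goal))  # runs concurrently
    seen = await control_loop(goal, ticks=30)          # about 0.6 s of ticks
    await planner
    return seen


seen = asyncio.run(main())
print(len(seen), seen[0], seen[-1])
```

On real hardware the 50Hz loop would live in the realtime layer rather than in Python; the sketch only shows the shape of the hand-off between the slow and fast sides.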

3. Reproducibility is in crisis.

“We trained a policy on our robot” is a sentence that’s almost impossible to verify if “our robot” is a custom rig in someone’s lab. Open hardware with a published BOM is the robotics equivalent of releasing your training code. If the next lab can’t rebuild your robot for $500 and rerun your experiment, your paper isn’t reproducible — it’s a demo.


What “AI-First” Actually Means

The phrase gets thrown around. In our case, it means a specific design choice: the primary API surface is a Python class, not a set of ROS2 topics.

import asyncio
from threewe import Robot

async def main():
    async with Robot(backend="mock") as robot:  # mock: laptop-only 2D simulator
        image = robot.get_image()
        await robot.move_to(x=2.0, y=1.0)
        result = await robot.execute_instruction("go to the red door")

asyncio.run(main())

A researcher writing this code never has to know that under the hood there’s a ROS2 graph publishing /cmd_vel, a Nav2 action server handling NavigateToPose, and an ESP32 running micro-ROS over UART. The ROS2 layer is still there — fully open, fully accessible to anyone who needs it — but it’s not a prerequisite.

Switching from backend="mock" (a zero-dependency 2D kinematic simulator that runs on a laptop with no GPU) to backend="gazebo" (full physics) to backend="isaac_sim" (GPU-accelerated parallel RL) to backend="real" (physical hardware) is a one-string change. Identical API. The same move_to() call resolves to a kinematic update, a Gazebo physics step, an Isaac Sim tensor op, or a real motor command.

That property — Sim2Real with zero code changes — is what unlocks the workflow Embodied AI actually needs: prototype in mock, train in isaac_sim, validate in gazebo, deploy on real, all without rewriting your agent.
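One way such a backend switch can be structured is a small registry behind a facade class. The sketch below is an illustration under stated assumptions, not 3we's actual implementation: `Backend`, `MockBackend`, `SketchRobot`, and `drive_to` are hypothetical names, and only a teleporting mock backend is included. It covers API dispatch only; sensor models and simulator configuration are a separate matter.

```python
from dataclasses import dataclass
from typing import Protocol


class Backend(Protocol):
    """What every backend must provide (hypothetical interface)."""

    def drive_to(self, x: float, y: float) -> tuple[float, float]: ...


@dataclass
class MockBackend:
    """Teleporting kinematic stand-in: no physics, no dependencies."""

    x: float = 0.0
    y: float = 0.0

    def drive_to(self, x: float, y: float) -> tuple[float, float]:
        self.x, self.y = x, y
        return self.x, self.y


# Registry keyed by the backend string. Entries wrapping Gazebo, Isaac Sim,
# or a hardware driver would be added here; they are omitted in this sketch.
BACKENDS: dict[str, type] = {"mock": MockBackend}


class SketchRobot:
    """Hypothetical facade: agent code talks to this, never to a backend."""

    def __init__(self, backend: str) -> None:
        if backend not in BACKENDS:
            raise ValueError(f"unknown backend: {backend!r}")
        self._backend: Backend = BACKENDS[backend]()

    def move_to(self, x: float, y: float) -> tuple[float, float]:
        return self._backend.drive_to(x, y)


robot = SketchRobot(backend="mock")
pose = robot.move_to(x=2.0, y=1.0)
print(pose)
```

Agent code written against `SketchRobot` does not change when another entry is registered, which is the property the one-string switch relies on.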


The Hardware Side: Open Down to the Copper

The Python API is the part researchers see. The part that makes it credible is everything underneath.

The 3we reference hardware is fully published under CERN-OHL-P v2:

  • KiCad 8 schematic and PCB layout for the main controller board (ESP32-S3 + DRV8833 motor drivers + safety relay + connectors)
  • Mechanical CAD (STEP and DXF) for the chassis
  • Bill of materials with specific part numbers from accessible distributors
  • Assembly guide with photos at each step
  • Production outputs — Gerbers, drill files, pick-and-place — ready to send to a fab

The total reproduction cost is under $500. We’ve kept it intentionally accessible: an ESP32-S3 for motor control and micro-ROS, a Raspberry Pi 5 for the main compute, a Hailo-8L M.2 module (13 TOPS) for AI inference, an LD06 360° LiDAR, a BNO055 9-axis IMU, and four N20 motors with Mecanum wheels.

A few design decisions are worth calling out because they’re easy to get wrong:

Hardware emergency stop. The physical E-stop cuts motor power through a hardware relay. Software cannot override this path. ISO 13850-compliant, dual-channel. A common mistake in DIY platforms is implementing E-stop as “the software stops sending velocity commands” — that’s not a safety system, that’s an honor system.

Three-tier watchdog. A 500ms /cmd_vel timeout in the Nav2 layer, a 1-second software watchdog in the ESP32 firmware, and a 1.6-second hardware watchdog (TPS3813) that resets the MCU if the firmware itself hangs. Each tier catches a different failure mode.
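The tiering can be illustrated with a toy model. The timeouts below come from the paragraph above; the `Watchdog` class, the tier names, and the assumption about which event kicks each tier are made up for this sketch and are not taken from the 3we firmware.

```python
from dataclasses import dataclass


@dataclass
class Watchdog:
    """Trips when it has not been kicked for longer than `timeout_s`."""

    name: str
    timeout_s: float
    last_kick_s: float = 0.0

    def kick(self, now_s: float) -> None:
        self.last_kick_s = now_s

    def expired(self, now_s: float) -> bool:
        return now_s - self.last_kick_s > self.timeout_s


# Assumed kick sources: velocity commands, host messages, firmware main loop.
cmd_vel = Watchdog("cmd_vel_timeout", 0.5)
firmware = Watchdog("firmware_watchdog", 1.0)
hardware = Watchdog("hardware_watchdog", 1.6)

# Scenario: the host goes silent at t = 0 while the firmware keeps running.
timeline = {}
for t in (0.4, 0.8, 1.2, 2.0):
    hardware.kick(t)  # firmware loop is alive, so it keeps kicking the supervisor
    timeline[t] = [w.name for w in (cmd_vel, firmware, hardware) if w.expired(t)]
    print(t, timeline[t])

# If the firmware itself hung after t = 2.0, only the hardware tier would notice.
print(hardware.expired(4.0))
```

In this scenario the two software tiers trip in order while the hardware tier stays quiet, because the firmware is still alive; it only fires once the firmware stops kicking it.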

Payload bus. A standardized 34-pin connector (we call it PBC-34) so users can hot-plug their own payloads — robotic arms, sensor pods, custom end-effectors — without modifying the base. The payload code runs sandboxed: it can’t directly touch motor control or safety circuits.

These details aren’t glamorous, but they’re the difference between a platform you can do real research on and a platform that catches fire during a demo.


Open Hardware Has a Licensing Story Too

One thing that took me a while to get right: open hardware needs its own license, separate from the code license.

In 3we, the split looks like this:

  • Code — Apache 2.0 (firmware, ROS2 packages, Python SDK, web tools)
  • Hardware — CERN-OHL-P v2 (PCB, mechanical, BOM)
  • Documentation — CC-BY-4.0

CERN-OHL-P (Permissive) is the hardware analog of MIT/Apache: anyone can manufacture, modify, and sell the hardware, with attribution. There’s also CERN-OHL-W (Weakly reciprocal) and CERN-OHL-S (Strongly reciprocal) for projects that want copyleft semantics. Picking the right one is a values decision. Permissive maximizes adoption; reciprocal protects the commons. Neither is wrong.

The deeper point: Apache 2.0 doesn’t actually cover hardware. It’s a software license. Releasing your KiCad files under “Apache” creates legal ambiguity that will bite you the moment a manufacturer wants to commit to a production run. Use a hardware license for hardware.


What “Open” Doesn’t Mean

A few common misconceptions, since this post is going to be read by people who care about open source:

Open ≠ free of all constraints. You can be fully open-source and still have a clear sustainability model. We use a CLA + dual-licensing approach: contributors sign a CLA, the open version stays Apache 2.0 forever, and a commercial license is available for organizations that need different terms (e.g., proprietary derivatives, formal warranty). This is the same pattern Qt, MongoDB (historically), and many others have used. It’s not the only model, but it’s a coherent one.

Open ≠ vendor-free. We use ROS2, NVIDIA Isaac Sim, OpenAI APIs, and HuggingFace Hub. Open source doesn’t mean reinventing every dependency — it means the system you build on top is open, and the user can swap any layer.

Open ≠ unmaintained. “Open source” sometimes carries a connotation of “abandoned hobbyist project.” That’s a perception problem the community needs to push back on. An open platform with a healthy maintainer, a CI pipeline that runs on every PR, and a benchmark leaderboard with real submissions is not a hobby — it’s infrastructure.


The Asks

If you’re reading this and you work on robotics or Embodied AI, here are the things that would actually move the field forward, none of which are 3we-specific:

  1. Publish your hardware BOM. Not just “we used a TurtleBot 4.” The actual sensors, the actual cables, the actual mounting brackets. Every paper that doesn’t do this is a half-published paper.
  2. Pick a hardware license. If your KiCad files are on GitHub under a software license, fix it. CERN-OHL-P is fine. Solderpad is fine. Any of them is better than none.
  3. Make Sim2Real reproducible. Publish the sim config alongside the policy. Document the calibration procedure. The first lab to standardize this will own the citation graph for the next decade.
  4. Treat the API as a research artifact. The shape of the Python class researchers write against is not a footnote. It’s the part that gets used a million times.

What’s Next

3we is early. The SDK works; the hardware reproduces; the simulation backends are stable; the benchmark leaderboard accepts submissions. There’s a lot of unfinished work: VLA model deployment is rough around the edges, the multi-robot story needs more attention, and the documentation is uneven.

If any of this resonates — whether you want to use the platform, contribute to it, or just argue with the framing in this post — the repo is at github.com/telleroutlook/3we-robot-platform. Issues and PRs welcome. Disagreement especially welcome.

Open robotics in 2026 isn’t just about open code. It’s about open hardware, open APIs, open benchmarks, and a shared commitment to making the work reproducible. We’re not there yet as a field. But the path is clear, and there’s a lot of room for more people to walk it.


But have you had your hardware stack open hardware certified? Happy to help with that. 😃


As much as I’d be a big fan of this, it isn’t possible in practice for more than toy examples. Your IMU samples at 1 kHz or more, thermal development in the robot body influences its bias, as does air humidity, and many more processes like this are happening inside. You can’t simulate them all, for two reasons: 1) there isn’t enough time or money to implement everything and identify the models, and 2) there isn’t enough compute to run all the simulations. You’d basically need a very beefy computer to run a complete simulation of a real robot.

So what do people do instead? They select. Are you training a walking policy? Maybe you don’t need your exact IMU, and one with similar Gaussian bias noise is enough. Are you training a proprioception improvement? Then you probably don’t need the cameras and lidars. Are you training a detection clustering algorithm? Then you swap the simulated RGB cameras for segmentation cameras, so that you test the clustering and not the detection algorithm. And so on.

So basically every downstream task needs a slightly different simulation model/config, and this is where “single string change” sim2real becomes impossible. I don’t think hiding this complexity from users is a good approach. Everyone should know what they’re doing. You can offer ways to make this easier, or configs for common scenarios like E2E RL of control from a front camera, but I wouldn’t hide that there’s some downstream work to be done in most cases.

Other than that, I agree that more openness on most fronts would be a good thing. Unfortunately, openness in hardware/firmware usually means not state-of-the-art.
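For readers unfamiliar with the "similar Gaussian bias noise" stand-in mentioned above, here is one common simplification, sketched with the standard library only: a true rate plus a slowly drifting bias (a Gaussian random walk) plus white Gaussian noise. The function name `simulate_gyro` and all parameter values are illustrative assumptions, not identified from any real IMU.

```python
import random


def simulate_gyro(true_rate: float, n: int, noise_std: float,
                  bias_walk_std: float, seed: int = 0) -> list[float]:
    """Return n gyro readings: true rate + drifting bias + white noise."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    bias = 0.0
    readings = []
    for _ in range(n):
        bias += rng.gauss(0.0, bias_walk_std)  # bias random walk
        readings.append(true_rate + bias + rng.gauss(0.0, noise_std))
    return readings


# With both standard deviations at zero the sensor is ideal.
ideal = simulate_gyro(true_rate=0.5, n=5, noise_std=0.0, bias_walk_std=0.0)

# Illustrative (not measured) noise levels for a noisy variant.
noisy = simulate_gyro(true_rate=0.5, n=1000, noise_std=0.01, bias_walk_std=0.0001)

print(ideal)
print(min(noisy), max(noisy))
```

Thermal and humidity effects are deliberately absent here, which is exactly the kind of selection the reply describes.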


For the TurtleBot 3, yes you can.

Switching from backend="mock" (a zero-dependency 2D kinematic simulator that runs on a laptop with no GPU) to backend="gazebo" (full physics) to backend="isaac_sim" (GPU-accelerated parallel RL) to backend="real" (physical hardware) is a one-string change. Identical API.

How is this different from changing whether your system bring-up launches Stage, Gazebo, Isaac, or the robot’s hardware interface nodes? That can similarly be a one-string change in a launch file, and your API is still identical, whether it’s ROS topics or a Python class.

That property — Sim2Real with zero code changes — is what unlocks the workflow Embodied AI actually needs: prototype in mock, train in isaac_sim, validate in gazebo, deploy on real, all without rewriting your agent.

How is this not possible now (excepting the accurate concerns raised by @peci1)?

Not yet certified. Our near-term plan is to launch in the China market first, where open-hardware certification is not required. Once we begin preparing for EU and US distribution channels (such as Amazon), we will complete the necessary certification processes in advance.

Thanks for the pushback — you’re right, and my earlier “single string change” framing was oversimplified.

A full physical simulation of IMU bias drift, thermal effects, and related phenomena is not realistically captured through a one-line abstraction. The appropriate level of simulation fidelity is inherently task-dependent: you choose the level of realism based on what you are actually training or evaluating.

What we currently provide is better described as “presets for common scenarios.” The Python API remains consistent across simulation backends, but the underlying sensor models and simulator configurations still need to be selected and tuned for each use case.
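As a rough illustration of what "presets for common scenarios" could look like, here is a sketch using plain dataclasses. The `SimPreset` fields, the preset names, and every value are hypothetical and chosen to mirror the examples raised earlier in the thread; they are not the project's actual configuration schema.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SimPreset:
    """Task-specific simulation settings (hypothetical schema)."""

    cameras: str          # "rgb", "segmentation", or "none"
    lidar: bool
    imu_noise_std: float  # std-dev of additive Gaussian IMU noise


# Illustrative values only; real presets would need per-task identification.
PRESETS = {
    "e2e_camera_rl": SimPreset(cameras="rgb", lidar=False, imu_noise_std=0.0),
    "proprioception": SimPreset(cameras="none", lidar=False, imu_noise_std=0.02),
    "detection_clustering": SimPreset(cameras="segmentation", lidar=True,
                                      imu_noise_std=0.0),
}

# A preset is a starting point: users still tune it for their own task.
tuned = replace(PRESETS["proprioception"], imu_noise_std=0.05)
print(tuned)
```

Keeping presets immutable and deriving tuned copies makes the downstream configuration work explicit instead of hiding it behind the backend string.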

I’ll revisit how we describe this in the documentation, because implying there is no downstream configuration work ultimately does users a disservice.

On the question of open versus state-of-the-art systems — that is a trade-off we consciously accept. Our priority is reproducibility and lowering the barrier to entry for researchers and students, rather than competing with frontier platforms purely on raw capability.

Thanks again for the thoughtful feedback — it genuinely helps improve the project.


Fair points — and honestly, you’re right that ROS2 already gives you most of this.
A well-structured launch file plus consistent topic interfaces achieves the same
backend-swap workflow. I shouldn’t have framed it as something new.

What we’re trying to add is mostly packaging: a Python class API that wraps the
ROS2 layer for users who don’t want to write nodes or launch files (students, ML
researchers coming from a Gym/PyTorch background), plus a zero-dependency mock
backend that runs without ROS at all for quick prototyping on a laptop. For users
already comfortable in ROS2, our SDK doesn’t unlock anything you can’t already do —
it’s just a different entry point.

And as @peci1 rightly pointed out, the “zero code changes” claim oversells it in
practice — sensor models and sim configs still need task-specific tuning. I’ll fix
that framing.
