LSEP: Open protocol for standardized robot-to-human state communication (light + sound + motion)

Hello ROS community,

I’d like to introduce LSEP (Light Signal Expression Protocol) — an open standard I’ve been developing for how robots communicate their internal state to nearby humans using coordinated light signals, sound, and motion cues.

The problem LSEP solves:

Every robot manufacturer currently invents their own LED patterns and sound cues. There’s no shared vocabulary. A blinking blue light could mean “charging” on one platform and “human detected” on another. With the EU AI Act (Art. 50) now requiring transparency for human-facing AI systems, the industry needs a standardized approach.

What LSEP defines:

- 6 core states: IDLE, AWARENESS, INTENT, CARE, CRITICAL, THREAT

- 3 extended states: MED_CONF, LOW_CONF, INTEGRITY (for sensor uncertainty and self-diagnostics)

- Each state maps to specific light color + pulse pattern, optional sound, and motion modifier

- State transitions driven by Time-to-Collision (TTC) physics, not heuristics (see the sketch after this list)

- 1.5 m proximity floor: any human within 1.5 m triggers at least AWARENESS
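To make the TTC mechanics concrete, here is a minimal Python sketch of the state selection step. The state names come from the spec; the numeric thresholds are illustrative placeholders I picked for this post, not the normative v2.0 values (those live in the JSON signal definitions):

```python
from enum import Enum

class CoreState(Enum):
    IDLE = "IDLE"
    AWARENESS = "AWARENESS"
    INTENT = "INTENT"
    CARE = "CARE"
    CRITICAL = "CRITICAL"

PROXIMITY_FLOOR_M = 1.5  # any human inside 1.5 m gets at least AWARENESS

def time_to_collision(distance_m: float, closing_velocity_mps: float) -> float:
    """TTC = distance / closing_velocity; infinite when not closing."""
    if closing_velocity_mps <= 0.0:
        return float("inf")
    return distance_m / closing_velocity_mps

def select_state(distance_m: float, closing_velocity_mps: float) -> CoreState:
    """Pick the core state for the nearest detected human."""
    ttc = time_to_collision(distance_m, closing_velocity_mps)
    # Illustrative thresholds only -- consult the spec for real values.
    if ttc < 2.0:
        state = CoreState.CRITICAL
    elif ttc < 6.0:
        state = CoreState.CARE
    elif ttc < 12.0:
        state = CoreState.INTENT
    elif ttc < 30.0:
        state = CoreState.AWARENESS
    else:
        state = CoreState.IDLE
    # Proximity floor: a human standing right next to a stationary robot
    # (TTC = inf) must still see at least AWARENESS.
    if distance_m < PROXIMITY_FLOOR_M and state is CoreState.IDLE:
        state = CoreState.AWARENESS
    return state
```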

Technical details:

- RFC-style specification (v2.0)

- Machine-readable JSON signal definitions

- Unity prototype (HDRP) with 74 tests, including sensor noise simulation and tracking dropouts

- MIT licensed — use it however you want

Why I’m posting here:

ROS is where robot software gets built. If LSEP is going to be useful, it needs to work in your stacks — as a ROS node, a topic publisher, or a behavior tree integration. I’m looking for:

1. Feedback on the state model — Do 9 states cover the scenarios you encounter? What’s missing?

2. Integration ideas — How would you want to consume LSEP in a ROS 2 pipeline? As a `/lsep_state` topic? A lifecycle node? (A minimal publisher sketch follows this list.)

3. Real-world edge cases — What breaks first when you imagine deploying this on your robot?
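For question 2, here is the kind of minimal rclpy strawman I have in mind. To be clear, none of this is in the spec yet: the topic name, the use of std_msgs/String as a stand-in message type, and the 10 Hz rate are all assumptions up for discussion:

```python
import rclpy
from rclpy.node import Node
from std_msgs.msg import String

class LsepStatePublisher(Node):
    def __init__(self):
        super().__init__("lsep_state_publisher")
        # Hypothetical topic name; a real integration might use a
        # dedicated LSEP message type instead of a raw string.
        self.pub = self.create_publisher(String, "/lsep_state", 10)
        self.timer = self.create_timer(0.1, self.tick)  # 10 Hz

    def tick(self):
        msg = String()
        msg.data = "AWARENESS"  # would come from the TTC state machine
        self.pub.publish(msg)

def main():
    rclpy.init()
    node = LsepStatePublisher()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()

if __name__ == "__main__":
    main()
```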

Links:

- Specification + demo: [lsep.org](https://lsep.org)

- GitHub: [NemanjaGalic/LSEP](https://github.com/NemanjaGalic/LSEP)

Happy to answer questions and discuss. The goal is to make this the “USB-C of robot communication” — one standard, every platform.


Very interesting project. At Savioke we experienced every day how important HRI, and especially sound cues, are for building trust with human interactors. And the thing to keep in mind with these humans is that many of them are involuntary and untrained interactors, e.g., bystanders watching a delivery robot, delivery recipients, or untrained new staff. Careful design of HRI can make the difference between “hating” and “loving” the robot.

That said, I would like to question the premise:

Just today, Paul Graham published a great blog post about the need for branding, and I’d argue that communication style, UX, and HRI in general are perhaps the biggest opportunity for such differentiation via branding.

Imagine for a second that every cartoon character looked the same, communicated the same, and in general exhibited the same “personality” – that would be super boring and no one would want to read/watch new cartoons anymore! Of course robots aren’t made to entertain, but the need for differentiation, here from a business perspective, is still shared with cartoon characters. Another analogy would be car designs.

So is standardization really a good goal here? Do you really think you can convince robot manufacturers to give up on the idea of designing the “personality” of their robots themselves and follow this standard instead? You might argue that your standard is a bit like standardizing the color of blinkers and brake lights on cars, and I think to that degree you would be right, but only in an industrial setting. Robots designed to operate around humans will probably orient their design more along the lines of what humans do, e.g., “look in the direction you are going” – some robots already do that with animated eyes shown on a screen in the front.


Hi chfritz,

Thank you for this thoughtful reply — your perspective from Savioke is exactly the kind of real-world experience this conversation needs. You’re absolutely right that careful HRI design can make the difference between people trusting or rejecting a robot, and that many human interactors are involuntary and untrained. That’s actually one of the core motivations behind LSEP.

You raise a fair point about the “USB-C” analogy and whether standardization conflicts with a robot’s personality. I’d like to offer a different framing.

LSEP is not the personality layer — it’s the safety layer.

Think of it like cars: every car has standardized turn signals (amber), brake lights (red), and headlights (white). These are universal, and no manufacturer would argue that using the same color for brake lights limits their brand. Yet cars still have wildly different personalities — a Porsche and a Toyota feel nothing alike. The standardized signals handle the safety-critical minimum: “I’m turning,” “I’m stopping,” “I’m here.” Everything above that is brand territory.

LSEP works the same way. It defines the minimum visual language for states like “I see you” (AWARENESS), “I’m coming toward you” (INTENT), or “Danger — move away” (CRITICAL). These are the turn signals. A delivery robot at a hotel, a warehouse AGV, and a surgical assistant all need to communicate these states — and the untrained bystander you mentioned shouldn’t have to learn a different color code for each manufacturer.

What happens above LSEP — the eyes, the motion style, the sound personality, the “character” — that’s completely open. LSEP doesn’t touch that. In fact, I’d argue it enables more creative personality design, because the safety baseline is already covered.

Regarding Paul Graham’s “The Brand Age” essay — I actually think it supports LSEP rather than contradicting it. Graham’s thesis is that brand takes over when functional differences disappear. But in robot-to-human communication, we haven’t even solved the functional problem yet. There is no shared language. We’re still in the “golden age” phase where getting the fundamentals right matters more than differentiation. Brand comes later — and it will come on top of the standard, not instead of it.

You mentioned that consumer robots will probably orient toward human behaviors like animated eyes and gaze direction. I agree — and that’s a great example of the personality layer. But those human-like cues don’t help when a 200 kg logistics robot approaches you in a warehouse at 2 m/s. Different contexts need different solutions, and LSEP is designed for the contexts where misunderstanding carries real risk.

Would love to hear your thoughts on this framing. And if you’re curious, the full spec and a Unity prototype are on lsep.org and GitHub.

Best,
Nemanja


Was this answer AI generated? It seems to be.

Yes, I know, because it was me who mentioned that first:

I can’t imagine a person writing a whole new paragraph like that to convince me of a point that I raised myself. That doesn’t make any sense.

Personally I don’t care to read AI-generated answers where it’s not clear which part of the text was generated or hallucinated by AI and what was actually based on human input. I care about your opinion.

That’s not the function of the robot!


@chfritz Wow, lesson learned! :laughing: Please accept my apologies, and thank you so much for the feedback.

Having now had the chance to read through your text properly, I clearly see the need for a more serious, in-depth exchange. I’d love to dive into your points:

  1. Regarding sound cues: You mentioned that sound cues are vital for building trust with human interactors. Based on your experience, which specific cues have truly established themselves as an effective bridge between humans and machines? I also noticed that Savioke provides solutions for both hotels and hospitals. In the latter, stress levels are naturally high, which heavily impacts human communication. How does this translate to robot-human interaction? What specific observations did you make there, and what solutions did you implement as a result?

  2. The cartoon analogy: I absolutely love that! You’ve got a very valid point there. Every robot manufacturer needs their own “recognition factor” through distinct visuals and personality—we certainly don’t want to change that. To take your analogy a step further: Did you ever watch Dragon Ball? If so, you’ll know Goku. He’s a Saiyan, and his evolution leads him to become a Super Saiyan and beyond.

    To bring it back to LSEP: Think of us as a “safety layer.” With our protocol, we want to contribute to a peaceful coexistence between humans and robots. Currently, the focus, especially with humanoids, is heavily on the physics of solving complex real-world problems. However, we see a massive opportunity to improve interaction by making robots inherently safer for humans, pets, and especially vulnerable individuals. Looking ahead, we also want to contribute to finding a solution for the issue of liability.

What are your thoughts on this? I’d love to hear your perspective!

By the way, do you happen to have a retired robot at Savioke that’s just sitting around and taking up space? We’re looking for a robot on which we can deploy our safety protocol to see how it performs within a controlled environment.

All the best (heartfelt wishes), Nemanja

I have recently also searched for a standard way to express robot intent for wheeled mobile robots, and have come up empty-handed. I’m glad there’s discussion happening on the subject.

Here are some aspects which I think are not fully covered:

  • Physical robot “might”: is it useful to prescribe speed and power consumption without taking into consideration the kinematics and size of the robot? I wouldn’t mind a Roomba bumping into me at even 1 m/s, but would absolutely fear a heavyweight industrial arm moving at even 0.1 m/s with me in the room. I admit that’s hyperbole, but I hope it gets the point across.

  • State orthogonality, or lack thereof: Why are the low confidence / high confidence states mutually exclusive with the system self-check? I find it unclear whether the sensor confidence states are also mutually exclusive with the main operating states. I sure hope they aren’t, but I can’t interpret the part about signaling them (e.g. what should your lights be doing when both INTENT and MED_CONF are applicable?)

  • State choice: I’m unsure I understand why there is a common IDLE state for both “robot is doing nothing” and “there are no humans around”.

  • Photosensitive epilepsy & accessibility: Formal resources on this are somewhat scarce, but one of the last things you want to do after bumping into someone suffering from photosensitive epilepsy is to start screaming your speakers out and flashing red light at 3 Hz. (If my memory serves me right, try to stay under 2 Hz, but even that might be risky.) That’s surely a matter for cross-domain collaboration with accessibility experts, but something to keep in mind for such an ambitious standard.

  • Heterogeneous perception and sensor fusion: Take the example in which a robot like the G1 has a high-resolution forward-facing camera and lower-resolution lidar all around. How should one handle the problem of reporting sensor confidence in that scenario? Do we panic when we see a possibly-human blob behind the robot? Do we always act as if all our perception is in tip-top condition because the high-res camera up front is doing well?

  • Expressing more complex intentions: With mobile robots, it’s somewhat useful to express the difference between intending to move forward, turn, dock, etc., and you see manufacturers develop different light patterns for each (a quick search turns up this for OTTO AMRs).

Some nitpicks on presentation:

  • The site seems to be broken, at least on Firefox. Links don’t work. Might be a case of Too Much Javascript.
  • The Goku analogy in the previous comment – totally unnecessary :]

@trupples — Thank you for taking the time to write this incredibly thorough analysis. This is exactly the kind of detailed, hard-hitting technical feedback that makes open standards actually usable in the real world.

I’ll address your points directly against our current v2.0 spec:

1. Physical Robot “Might” (kinematics & size)

You nailed it. The current spec uses TTC (Time-to-Collision = distance / closing_velocity) as the primary determinant. While TTC implicitly accounts for velocity, you’re totally right that mass and kinematic capability are currently ignored. A 500 kg industrial arm at 0.1 m/s actually carries kinetic energy comparable to a 3 kg Roomba at 1 m/s (E_k = ½mv²: 2.5 J vs. 1.5 J), but roughly seventeen times the momentum, and it is rigid and unyielding, even though the Roomba might have the shorter TTC.

This is a known blind spot we are tracking for v2.1. Our current thinking is to introduce an optional platform_risk_class (essentially a Kinetic Energy Multiplier based on ISO 13482 categories) that offsets the TTC thresholds. A heavyweight arm would simply trigger CARE at a much longer TTC than a lightweight mobile platform. The golden rule here: any modifier must remain strictly physics-based, never demographic-based.
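In rough code terms, the idea could look like this. This is a sketch only; the class names and multipliers are invented placeholders for this post, not v2.1 values:

```python
# Hypothetical platform_risk_class mechanism: heavier platforms stretch
# their TTC thresholds so they signal caution earlier. Class names and
# multipliers below are illustrative, loosely inspired by ISO 13482
# robot categories, not spec values.
RISK_CLASS_MULTIPLIER = {
    "lightweight_mobile": 1.0,  # e.g. a 3 kg vacuum robot
    "person_carrier": 2.0,
    "heavy_industrial": 4.0,    # e.g. a 500 kg arm
}

BASE_CARE_TTC_S = 5.0  # illustrative base threshold for entering CARE

def care_ttc_threshold(risk_class: str) -> float:
    """A heavyweight platform enters CARE at a much longer TTC."""
    return BASE_CARE_TTC_S * RISK_CLASS_MULTIPLIER[risk_class]

print(care_ttc_threshold("heavy_industrial"))    # 20.0 s
print(care_ttc_threshold("lightweight_mobile"))  # 5.0 s
```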

2. State Orthogonality (Confidence × Operating States)

This is probably the most critical implementation gap you’ve pointed out. Extended States (MED_CONF, LOW_CONF, INTEGRITY) are designed as overlays, not replacements. But what does that actually look like?

If we just try to blend colors (e.g., mixing an Amber Core State with a Cyan Extended State on a physical LED ring), we get washed-out “visual mud” on standard hardware. That helps no one. For v2.1, we will explicitly define Spatial Multiplexing rules. For example: the front 80% of a light array displays the Core State (e.g., INTENT amber flow), while the rear 20% pulses the Extended State (e.g., MED_CONF cyan shimmer). For the audio channel, the MED_CONF query tone would play in the gaps between the INTENT hum cycles. I’m opening a GitHub issue for a dedicated “State Composition” section to nail this down.
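As a sketch of what the composition rule could look like in code (the 80/20 split is the proposal above; the RGB values and function signature are placeholders, not part of the spec):

```python
# Spatial Multiplexing sketch: the front sector of the LED array carries
# the Core State, the rear sector carries the Extended State overlay.

def compose_led_frame(num_leds, core_rgb, extended_rgb=None, split_ratio=0.8):
    """Return one (r, g, b) tuple per LED: front 80% core, rear 20% overlay."""
    split = int(num_leds * split_ratio)
    tail_rgb = extended_rgb if extended_rgb is not None else core_rgb
    return [core_rgb] * split + [tail_rgb] * (num_leds - split)

# INTENT (amber) composed with a MED_CONF (cyan) overlay on a 40-LED ring:
frame = compose_led_frame(40, core_rgb=(255, 160, 0), extended_rgb=(0, 255, 255))
assert len(frame) == 40 and frame[0] != frame[-1]
```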

3. IDLE State Semantics (“doing nothing” vs. “no humans around”)

Good observation. In v2.0, IDLE triggers when no human is within 50 meters. From the human’s perspective, there’s no practical difference — if you’re not there, you don’t see the signal. The breathing pulse exists primarily as a system health heartbeat for operators and to provide a smooth animation anchor when a human does enter the zone (IDLE → AWARENESS).

For AMR fleets in warehouses, I can see the argument for splitting this into STANDBY and UNOCCUPIED. We initially decided against it to avoid state-bloat without a direct safety benefit, but I’m definitely open to revisiting this if the AMR community needs it.

4. Photosensitive Epilepsy & Accessibility

Incredibly important catch. Just to quickly clarify the v2.0 spec: THREAT is designed as a de-escalating “White Breathing” glow (the design intent is to be non-aggressive when the robot is physically threatened). However, you are absolutely right about the CRITICAL state. It currently specifies a 3.0 Hz red strobe, which sits right at the danger threshold for photosensitive epilepsy according to the Harding test.

This is a genuine conflict between safety urgency and accessibility. For v2.1, we will cap the CRITICAL strobe at a strict 2.0 Hz maximum. We’re also exploring asymmetric pulse patterns (like short-short-pause) that convey high urgency without triggering standard epilepsy thresholds. We will add a dedicated accessibility compliance section referencing ISO 9241-391 (reduction of photosensitive seizures). Thank you for flagging this!
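To illustrate, here is a sketch of such an asymmetric pattern. All timings are illustrative assumptions, and any real pattern would need validation against photosensitivity guidance (e.g. the Harding test) before deployment:

```python
import itertools

# Asymmetric "short-short-pause" pulse: two quick flashes, then a long
# rest. 2 flashes per 1.5 s cycle = ~1.33 flashes/s on average, safely
# under the 2 Hz cap. (brightness, hold_time_s) pairs, illustrative only.
SHORT_SHORT_PAUSE = [
    (1.0, 0.15), (0.0, 0.20),  # flash 1, short gap
    (1.0, 0.15), (0.0, 1.00),  # flash 2, long pause
]

def pulse_steps(pattern):
    """Yield (brightness, hold_time_s) steps forever."""
    yield from itertools.cycle(pattern)
```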

5. Heterogeneous Sensor Fusion

The spec handles this partially with general confidence thresholds, but your G1 example (high-res front, low-res rear) highlights a missing piece: directional confidence.

Currently, the conservative default would be to report the lowest overall confidence across all sectors (triggering MED_CONF globally). But you’re right, that’s inefficient. This ties perfectly into the Spatial Multiplexing concept from Point 2: v2.1 needs to support per-sector confidence reporting, so a robot can show normal Core States on its front LEDs while the rear LEDs indicate degraded sensing.
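Concretely, per-sector reporting could be as simple as this sketch (the sector layout and confidence thresholds are assumptions for illustration, not spec values):

```python
# Per-sector confidence overlay sketch for heterogeneous perception:
# each sector maps its fused sensor confidence to an Extended State,
# instead of degrading the whole robot to the worst sector.

def sector_overlay(confidence: float):
    """Map a sector's confidence to an Extended State overlay (or None)."""
    if confidence < 0.4:
        return "LOW_CONF"
    if confidence < 0.7:
        return "MED_CONF"
    return None  # healthy sector: show the Core State only

# G1-style example: sharp camera up front, coarse lidar behind.
confidences = {"front": 0.95, "left": 0.80, "right": 0.80, "rear": 0.55}
overlays = {s: sector_overlay(c) for s, c in confidences.items()}
# => {'front': None, 'left': None, 'right': None, 'rear': 'MED_CONF'}
```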

6. Expressing Complex Intentions (AMR-specific)

Great reference to the OTTO AMR light patterns. This brings up an important architectural boundary for the protocol: LSEP is strictly designed as a safety- and state-awareness layer, not a navigation telemetry substitute.

Turning, docking sequences, and path-following are operational telemetry, which are functionally orthogonal to a robot’s safety state. A robot can absolutely display an LSEP AWARENESS state on its main torso while simultaneously using standard directional indicators on its chassis to signal a turn. We want to keep LSEP lean, modular, and focused purely on human-robot safety transparency, rather than trying to swallow every possible operational behavior into a single standard.

Regarding the lsep.org Firefox bug — confirmed, the Framer JS is acting up. GitHub remains our canonical source of truth anyway. And noted regarding the analogy in the previous comment — sticking strictly to the spec moving forward!

Next steps: I’m creating GitHub issues for Points 1, 2, 4, and 5 right now to track them for the v2.1 release. Your feedback directly shapes this standard. We are currently forming the LSEP Alliance (a consortium of engineers working on HRI standardization) — if you’re interested, we would love to have your voice in the mix.

Full spec is here: github.com/NemanjaGalic/LSEP