A Standard Interface from LLM to Robot Actions

Hi ROS community,

I’m working on ClawsJoy Robotics, a proposal for a standard interface between LLM-based cognitive systems and ROS/2 robots. The core idea: any LLM outputs an action, any robot implements these standard topics.

The Problem

Every robotics company maps “pick up the cup” to different ROS/2 interfaces:

  • Company A: custom service /manipulation/pick

  • Company B: action server /grasp_object

  • Company C: proprietary SDK

There’s no standard way for an LLM agent to tell a robot what to do.

The Proposal: 12 Standard Motion Primitives

navigate_to → /base/cmd_vel (geometry_msgs/Twist)
move_arm → /arm/joint_trajectory (trajectory_msgs/JointTrajectory)
move_base → /base/cmd_vel (geometry_msgs/Twist)
grasp → /gripper/command (std_msgs/Float32, 0=close 1=open)
release → /gripper/command (std_msgs/Float32)
push → /base/cmd_vel (geometry_msgs/Twist)
look_at → camera PTZ control
detect_objects → /perception/objects (vision_msgs/Detection3DArray)
get_pose → /base/odom + /arm/joint_states
speak → /speech/tts (std_msgs/String)
play_gesture → /arm/joint_trajectory (pre-defined trajectories)
wait → no topic (sleep)
emergency_stop → /safety/estop (std_msgs/Bool)
Demo Output

Natural language → task execution (mock mode):
[Result]: :white_check_mark: Task completed in 6 steps
▸ look_at → table
▸ detect_objects → cup detected
▸ navigate_to → table
▸ move_arm → grasp position
▸ grasp → cup_01
▸ speak → “已拿到水杯” (cup fetched)

[Result]: :white_check_mark: Task completed in 6 steps
▸ navigate_to → waypoint_1
▸ look_at → front
▸ speak → “waypoint_1 clear”
▸ navigate_to → waypoint_2
▸ look_at → left
▸ speak → “waypoint_2 clear”
Integration Checklist

To integrate a robot with ClawsJoy Robotics:

  1. Arm: publish /arm/joint_states, subscribe /arm/joint_trajectory

  2. Gripper: subscribe /gripper/command (Float32, 0=close 1=open)

  3. Base: subscribe /base/cmd_vel, publish /base/odom

  4. Camera: publish /camera/rgb, /camera/depth

  5. Perception: publish /perception/objects

  6. Speech: subscribe /speech/tts

  7. Safety: subscribe /safety/estop

Then set mock_mode=False and run.

Code

https://github.com/your-org/clawsjoy_robotics

12 primitives defined in motion_primitives.yaml. Full topic mapping in bridge/ros2_bridge.py. Task planner in planner/planner.py.

Questions for the community

  1. Are these 12 primitives sufficient, or should we add/remove some?

  2. Is trajectory_msgs/JointTrajectory the right interface for move_arm, or should we use ROS/2 Actions?

  3. Should we adopt the existing ros2_control interfaces instead of defining new topics?

  4. Is this worth standardizing, or is every robot too different?

Looking forward to feedback.