Embodied Intelligence Upgraded: Self-Improving Robot Policies via RISE World Model Reasoning

Agilex_Robotics · May 28, 2026, 3:53am


Vision-Language-Action (VLA) models struggle with contact-rich tasks such as dynamic sorting and flexible packing, where tiny execution deviations often result in failure. Traditional real-world reinforcement learning (RL) faces barriers to scaling: high hardware costs, manual environment resets, and slow serial interactions.

The RISE framework (RSS 2026), developed by CUHK MMLab, HKU OpenDriveLab, Tsinghua University, and others, addresses this with an “imagination-based self-evolution” paradigm. By training in an imagined space generated by a world model rather than in the physical world, it avoids much of the costly real-world trial and error and reaches success rates of up to 95% on complex manipulation tasks with the AgileX PiPER robotic arm.

References & Links

Project Page: https://opendrivelab.com/RISE/

Paper Link: https://arxiv.org/pdf/2602.11075

Hardware Used: AgileX PiPER 6-DoF Robotic Arm

Project Homepage: RISE: Self-Improving Robot Policy with Compositional World Model

Open Source: OpenDriveLab/RISE on GitHub ([RSS 2026] Code for RISE: Self-Improving Robot Policy with Compositional World Model)


1. Real-World Robot Learning Still Struggles to Scale

Modern Vision-Language-Action (VLA) models can perform basic robotic manipulation through imitation learning (IL), but they still struggle with contact-rich tasks involving dynamic objects, deformable materials, and bimanual coordination. Even small execution errors can lead to task failure.

While reinforcement learning (RL) offers a path toward autonomous robot learning, real-world training remains limited by:

high hardware costs
low training efficiency
manual environment resetting
safety and reliability risks

Researchers have long relied on simulation and world models to improve scalability, but challenges such as the sim-to-real gap, unstable action generation, and slow robot planning continue to limit real-world deployment in embodied AI and autonomous robotics.

2. Building Self-Improving Robot Policies with Compositional World Models


The core idea behind RISE is simple: instead of relying entirely on expensive real-world robot training, robots improve themselves inside an imagined environment powered by compositional world models.

RISE separates robot learning into two key components:

1. Controllable Dynamics Model

A fast and controllable world model predicts future robot interactions and manipulation outcomes.

Built on the Genie Envisioner video diffusion model
Generates multi-view future robot trajectories within seconds
Uses lightweight action encoders to ensure actions remain physically consistent and controllable
Pretrained on large-scale robot datasets such as Agibot World and Galaxea for realistic robot manipulation prediction


2. Progress Value Model

A value prediction model continuously evaluates robot behavior during manipulation tasks.

Built on the π0.5 Vision-Language-Action (VLA) framework
Combines progress regression and temporal-difference (TD) learning
Detects subtle manipulation failures such as object slipping or unstable contact
Outputs real-time advantage scores for autonomous robot policy optimization
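As a rough illustration of how progress regression and temporal-difference learning can be combined in one value objective, here is a minimal sketch in plain Python. It is not the RISE implementation: the linear progress label, the discount `gamma`, the `td_weight` mixing factor, and all function names are assumptions made for this example.

```python
from typing import List


def progress_targets(num_steps: int) -> List[float]:
    """Hypothetical progress label: fraction of a successful episode completed."""
    if num_steps < 2:
        return [1.0] * num_steps
    return [t / (num_steps - 1) for t in range(num_steps)]


def td_targets(values: List[float], rewards: List[float], gamma: float = 0.99) -> List[float]:
    """One-step TD targets r_t + gamma * V(s_{t+1}); the step after the last one counts as 0."""
    targets = []
    for t in range(len(rewards)):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        targets.append(rewards[t] + gamma * next_value)
    return targets


def combined_value_loss(values, progress, rewards, gamma=0.99, td_weight=0.5):
    """Mean squared error against progress labels plus a weighted mean squared TD error."""
    n = len(values)
    regression = sum((v - p) ** 2 for v, p in zip(values, progress)) / n
    td = td_targets(values, rewards, gamma)
    td_error = sum((v - y) ** 2 for v, y in zip(values, td)) / n
    return regression + td_weight * td_error


def advantages(values, rewards, gamma=0.99):
    """One-step advantage A_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    return [y - v for y, v in zip(td_targets(values, rewards, gamma), values)]
```

In this toy setup, a drop in predicted value between consecutive steps (for example after an object slips) shows up as a negative advantage for the step that caused it, which is the kind of signal a policy optimizer can use to down-weight that action.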

Together, these components enable scalable self-improving robot learning for embodied AI, contact-rich manipulation, and sim-to-real robotic systems.

3. Closed-Loop Self-Improving Robot Learning in Imagined Environments

Demonstrated on the AgileX PiPER robot arm, RISE improves the robot policy autonomously inside imagined environments, reducing the need for large-scale real-world trial-and-error training.

The self-improving robot learning pipeline consists of three stages:

Policy Warm-Up

The robot policy is initialized using a small amount of offline robot data, including demonstrations and successful or failed manipulation rollouts, allowing the system to learn basic robot manipulation skills.

Imagined Rollout

The robot policy generates actions, while the compositional world model predicts future robot interactions and manipulation trajectories. At the same time, the value model evaluates action quality and estimates real-time advantage values.

Policy Optimization

High-advantage robot actions are reinforced, while low-quality behaviors are gradually filtered out through iterative policy optimization, enabling continuous self-improving robot learning.
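The three stages above can be sketched as a toy loop. This is only an illustration under strong simplifying assumptions: the state is a single number, `toy_dynamics` and `toy_value` are hypothetical stand-ins for the learned dynamics and value models, and the exponential advantage weighting is one common choice rather than the update rule used in RISE.

```python
import math
import random

GOAL = 1.0  # hypothetical 1-D target position


def warm_up(demo_actions):
    """Stage 1: initialise the policy mean from offline demonstration actions."""
    return sum(demo_actions) / len(demo_actions)


def toy_dynamics(state, action):
    """Stand-in for the learned dynamics model (here: simple integration)."""
    return state + action


def toy_value(state):
    """Stand-in for the progress value model: closer to the goal is better."""
    return -abs(GOAL - state)


def imagined_rollout(mean, start, num_samples, sigma, rng):
    """Stage 2: sample actions, imagine their outcomes, and score them by advantage."""
    samples = []
    for _ in range(num_samples):
        action = rng.gauss(mean, sigma)
        next_state = toy_dynamics(start, action)
        samples.append((action, toy_value(next_state) - toy_value(start)))
    return samples


def optimize(samples, temperature=0.1):
    """Stage 3: advantage-weighted update; high-advantage actions dominate the new mean."""
    best = max(adv for _, adv in samples)
    weights = [math.exp((adv - best) / temperature) for _, adv in samples]
    return sum(w * a for w, (a, _) in zip(weights, samples)) / sum(weights)


def self_improve(demo_actions, iterations=20, seed=0):
    """Run warm-up once, then alternate imagined rollouts and policy updates."""
    rng = random.Random(seed)
    mean = warm_up(demo_actions)
    for _ in range(iterations):
        samples = imagined_rollout(mean, start=0.0, num_samples=64, sigma=0.2, rng=rng)
        mean = optimize(samples)
    return mean
```

Calling `self_improve([0.3, 0.5])` starts from a policy that undershoots the goal and moves its mean action toward `GOAL` without ever touching a real environment, which mirrors the closed loop described above at a much smaller scale.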


After warm-up, RISE runs policy optimization on imagined rollouts in virtual environments, without requiring repeated real-world robot interaction. At inference time the world model is no longer involved, so deploying the policy on a real robot incurs no additional runtime computation.

4. Three Challenging Robot Manipulation Tasks, Significant Performance Gains

RISE was evaluated on three high-difficulty real-world robot manipulation tasks: dynamic brick sorting, flexible bag packing, and precision box assembly. On all three, it significantly outperformed existing imitation learning (IL) and reinforcement learning (RL) baselines.

Benchmark Results

Benchmark evaluation on the AgileX PiPER Robot Arm indicates a substantial improvement in manipulation success rate, as shown in the results below.

Dynamic Brick Sorting

Success rate improved from 50% to 85%


Flexible Bag Packing

Success rate improved from 40% to 85%


Precision Box Assembly

Success rate improved from 60% to 95%


Compared with online reinforcement learning methods such as PPO and DSRL, RISE demonstrated significantly more stable robot policy optimization without training collapse.

Compared with offline RL approaches such as RECAP, RISE continuously expands robot training distributions through imagined world-model rollouts, greatly improving generalization and reducing overfitting in contact-rich manipulation tasks.

5. Key Design Choices Behind RISE

Ablation studies show that every core module in RISE plays a critical role in stable robot learning and contact-rich manipulation performance.

Key Findings

Removing dynamics model pretraining reduced dynamic sorting accuracy by 32%
Removing the task-centric batch strategy decreased overall task success by 30%
Removing temporal-difference (TD) learning from the value model weakened failure detection and reduced success rates by 35%
An offline-to-online data ratio of 0.6 achieved the best balance between robot policy stability and autonomous exploration
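To make the data-ratio finding concrete, the sketch below mixes offline and online (imagined) samples into one training batch. The source does not say how the 0.6 ratio is defined, so this example assumes it means the fraction of each batch drawn from offline data; the function name `mixed_batch` and its parameters are hypothetical.

```python
import random


def mixed_batch(offline, online, batch_size, offline_fraction=0.6, rng=None):
    """Draw a batch with a fixed share of offline samples (assumed meaning of the ratio)."""
    rng = rng or random.Random()
    num_offline = round(batch_size * offline_fraction)
    num_online = batch_size - num_offline
    # Sample with replacement so small buffers can still fill a batch.
    batch = rng.choices(offline, k=num_offline) + rng.choices(online, k=num_online)
    rng.shuffle(batch)
    return batch


# Usage: tag each sample with its source to inspect the resulting mix.
offline_data = [("offline", i) for i in range(5)]
online_data = [("online", i) for i in range(5)]
example_batch = mixed_batch(offline_data, online_data, batch_size=10, rng=random.Random(0))
```

A higher offline share keeps the policy close to the demonstrated behavior, while a higher online share gives more weight to what the policy discovers in imagined rollouts; under this interpretation, the ablation suggests a setting near 0.6 balances the two.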

6. Toward Scalable and Low-Cost Self-Improving Robot Learning

RISE demonstrates that a well-trained compositional world model can directly serve as an online reinforcement learning environment for real-world robot manipulation.

This brings three major advantages for embodied AI and autonomous robotics:

1. Lower Training Cost

RISE shifts expensive real-world robot trial-and-error into scalable computation, making high-performance robot policy learning more accessible for smaller robotics teams.

2. Higher Training Efficiency

Unlike traditional real-world RL, imagined robot interactions can run in parallel, dramatically accelerating robot learning and policy optimization.

3. Safer Robot Learning

By performing large-scale trial-and-error inside virtual environments, RISE reduces physical risks and prevents damage to real robotic systems during training.

At the same time, several challenges remain:

small sim-to-real inconsistencies in rare manipulation scenarios
manual tuning of offline-to-online data ratios
high computational cost for large-scale world model training

Future research will likely focus on uncertainty-aware world models, physics-constrained robot prediction, and more efficient embodied AI training pipelines, enabling robots to solve increasingly complex real-world manipulation tasks through autonomous imagined learning.

Have Questions?

If you encounter any issues with environment installation, parameter configuration, or RL training, feel free to leave your questions for further discussion.
