Embodied Intelligence Upgraded: Self-Improving Robot Policies via RISE World Model Reasoning
Vision-Language-Action (VLA) models struggle with contact-rich tasks such as dynamic sorting and flexible packing, where tiny execution deviations often result in failure. Traditional real-world reinforcement learning (RL) faces barriers to scaling: high hardware costs, manual environment resets, and slow serial interactions.
The RISE framework (RSS 2026), developed by CUHK MMLab, HKU OpenDriveLab, Tsinghua University, and other institutions, addresses this with an “imagination-based self-evolution” paradigm. By training the policy inside a learned world model (an “imaginary space”) instead of through physical trial and error, it greatly reduces costly real-world rollouts and reaches success rates of up to 95% on complex manipulation tasks with the AgileX PiPER robotic arm.
References & Links
- Project Page: https://opendrivelab.com/RISE/
- Paper Link: https://arxiv.org/pdf/2602.11075
- Hardware Used: AgileX PiPER 6-DoF Robotic Arm
- Open Source: OpenDriveLab/RISE on GitHub ([RSS 2026] code for RISE: Self-Improving Robot Policy with Compositional World Model)

1. Real-World Robot Learning Still Struggles to Scale
Modern Vision-Language-Action (VLA) models can perform basic robotic manipulation through imitation learning (IL), but they still struggle with contact-rich tasks involving dynamic objects, deformable materials, and bimanual coordination. Even small execution errors can lead to task failure.
While reinforcement learning (RL) offers a path toward autonomous robot learning, real-world training remains limited by:
- high hardware costs
- low training efficiency
- manual environment resetting
- safety and reliability risks
Researchers have long relied on simulation and world models to improve scalability, but challenges such as the sim-to-real gap, unstable action generation, and slow robot planning continue to limit real-world deployment in embodied AI and autonomous robotics.
2. Building Self-Improving Robot Policies with Compositional World Models

The core idea behind RISE is simple: instead of relying entirely on expensive real-world robot training, robots improve themselves inside an imagined environment powered by a compositional world model.
RISE builds this world model from two key components:
1. Controllable Dynamics Model
A fast and controllable world model predicts future robot interactions and manipulation outcomes.
- Built on the Genie Envisioner video diffusion model
- Generates multi-view future robot trajectories within seconds
- Uses lightweight action encoders to ensure actions remain physically consistent and controllable
- Pretrained on large-scale robot datasets such as AgiBot World and Galaxea for realistic robot manipulation prediction
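In RISE the video diffusion backbone is made controllable through lightweight action encoders. The article does not describe their internals, so the following is only a minimal sketch of the general idea, assuming a single linear projection of a flattened action chunk into a conditioning vector. `LinearActionEncoder`, the 7-D action layout (six PiPER joints plus a gripper command), and all sizes are hypothetical.

```python
"""Toy sketch of a lightweight action encoder for an action-conditioned world model.

Hypothetical: chunk length, action dimension, and embedding size are illustrative
choices, not values from the RISE paper or code.
"""
import random
from typing import List

ACTION_DIM = 7   # assumption: 6 PiPER joints + 1 gripper command
CHUNK_LEN = 4    # assumption: number of actions consumed per world-model step
EMBED_DIM = 8    # assumption: size of the conditioning vector


class LinearActionEncoder:
    """Flattens an action chunk and projects it with a single linear layer."""

    def __init__(self, seed: int = 0) -> None:
        rng = random.Random(seed)
        in_dim = ACTION_DIM * CHUNK_LEN
        scale = 1.0 / in_dim ** 0.5  # keep output magnitudes roughly input-sized
        self.weight = [[rng.gauss(0.0, scale) for _ in range(in_dim)]
                       for _ in range(EMBED_DIM)]
        self.bias = [0.0] * EMBED_DIM

    def __call__(self, chunk: List[List[float]]) -> List[float]:
        if len(chunk) != CHUNK_LEN or any(len(a) != ACTION_DIM for a in chunk):
            raise ValueError("expected a CHUNK_LEN x ACTION_DIM action chunk")
        flat = [x for action in chunk for x in action]
        # y = W x + b, one output per embedding dimension
        return [sum(w * x for w, x in zip(row, flat)) + b
                for row, b in zip(self.weight, self.bias)]


encoder = LinearActionEncoder()
chunk = [[0.1 * t] * ACTION_DIM for t in range(CHUNK_LEN)]
embedding = encoder(chunk)
print(len(embedding))  # 8
```

In a real model the resulting embedding would be fed into the diffusion backbone alongside the image features; that wiring, and the training of the projection weights, are omitted here.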

2. Progress Value Model
A value prediction model continuously evaluates robot behavior during manipulation tasks.
- Built on the π0.5 Vision-Language-Action (VLA) framework
- Combines progress regression and temporal-difference (TD) learning
- Detects subtle manipulation failures such as object slipping or unstable contact
- Outputs real-time advantage scores for autonomous robot policy optimization
Together, these components enable scalable self-improving robot learning for embodied AI, contact-rich manipulation, and sim-to-real robotic systems.
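The progress value model combines progress regression with temporal-difference learning. One plausible way to write the corresponding training targets is sketched below, assuming linear progress labels over an episode, a one-step TD target, and an advantage defined as the TD error. The function names and the `td_weight` mixing coefficient are hypothetical and are not taken from the RISE code.

```python
"""Sketch of value-model targets combining progress regression and TD learning.

Hypothetical formulation: progress labels t / (T - 1), a one-step TD target,
and an advantage defined as the TD error.
"""
from typing import List


def progress_targets(episode_len: int) -> List[float]:
    """Linear progress labels from 0.0 (first frame) to 1.0 (final frame)."""
    if episode_len < 2:
        return [1.0] * episode_len
    return [t / (episode_len - 1) for t in range(episode_len)]


def td_target(reward: float, next_value: float, gamma: float, done: bool) -> float:
    """One-step bootstrapped target r + gamma * V(s'); no bootstrap at termination."""
    return reward + (0.0 if done else gamma * next_value)


def advantage(value: float, reward: float, next_value: float,
              gamma: float = 0.99, done: bool = False) -> float:
    """One-step advantage estimate: TD target minus the current value prediction."""
    return td_target(reward, next_value, gamma, done) - value


def value_loss(pred: List[float], progress: List[float], td: List[float],
               td_weight: float = 0.5) -> float:
    """Weighted sum of a progress-regression MSE and a TD MSE (weighting is hypothetical)."""
    n = len(pred)
    mse_progress = sum((p - y) ** 2 for p, y in zip(pred, progress)) / n
    mse_td = sum((p - y) ** 2 for p, y in zip(pred, td)) / n
    return (1.0 - td_weight) * mse_progress + td_weight * mse_td


targets = progress_targets(5)
# Predicted progress falls from 0.5 to 0.4 after an action (e.g. an object slips).
adv = advantage(value=0.5, reward=0.0, next_value=0.4, gamma=1.0)
```

A negative advantage like `adv` here marks an action the policy should be steered away from, which is how such a model can flag subtle failures before the episode visibly fails.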
3. Closed-Loop Self-Improving Robot Learning in Imagined Environments
Using the AgileX PiPER robot arm as its testbed, RISE improves robot policies inside imagined environments, reducing the need for large-scale real-world trial-and-error training.
The self-improving robot learning pipeline consists of three stages:
- Policy Warm-Up
The robot policy is initialized using a small amount of offline robot data, including demonstrations and successful or failed manipulation rollouts, allowing the system to learn basic robot manipulation skills.
- Imagined Rollout
The robot policy proposes actions, the dynamics model predicts the resulting future robot interactions and manipulation trajectories, and the value model evaluates action quality and estimates advantage values in real time.
- Policy Optimization
High-advantage robot actions are reinforced, while low-quality behaviors are gradually filtered out through iterative policy optimization, enabling continuous self-improving robot learning.
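The three stages can be illustrated end to end on a deliberately tiny problem. This is a toy sketch, not the RISE algorithm: a 1-D reaching task, a hand-written stand-in for the dynamics model (`s' = s + a`), a progress-style value `V(s) = -|goal - s|`, and a best-of-N advantage filter followed by least-squares refitting (a form of filtered behavior cloning) in place of RISE's actual policy optimizer. All names are hypothetical.

```python
"""Toy imagined self-improvement loop: warm-up gain, imagined rollouts, filtered update.

Stand-ins only: 1-D reaching, hand-written dynamics, progress-style value,
best-of-N advantage filtering, least-squares refit of a linear policy gain.
"""
import random
from typing import List, Tuple

GOAL = 1.0


def world_model(state: float, action: float) -> float:
    """Stand-in for the learned dynamics model: predicted next state."""
    return state + action


def value(state: float) -> float:
    """Stand-in for the progress value model: higher when closer to GOAL."""
    return -abs(GOAL - state)


def imagined_advantage(state: float, action: float) -> float:
    """One-step advantage scored purely in imagination (reward 0, gamma 1)."""
    return value(world_model(state, action)) - value(state)


def improve(gain: float, iterations: int = 10, n_states: int = 200,
            n_samples: int = 8, noise: float = 0.3, seed: int = 0) -> float:
    """Filtered self-improvement of a linear policy a = gain * (GOAL - s) + noise."""
    rng = random.Random(seed)
    for _ in range(iterations):
        kept: List[Tuple[float, float]] = []  # (goal error, selected action)
        for _ in range(n_states):
            s = rng.uniform(-1.0, 1.0)
            err = GOAL - s
            # Imagined rollout: sample candidate actions and score each with the models.
            candidates = [gain * err + rng.gauss(0.0, noise) for _ in range(n_samples)]
            best = max(candidates, key=lambda a: imagined_advantage(s, a))
            kept.append((err, best))
        # Policy optimization: least-squares refit of the gain on high-advantage actions.
        gain = sum(e * a for e, a in kept) / sum(e * e for e, _ in kept)
    return gain


warm_up_gain = 0.2  # hypothetical outcome of warm-up on offline data
print(round(improve(warm_up_gain), 2))
```

Starting from a weak warm-up gain of 0.2, the refitted gain moves toward 1.0, the optimal value for this toy task, even though every candidate action is scored only in imagination and never executed.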

RISE performs the entire optimization process in virtual environments without requiring repeated real-world robot interaction. During inference, the world model is no longer involved, meaning the system introduces no additional runtime computation cost for real-world robot deployment.
4. Three Challenging Robot Manipulation Tasks, Significant Performance Gains
RISE was evaluated on three high-difficulty real-world robot manipulation tasks (dynamic brick sorting, flexible bag packing, and precision box assembly) and significantly outperformed existing imitation learning (IL) and reinforcement learning (RL) baselines.
Benchmark Results
Benchmark evaluation on the AgileX PiPER Robot Arm indicates a substantial improvement in manipulation success rate, as shown in the results below.
- Dynamic Brick Sorting
Success rate improved from 50% to 85%

- Flexible Bag Packing
Success rate improved from 40% to 85%

- Precision Box Assembly
Success rate improved from 60% to 95%
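The reported success rates can be restated as absolute (percentage-point) and relative gains with a few lines of arithmetic. The code only reuses the numbers listed in this section.

```python
from typing import Dict, Tuple

# Success rates (%) before and after RISE, as reported in this section.
RESULTS: Dict[str, Tuple[int, int]] = {
    "Dynamic Brick Sorting": (50, 85),
    "Flexible Bag Packing": (40, 85),
    "Precision Box Assembly": (60, 95),
}


def gains(results: Dict[str, Tuple[int, int]]) -> Dict[str, Tuple[int, float]]:
    """Absolute gain in percentage points and relative gain (after / before)."""
    return {task: (after - before, after / before)
            for task, (before, after) in results.items()}


for task, (points, ratio) in gains(RESULTS).items():
    print(f"{task}: +{points} points, {ratio:.2f}x")

mean_after = sum(after for _, after in RESULTS.values()) / len(RESULTS)
print(f"Mean success rate after RISE: {mean_after:.1f}%")
```

This gives gains of +35, +45, and +35 points and a mean post-RISE success rate of about 88.3% across the three tasks; the 95% figure is the best single-task result (Precision Box Assembly).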

Compared with online reinforcement learning methods such as PPO and DSRL, RISE demonstrated significantly more stable robot policy optimization without training collapse.
Compared with offline RL approaches such as RECAP, RISE continuously expands robot training distributions through imagined world-model rollouts, greatly improving generalization and reducing overfitting in contact-rich manipulation tasks.
5. Key Design Choices Behind RISE
Ablation studies show that every core module in RISE plays a critical role in stable robot learning and contact-rich manipulation performance.
Key Findings
- Removing dynamics model pretraining reduced dynamic sorting accuracy by 32%
- Removing the task-centric batch strategy decreased overall task success by 30%
- Removing temporal-difference (TD) learning from the value model weakened failure detection and reduced success rates by 35%
- An offline-to-online data ratio of 0.6 achieved the best balance between robot policy stability and autonomous exploration
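The offline-to-online ratio controls how each training batch is composed. A minimal sketch follows, assuming that a ratio of 0.6 means 60% of every batch is drawn from the offline dataset and 40% from fresh imagined rollouts; the paper may define the ratio differently, and `mixed_batch` is a hypothetical helper.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def mixed_batch(offline: Sequence[T], online: Sequence[T], batch_size: int,
                offline_ratio: float = 0.6,
                rng: Optional[random.Random] = None) -> List[T]:
    """Sample a shuffled batch with a fixed share of offline vs. imagined samples.

    Assumption: offline_ratio is the fraction of the batch drawn from offline data.
    """
    if not 0.0 <= offline_ratio <= 1.0:
        raise ValueError("offline_ratio must be in [0, 1]")
    rng = rng or random.Random()
    n_offline = round(batch_size * offline_ratio)
    # Sample with replacement from each pool, then shuffle so the sources interleave.
    batch = (rng.choices(offline, k=n_offline)
             + rng.choices(online, k=batch_size - n_offline))
    rng.shuffle(batch)
    return batch


batch = mixed_batch(["real_demo"] * 100, ["imagined_rollout"] * 100,
                    batch_size=32, rng=random.Random(0))
```

Raising `offline_ratio` anchors the policy to real data at the cost of exploration; lowering it leans more heavily on imagined experience.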
6. Toward Scalable and Low-Cost Self-Improving Robot Learning
RISE demonstrates that a well-trained compositional world model can directly serve as an online reinforcement learning environment for real-world robot manipulation.
This brings three major advantages for embodied AI and autonomous robotics:
1. Lower Training Cost
RISE shifts expensive real-world robot trial-and-error into scalable computation, making high-performance robot policy learning more accessible for smaller robotics teams.
2. Higher Training Efficiency
Unlike the physical, serial interactions of traditional real-world RL, imagined robot interactions can run in parallel, dramatically accelerating robot learning and policy optimization.
3. Safer Robot Learning
By performing large-scale trial and error inside imagined environments, RISE reduces physical risk and the chance of damaging real robotic systems during training.
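Treating the world model as an online RL environment, and running imagined interactions in parallel, can be sketched as a batched environment wrapper. The sketch assumes that rewards are the change in predicted progress between consecutive states and that imagined episodes start from states sampled from real data; `ImaginedEnv`, its `reset`/`step` methods, and the toy dynamics and value functions are hypothetical stand-ins for the learned models.

```python
import random
from typing import Callable, List, Tuple

Dynamics = Callable[[List[float], List[float]], List[float]]
ValueFn = Callable[[List[float]], List[float]]


class ImaginedEnv:
    """Hypothetical wrapper: steps N imagined episodes at once using a world model."""

    def __init__(self, dynamics: Dynamics, value_fn: ValueFn,
                 start_states: List[float], horizon: int = 10) -> None:
        self.dynamics = dynamics          # stand-in for the learned dynamics model
        self.value_fn = value_fn          # stand-in for the progress value model
        self.start_states = start_states  # assumption: states taken from real offline data
        self.horizon = horizon
        self.states: List[float] = []
        self.t = 0

    def reset(self, n_envs: int, rng: random.Random) -> List[float]:
        self.states = [rng.choice(self.start_states) for _ in range(n_envs)]
        self.t = 0
        return list(self.states)

    def step(self, actions: List[float]) -> Tuple[List[float], List[float], bool]:
        next_states = self.dynamics(self.states, actions)  # one batched prediction
        # Assumption: reward = change in predicted progress, V(s') - V(s).
        rewards = [v_next - v_now for v_now, v_next
                   in zip(self.value_fn(self.states), self.value_fn(next_states))]
        self.states = next_states
        self.t += 1
        return list(next_states), rewards, self.t >= self.horizon


def toy_dynamics(states: List[float], actions: List[float]) -> List[float]:
    return [s + a for s, a in zip(states, actions)]


def toy_value(states: List[float]) -> List[float]:
    return [-abs(1.0 - s) for s in states]  # closer to the goal 1.0 = more progress


env = ImaginedEnv(toy_dynamics, toy_value, start_states=[-0.5, 0.0, 0.5], horizon=3)
obs = env.reset(n_envs=4, rng=random.Random(0))
done = False
while not done:
    actions = [0.5 * (1.0 - s) for s in obs]  # simple proportional policy
    obs, rewards, done = env.step(actions)
```

All episodes advance through a single batched dynamics call per step; with a GPU-based world model, that batching is what lets many imagined rollouts proceed at once.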
At the same time, several challenges still remain:
- small sim-to-real inconsistencies in rare manipulation scenarios
- manual tuning of offline-to-online data ratios
- high computational cost for large-scale world model training
Future research will likely focus on uncertainty-aware world models, physics-constrained robot prediction, and more efficient embodied AI training pipelines, enabling robots to solve increasingly complex real-world manipulation tasks through autonomous imagined learning.
Have Questions?
If you encounter any issues with environment installation, parameter configuration, or RL training, feel free to leave your questions for further discussion.


