EpisodeVault: open source tool to debug why your LeRobot model regressed

Rohan-Prabhakar · June 10, 2026, 9:54pm

Been building with LeRobot v3 and kept hitting the same wall: retrain a policy, it gets worse, no clear idea why. DVC tells you files changed. MLflow tells you which run. Neither tells you which tasks dropped or which episodes degraded between dataset versions.

Built a small open source library to fill that gap.

Four commands

episodevault track ./my_dataset
episodevault commit -m "added kitchen episodes"
episodevault diff v1.0 v2.0
episodevault blame model_v3

What the diff looks like

Ran this against two real LeRobot datasets:

Dataset diff: v1.0 → v2.0
────────────────────────────────────────────────────
Episodes added:    +0
Episodes removed:  -7

Distribution shift:
  factory_pick        2 → 6   ↑ 200%  ⚠️
  kitchen_grasp       4 → 1   ↓ 75%   ⚠️

Quality metrics:
  avg episode length:    3.7s → 3.0s  ↓
  success_rate:          0.88 → 0.38  ↓
  camera_sync_score:     1.00 → 1.00  →

Likely regression cause:
  'kitchen_grasp' episodes dropped 75% (4 → 1). Restore from prior
  version if this task is in your eval benchmark.
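For anyone wondering how the percentages in the distribution shift section come about, here is a minimal sketch of the arithmetic. It is not EpisodeVault's actual code: the task_shift function, the 50% flag threshold, and the flat list-of-task-labels input are all assumptions made for illustration.

```python
from collections import Counter

def task_shift(old_tasks, new_tasks, threshold=0.5):
    """Compare per-task episode counts between two dataset versions.

    Hypothetical helper, not part of the EpisodeVault API. Each input is
    a list with one task label per episode.
    """
    old_counts = Counter(old_tasks)
    new_counts = Counter(new_tasks)
    report = {}
    for task in sorted(set(old_counts) | set(new_counts)):
        before, after = old_counts[task], new_counts[task]
        # Relative change is undefined when the task did not exist before.
        change = None if before == 0 else (after - before) / before
        # Assumed rule: flag new tasks and any change of 50% or more.
        flagged = change is None or abs(change) >= threshold
        report[task] = (before, after, change, flagged)
    return report

# Episode counts taken from the diff output above.
old = ["factory_pick"] * 2 + ["kitchen_grasp"] * 4
new = ["factory_pick"] * 6 + ["kitchen_grasp"] * 1

report = task_shift(old, new)
for task, (before, after, change, flagged) in report.items():
    pct = "new" if change is None else f"{change:+.0%}"
    print(f"{task:<16}{before} -> {after}  {pct}{'  flagged' if flagged else ''}")
```

With the counts from the example diff, factory_pick comes out at +200% and kitchen_grasp at -75%, matching the output shown.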

The blame command

One line in your training script:

import episodevault as ev
ev.log_training_run(model_version="model_v3", dataset_version="v2.0")

Then later:

episodevault blame model_v3

Traces the model back to the exact dataset version that trained it and shows the diff automatically.
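Conceptually, blame is a lookup from model version to dataset version followed by a diff. The sketch below shows that idea with a plain JSON file. The registry file, the log_run and blame helpers, and the choice to diff against the immediately preceding version are all assumptions for illustration, not how EpisodeVault stores or resolves runs.

```python
import json
import tempfile
from pathlib import Path

def log_run(registry: Path, model_version: str, dataset_version: str) -> None:
    """Record which dataset version trained a model (hypothetical helper)."""
    runs = json.loads(registry.read_text()) if registry.exists() else {}
    runs[model_version] = dataset_version
    registry.write_text(json.dumps(runs, indent=2))

def blame(registry: Path, model_version: str, versions: list[str]):
    """Return the (previous, trained_on) dataset versions to diff.

    Assumes `versions` is ordered oldest to newest and that the useful
    comparison is against the version right before the one used for training.
    """
    runs = json.loads(registry.read_text())
    trained_on = runs[model_version]
    idx = versions.index(trained_on)
    previous = versions[idx - 1] if idx > 0 else None
    return previous, trained_on

# Throwaway registry in a temp directory so the sketch leaves nothing behind.
registry = Path(tempfile.mkdtemp()) / "runs.json"
log_run(registry, "model_v3", "v2.0")
pair = blame(registry, "model_v3", ["v1.0", "v2.0"])
print(pair)
```

The returned pair is what you would then hand to a diff step, which is the manual equivalent of running episodevault diff v1.0 v2.0.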

Compatibility

Tested against four real datasets from the Hub:

Robot	Dataset	Episodes	Frames	Parse time
aloha	aloha_static_pro_pencil	25	8,750	0.35s
aloha	aloha_mobile_shrimp	18	67,500	0.38s
so100	svla_so100_stacking	56	22,956	0.63s
aloha	aloha_mobile_cabinet	85	127,500	2.73s

Install

pip install episodevault

Python 3.10+. Works on any local LeRobot v3 dataset.

GitHub: Rohan-Prabhakar/EpisodeVault

If you have a dataset where this breaks or gives a wrong regression hint, please open an issue. That’s the most useful feedback right now.

Rohan-Prabhakar · June 18, 2026, 5:46am

Update: just shipped cloud-native storage support (S3, GCS, Azure).

EpisodeVault now reads metadata directly from cloud object storage via fsspec, so you don’t have to download terabytes of video just to diff your dataset. You can run:

episodevault track s3://my-bucket/lerobot-v1
episodevault commit s3://my-bucket/lerobot-v1 -m "initial snapshot"
episodevault diff v1.0 v2.0 s3://my-bucket/lerobot-v1 --html audit.html

It only pulls the tiny Parquet manifests over the network. Raw sensor data stays in the bucket. Works with AWS S3, Google Cloud Storage, and Azure Blob. Also added a full test suite covering the cloud code paths using fsspec’s memory filesystem.
