RosBag Resurrector — pandas-like analysis for MCAP / ROS 2 bag files

Hi all,

I’m releasing the first public version of RosBag Resurrector today: an open-source (MIT) Python library and web dashboard for analyzing MCAP and ROS 2 bag files. No ROS installation required.

The core idea: treat a bag like a pandas DataFrame. Open it, get column-style access to any topic, and filter, transform, or export it the way you would any tabular data. The goal is to fill the gap between “raw bag file on disk” and “ML training dataset” without writing throwaway scripts every time.

```python
from resurrector import BagFrame

bf = BagFrame("experiment.mcap")

bf.info()                                # rich summary
df = bf["/joint_states"].to_polars()     # any topic → DataFrame
synced = bf.sync(["/imu/data", "/joint_states"],
                 method="nearest", tolerance_ms=50)  # nearest-timestamp match, 50 ms tolerance
bf.health_report()                       # quality score
bf.export(topics=[…], format="lerobot", output="training_data/")
```
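For readers new to multi-stream sync, here is a rough, stdlib-only sketch of what "nearest with a tolerance" and "sample-and-hold" mean for two timestamped streams. It is an illustration under assumptions (timestamps in integer nanoseconds, both streams already sorted), not Resurrector's implementation; `sync_nearest` and `sync_hold` are hypothetical names.

```python
from __future__ import annotations

from bisect import bisect_left, bisect_right
from typing import Optional, Sequence


def sync_nearest(
    anchor_ts: Sequence[int],
    other_ts: Sequence[int],
    other_vals: Sequence[float],
    tolerance_ms: float,
) -> list[Optional[float]]:
    """For each anchor timestamp, pick the closest sample of the other stream.

    Returns None where the closest sample is farther away than the tolerance.
    Timestamps are integer nanoseconds and must be sorted (assumed convention).
    """
    tolerance_ns = tolerance_ms * 1_000_000
    out: list[Optional[float]] = []
    for t in anchor_ts:
        i = bisect_left(other_ts, t)
        # The nearest sample is either the first one at/after t or the one just before it.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(other_ts)]
        if not candidates:
            out.append(None)
            continue
        best = min(candidates, key=lambda j: abs(other_ts[j] - t))
        within = abs(other_ts[best] - t) <= tolerance_ns
        out.append(other_vals[best] if within else None)
    return out


def sync_hold(
    anchor_ts: Sequence[int],
    other_ts: Sequence[int],
    other_vals: Sequence[float],
) -> list[Optional[float]]:
    """Sample-and-hold: the latest sample at or before each anchor timestamp."""
    out: list[Optional[float]] = []
    for t in anchor_ts:
        i = bisect_right(other_ts, t)
        out.append(other_vals[i - 1] if i > 0 else None)
    return out
```

With anchor timestamps at 0, 10 and 100 ms and IMU samples at 2, 9 and 40 ms (values 1.0, 2.0, 3.0), `sync_nearest(..., tolerance_ms=5)` gives `[1.0, 2.0, None]` and `sync_hold` gives `[None, 2.0, 3.0]`. The explicit tolerance is what keeps nearest matching from silently pairing readings that are far apart in time.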


Full feature list:

- `BagFrame` API: pandas/Polars-like access to any topic. Lazy by default, with chunked iteration for large topics and `materialize_ipc_cache()` for filter/projection pushdown via a Polars LazyFrame.

- Health validation: an automatic 0–100 score per bag that flags dropped messages, time gaps, out-of-order timestamps, and message-size anomalies, with thresholds configurable per platform.

- Multi-stream sync: nearest, interpolate, and sample-and-hold methods with explicit tolerance, anchor-topic, out-of-order, and boundary policies. A streaming engine handles large bags and an eager engine handles bags that fit in memory; `engine="auto"` picks for you.

- ML-ready export: Parquet, HDF5, CSV, NumPy, and Zarr, plus **LeRobot** and **RLDS** for direct use in robot-learning training pipelines. Exports stream chunk by chunk, so large topics don’t OOM.

- Semantic frame search: CLIP embeddings indexed in DuckDB. Query video content in plain English (`resurrector search-frames "robot arm collision"`), or from the dashboard with thumbnail results.

- PlotJuggler-compatible bridge: a WebSocket relay that replays any recorded bag at configurable speed (0.1×–20×) or relays live ROS 2 topics (rclpy-based).

- Web dashboard: Library, Explorer (Plotly with brush-to-zoom, linked cursors, click-to-annotate), Health, Compare, Cross-bag overlay, Search, Datasets, and Bridge views. Runs at `localhost:8080`.

- Reproducible datasets: versioned dataset collections with SHA256 manifests and auto-generated READMEs.

- Bounded memory: usage scales with chunk size, not bag size, verified by a regression test on a 10M-message synthetic bag.

- 18 runnable example scripts under `examples/` covering every feature, each finishing in under 10 seconds against an auto-generated sample bag.
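To make the health checks concrete, here is a toy version of two of them (time gaps and out-of-order timestamps) plus a dropped-message estimate derived from an expected publish rate. The scoring formula, the `gap_factor` threshold, and the `toy_health` name are all hypothetical; Resurrector's actual 0–100 score and per-platform thresholds may work quite differently.

```python
def toy_health(timestamps_ns: list, expected_hz: float, gap_factor: float = 3.0) -> dict:
    """Illustrative health check for one topic's timestamps (nanoseconds, recording order)."""
    if not timestamps_ns:
        return {"out_of_order": 0, "gaps": 0, "dropped": 0, "score": 0}

    expected_dt = 1e9 / expected_hz  # nominal spacing between messages, in ns
    out_of_order = gaps = 0
    for prev, cur in zip(timestamps_ns, timestamps_ns[1:]):
        dt = cur - prev
        if dt < 0:
            out_of_order += 1  # timestamp went backwards
        elif dt > gap_factor * expected_dt:
            gaps += 1  # silence much longer than the nominal period

    # Estimate drops by comparing the message count with what the rate implies.
    duration = max(timestamps_ns) - min(timestamps_ns)
    expected_count = round(duration * expected_hz / 1e9) + 1
    dropped = max(0, expected_count - len(timestamps_ns))

    # Hypothetical scoring: each problem costs an equal share of the expected count.
    penalty = (out_of_order + gaps + dropped) / expected_count
    score = max(0, min(100, round(100 * (1 - penalty))))
    return {"out_of_order": out_of_order, "gaps": gaps, "dropped": dropped, "score": score}
```

For a 10 Hz topic stamped at 0, 100, 200, 150 and 600 ms, this reports one out-of-order step, one gap, two dropped messages, and a score of 43. A real validator would also weigh message-size anomalies and use configurable thresholds rather than one fixed `gap_factor`.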

Formats: MCAP is the optimized primary path (the default ROS 2 storage format since Iron). Legacy `.bag` and `.db3` files are auto-converted via the official `mcap` and `ros2 bag convert` CLIs, so there’s no parser for us to maintain.

Quick try:

```bash
pip install rosbag-resurrector
resurrector doctor
resurrector demo --full
resurrector dashboard
```

GitHub: https://github.com/vikramnagashoka/rosbag-resurrector

This is a brand-new public release, so I’d genuinely appreciate feedback from the community. The two questions I most want answered:

1. Which post-recording bag workflows are you writing one-off Python scripts for right now? Those are the use cases I want to prioritize next.

2. Are there bag-related pain points you’ve already given up on solving? I’d love to hear your “I wish a tool just did X” ideas.

Bug reports and feature requests welcome via GitHub issues.

Whatever AI generated your Markdown seems to have not done a very good job of it.

Somewhat related, I don’t understand why people keep adding bracketed labels to their titles, like [new tool] or [release]. Discourse already has tags and categories for this purpose. I have to remove them manually when I write the news, and it is a bit annoying.

Thanks for the feedback. I have updated the post to remove the [ ] and added tags instead.

Definitely an interesting tool. Can you point it at a directory and treat multiple bags as one dataset, or is it one bag at a time? I’m also curious how it handles custom message types, or whether it’s limited to the standard ones.

Great questions, both very practical. Honest answers below.

-> Multiple bags as one dataset

Yes. Point `resurrector scan` at a directory and it walks it recursively, picks up every `.mcap` file plus any ROS 2 directory bags (`metadata.yaml` + storage shards), and indexes them into a local DuckDB database:

```python
import resurrector

resurrector.scan("/data/bags")

# Filter DSL: topic / health / tag / before / after
hits = resurrector.search("topic:/imu/data health:>=80 after:2026-04-01")
for bag in hits:
    print(bag["path"], bag["health_score"])
```

Worked example: `examples/06_index_search_query_dsl.py`.
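For anyone curious how a `key:value` filter string like the one above can become structured filters, here is a small sketch. The `Query` class, the `matches()` helper, the index-record fields (`topics`, `tags`, `health_score`, `recorded`), and the date semantics (`after` inclusive, `before` exclusive) are all assumptions for illustration; Resurrector’s own DSL parser and DuckDB index may differ.

```python
import operator
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Tuple

# Two-character operators come first so ">=80" is not read as ">" followed by "=80".
_OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt, "=": operator.eq}


@dataclass
class Query:
    topics: list = field(default_factory=list)
    tags: list = field(default_factory=list)
    health: Optional[Tuple[str, int]] = None  # e.g. (">=", 80)
    after: Optional[date] = None
    before: Optional[date] = None


def parse_query(text: str) -> Query:
    """Parse space-separated key:value filters (hypothetical grammar)."""
    q = Query()
    for token in text.split():
        key, sep, value = token.partition(":")  # split on the first ':' only
        if not sep or not value:
            raise ValueError(f"expected key:value, got {token!r}")
        if key == "topic":
            q.topics.append(value)
        elif key == "tag":
            q.tags.append(value)
        elif key == "health":
            op = next((o for o in _OPS if value.startswith(o)), "=")
            number = value[len(op):] if value.startswith(op) else value
            q.health = (op, int(number))
        elif key in ("after", "before"):
            setattr(q, key, date.fromisoformat(value))
        else:
            raise ValueError(f"unknown filter {key!r}")
    return q


def matches(q: Query, bag: dict) -> bool:
    """Check one (hypothetical) index record against a parsed query."""
    if not all(t in bag["topics"] for t in q.topics):
        return False
    if not all(t in bag["tags"] for t in q.tags):
        return False
    if q.health is not None:
        op, threshold = q.health
        if not _OPS[op](bag["health_score"], threshold):
            return False
    if q.after is not None and bag["recorded"] < q.after:  # 'after' treated as inclusive
        return False
    if q.before is not None and bag["recorded"] >= q.before:  # 'before' treated as exclusive
        return False
    return True
```

Parsing first and matching second keeps the grammar testable on its own. In a DuckDB-backed index the parsed `Query` would more naturally be translated into a `WHERE` clause than checked row by row in Python.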

Two cross-bag surfaces are built in:

- Cross-bag overlay: the same topic from N bags, aligned by relative time, with one trace per bag in the dashboard. Example: `14_cross_bag_overlay.py`.
- Multi-bag synchronized playback (`MultiBagPlayback`): N bags streamed concurrently over the WebSocket bridge with per-bag offsets. Topics are namespaced as `<bag_id>:`, so PlotJuggler and the cross-bag UI handle them without protocol changes. Example: `20_multi_bag_playback.py`.
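The ideas in these two bullets (relative-time alignment, per-bag offsets, and `<bag_id>:` topic namespacing) can be sketched with the standard library’s `heapq.merge`. This is a hypothetical illustration of interleaving several already-sorted message streams into one time-ordered stream, not `MultiBagPlayback`’s actual code; the `(timestamp_ns, topic, payload)` tuple layout and the function names are assumptions.

```python
import heapq
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical message layout: (timestamp_ns, topic, payload)
Message = Tuple[int, str, object]


def _rebase(bag_id: str, msgs: List[Message], shift_ns: int) -> Iterator[Message]:
    """Make timestamps relative to the bag's first message, add the offset, namespace topics."""
    t0 = msgs[0][0]
    for t, topic, payload in msgs:
        yield (t - t0 + shift_ns, f"{bag_id}:{topic}", payload)


def merge_bags(
    bags: Dict[str, List[Message]], offsets_ns: Optional[Dict[str, int]] = None
) -> Iterator[Message]:
    """Interleave several time-sorted bags into one stream ordered by relative time."""
    offsets_ns = offsets_ns or {}
    streams = [
        _rebase(bag_id, msgs, offsets_ns.get(bag_id, 0))
        for bag_id, msgs in bags.items()
        if msgs  # skip empty bags
    ]
    # heapq.merge is lazy: it keeps only one pending message per bag in its heap.
    return heapq.merge(*streams, key=lambda m: m[0])
```

Passing each bag’s values into a generator function, rather than writing a generator expression inside the loop, sidesteps Python’s late-binding pitfall where every stream would otherwise see the last bag’s start time and offset.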

What’s not a single built-in operation: “treat N bags as one concatenated stream.”

Today that’s about 5 lines composing `search()` with `BagFrame.iter_chunks()`. If it turns out to be a common pattern, I’ll promote it to a first-class operation.
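A sketch of that composition follows. The exact `iter_chunks()` signature isn’t shown in this thread, so the per-bag reader is passed in as a parameter and chunks are modelled as lists of row dicts; `concat_bags` is a hypothetical name, and the commented wiring at the end is a guess at call shapes, not the documented API.

```python
from typing import Callable, Iterable, Iterator, List

Row = dict
Chunk = List[Row]  # stand-in for whatever chunk type the real reader yields


def concat_bags(
    paths: Iterable[str], iter_chunks: Callable[[str], Iterable[Chunk]]
) -> Iterator[Chunk]:
    """Yield every chunk of every bag in order, as if the bags were one stream.

    Each row is tagged with its source bag so downstream code can still group by it.
    Nothing is accumulated here, so memory stays bounded by one chunk as long as
    the reader itself is lazy.
    """
    for path in paths:
        for chunk in iter_chunks(path):
            yield [{**row, "bag": path} for row in chunk]


# Hypothetical wiring with Resurrector (guessed call shapes; check the real API):
# paths = [hit["path"] for hit in resurrector.search("topic:/imu/data")]
# stream = concat_bags(paths, lambda p: BagFrame(p).iter_chunks())
```

Tagging rows with their source path (instead of rewriting timestamps) keeps the concatenation lossless; re-basing time across bags can be layered on afterwards if a continuous clock is needed.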

-> Custom message types

The CDR parser auto-decodes the common ROS 2 message types into pandas-like rows: `sensor_msgs/{Imu, Image, CompressedImage, LaserScan, PointCloud2, JointState}` and `tf2_msgs/TFMessage`.

For anything else (e.g. your own `my_pkg/msg/CustomThing`), the message flows through with `raw_data` (the CDR bytes) and `schema_data` (the `.msg` definition pulled from the MCAP) preserved on every `Message`.

So:

✅ Works out of the box on custom types: scan, index, health, density, time-slice, sync (by timestamp), bridge streaming, MCAP-to-MCAP export, and recording from the bridge.

❌ Needs a decoder from you: `bf["/my_topic"].to_polars()` won’t auto-flatten custom types into typed columns.

The schema is preserved verbatim, so a generic CDR decoder (via rclpy or rosbags) can fill the gap when you need flat columns. Built-in decoders are registered in the dispatcher in `resurrector/ingest/parser.py`, and pluggable custom-type decoders are on the roadmap if there’s enough demand to prioritize them.

Repo for reference: https://github.com/vikramnagashoka/rosbag-resurrector

If you’re willing to share which custom types you’d want auto-decoded, that’s exactly the feedback that shapes priorities. I’m happy to fast-track common ones.

Thanks