Hi everyone!
For a while now, a friend and I have been building an open-source Python package called Hephaes. The main goal right now is to make it easier to convert robotics logs (specifically ROS bag and MCAP files) into formats that are easier to work with downstream, like:
- Parquet for easier analysis and data processing
- TFRecord for TensorFlow-compatible ML pipelines
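To make the Parquet side concrete: the core reshaping step is going from row-oriented messages (how a log plays back) to equal-length columns (how Parquet stores data). Here's a minimal stdlib-only sketch of that step; this is not the Hephaes API, `flatten_messages` is a hypothetical helper, and a real conversion would hand the columns to something like pyarrow:

```python
# Hypothetical sketch of the row -> column reshaping behind a Parquet-style
# conversion. Names here are illustrative, not the Hephaes API.

def flatten_messages(messages):
    """Turn a list of per-message dicts (row-oriented) into a dict of
    equal-length columns (column-oriented), padding missing fields with None."""
    columns = {}
    for i, msg in enumerate(messages):
        for key, value in msg.items():
            # Backfill None for a column this message is the first to define.
            columns.setdefault(key, [None] * i).append(value)
        # Pad any column this message didn't provide a value for.
        for col in columns.values():
            if len(col) == i:
                col.append(None)
    return columns

# Example: two /odom-like messages with slightly different fields.
msgs = [
    {"t": 0.0, "x": 1.0},
    {"t": 0.1, "x": 1.2, "yaw": 0.05},
]
cols = flatten_messages(msgs)
# cols == {"t": [0.0, 0.1], "x": [1.0, 1.2], "yaw": [None, 0.05]}
```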
At the moment, each conversion also generates a manifest.json file, which we’re using as a basic indexing layer to keep track of outputs and metadata. We’re planning to expand the project with features like:
- VLM-based tagging for richer dataset annotation
- support for streaming directly from ROS and converting live data, instead of only working from recorded logs
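For anyone wondering what the manifest is for: each entry records where an output came from and enough metadata to find and verify it later. Here's a hypothetical example of the kind of fields such an entry could carry; this is not the actual Hephaes manifest schema, and `manifest_entry` plus every field name below is illustrative:

```python
# Hypothetical sketch of one manifest entry for a converted output file.
# This is NOT the actual Hephaes manifest schema; fields are illustrative.
import hashlib
import json

def manifest_entry(source, output, topics, message_count, payload):
    return {
        "source": source,                # original bag/mcap path
        "output": output,                # converted Parquet/TFRecord path
        "topics": sorted(topics),        # topics included in this output
        "message_count": message_count,  # rows written
        "sha256": hashlib.sha256(payload).hexdigest(),  # output checksum
    }

entry = manifest_entry(
    source="runs/drive.mcap",
    output="datasets/drive.parquet",
    topics={"/odom", "/imu"},
    message_count=2,
    payload=b"fake parquet bytes",  # stand-in for the real output bytes
)
print(json.dumps(entry, indent=2))
```

A checksum plus source path is handy for incremental re-runs: skip conversions whose source and output hashes already appear in the manifest.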
We’re still early in the process; we mainly built this because we wanted a simpler workflow for robotics data, but we’d genuinely love feedback from people who have worked with robotics pipelines before. A few things we’d especially love thoughts on:
- Does this solve a real pain point for your workflow?
- Are Parquet / TFRecord useful targets, or are there other formats we should prioritize?
- What metadata would you want included in the manifest/index?
- Would live ROS stream conversion actually be useful?
Happy to share the repo and hear any blunt feedback. We’re trying to learn and build something genuinely useful.
GitHub: hephaes-ai/hephaes (Easily convert ROS logs into standardized datasets)
PyPI: hephaes · PyPI