Hephaes: Open-source converter from ROS 1/2 logs to Parquet/TFRecord

Hi everyone!

Over the past little while, a friend and I have been building an open-source Python package called Hephaes. The main goal right now is to convert robotics logs (specifically bag/MCAP files) into formats that are easier to work with downstream, such as:

  • Parquet for analysis and data processing
  • TFRecord for TensorFlow-compatible ML pipelines

At the moment, each conversion also generates a manifest.json file, which we’re using as a basic indexing layer to keep track of outputs and metadata. We’re planning to expand the package further with features like:

  • VLM-based tagging for richer dataset annotation
  • Support for streaming directly from ROS and converting live data, instead of only working from recorded logs

We’re still early in the process and mainly building this because we wanted a simpler workflow for working with robotics data, but we’d genuinely love feedback from people who have worked with robotics pipelines before. A few things we’d especially love thoughts on:

  • Does this solve a real pain point for your workflow?
  • Are Parquet / TFRecord useful targets, or are there other formats we should prioritize?
  • What metadata would you want included in the manifest/index?
  • Would live ROS stream conversion actually be useful?
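
To make the manifest question more concrete, here is a rough sketch of the kind of fields a per-conversion manifest could carry. It is purely illustrative and is not the actual Hephaes manifest.json schema: the class names, field names, and example values below are all hypothetical placeholders, and it only uses the Python standard library.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TopicEntry:
    """Hypothetical per-topic summary (not the real Hephaes schema)."""
    name: str           # topic name, e.g. "/camera/image_raw"
    message_type: str   # message type string as recorded in the log
    message_count: int  # number of messages converted for this topic


@dataclass
class ManifestSketch:
    """Hypothetical manifest for one conversion run."""
    source_file: str          # input log that was converted
    source_format: str        # "bag" or "mcap"
    output_format: str        # "parquet" or "tfrecord"
    output_files: list[str]   # files written by the conversion
    start_time_ns: int        # timestamp of the first message
    end_time_ns: int          # timestamp of the last message
    topics: list[TopicEntry] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recursively converts nested dataclasses to plain dicts.
        return json.dumps(asdict(self), indent=2)


# Made-up example values, only to show the shape of the output.
example = ManifestSketch(
    source_file="run_001.mcap",
    source_format="mcap",
    output_format="parquet",
    output_files=["run_001/camera_image_raw.parquet"],
    start_time_ns=1_700_000_000_000_000_000,
    end_time_ns=1_700_000_060_000_000_000,
    topics=[TopicEntry("/camera/image_raw", "sensor_msgs/msg/Image", 1200)],
)

manifest_text = example.to_json()
print(manifest_text)
```

Feedback framed against something like this (which fields are missing, which are unnecessary) would be especially helpful.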

The repo and PyPI package are linked below, and we’re happy to hear any blunt feedback. We’re trying to learn and build something genuinely useful.

GitHub: hephaes-ai/hephaes (Easily convert ROS logs into standardized datasets)

PyPI: hephaes