Requesting use-case ideas: dataset DataOps for rosbag→images→COCO/YOLO training

Hi everyone — I’m looking for real-world application scenarios where a small “dataset DataOps” tool would be genuinely useful in robotics perception workflows.

I often see teams doing some version of: rosbag → images → COCO/YOLO → training (or sim → export → training). The painful part is usually not exporting once, but iterating safely:

  • “Which exact dataset did we train on?”

  • “Why did results change after a config tweak?”

  • “Did the dataset distribution drift?”

  • “Can we put dataset quality checks into CI?”

I’m building an open-source CLI called KomanSim that focuses on:

  • Reproducible dataset builds: each run writes a manifest.json with config/job hash + dataset hash

  • Exports: COCO + Ultralytics-style YOLO layout

  • QA outputs: basic dataset stats and sanity checks, intended to serve as CI “quality gates”
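
To make the “reproducible builds” idea concrete, here is a minimal sketch of how a deterministic dataset hash could be computed (this is illustrative only; the function name and hashing scheme are my assumptions, not KomanSim’s actual implementation):

```python
import hashlib
from pathlib import Path

def dataset_hash(root: Path) -> str:
    """Illustrative content hash (NOT KomanSim's actual scheme):
    sha256 over (relative path, per-file sha256) pairs, sorted by path
    so the result is independent of directory iteration order."""
    h = hashlib.sha256()
    for p in sorted(root.rglob("*")):
        if p.is_file():
            h.update(str(p.relative_to(root)).encode())
            h.update(hashlib.sha256(p.read_bytes()).hexdigest().encode())
    return h.hexdigest()

if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "images").mkdir()
        (root / "images" / "0001.png").write_bytes(b"fake-image-bytes")
        # Same content always yields the same hash, so two runs of the
        # pipeline can be compared byte-for-byte via the manifest.
        print(dataset_hash(root) == dataset_hash(root))
```

The point of sorting by relative path is that the hash answers “which exact dataset did we train on?” regardless of filesystem ordering.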

Right now it’s intentionally lightweight: the included dummy backend is mainly for validating the pipeline and output contracts (images may be placeholders), not for photorealism.

Quickstart (copy/paste)

git clone https://github.com/uptonow/KomanSim
cd KomanSim
pip install -e ".[dev]"

# COCO demo
komansim validate --config examples/configs/job_dummy_coco.yaml
komansim run --backend dummy --config examples/configs/job_dummy_coco.yaml

# YOLO demo (Ultralytics layout)
komansim validate --config examples/configs/job_dummy_yolo.yaml
komansim run --backend dummy --config examples/configs/job_dummy_yolo.yaml

What I’m asking from you

  1. In your work, where does the workflow rosbag→images→COCO/YOLO break down the most?

  2. Which “next feature” would you actually use?

  • A) Dataset Doctor: QA an existing COCO/YOLO dataset and output a report + suggested quality gates

  • B) Dataset Diff: compare dataset v1 vs v2 and report distribution/annotation differences (drift signals, missing labels, bbox size/occlusion changes)

  • C) ROS-friendly helpers: small utilities around frame export / camera_info calibration awareness / consistent naming
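
For option B, here is a rough sketch of the kind of drift signal I have in mind, on COCO-style dicts (stdlib only; function names and the specific signals are hypothetical, not an existing KomanSim API):

```python
from collections import defaultdict
from statistics import mean

def bbox_area_profile(coco: dict) -> dict:
    """Per-category annotation count and mean bbox area from a COCO dict."""
    names = {c["id"]: c["name"] for c in coco.get("categories", [])}
    areas = defaultdict(list)
    for ann in coco.get("annotations", []):
        w, h = ann["bbox"][2], ann["bbox"][3]  # COCO bbox is [x, y, w, h]
        areas[names.get(ann["category_id"], str(ann["category_id"]))].append(w * h)
    return {k: {"count": len(v), "mean_area": mean(v)} for k, v in areas.items()}

def diff_profiles(v1: dict, v2: dict) -> dict:
    """Compare two profiles: count delta and relative mean-area shift per
    category, plus categories missing entirely from one side."""
    report = {}
    for cat in set(v1) | set(v2):
        a, b = v1.get(cat), v2.get(cat)
        if a is None or b is None:
            report[cat] = "missing in v1" if a is None else "missing in v2"
        else:
            report[cat] = {
                "count_delta": b["count"] - a["count"],
                "mean_area_ratio": b["mean_area"] / a["mean_area"],
            }
    return report
```

A large `mean_area_ratio` or a category flagged as missing would be exactly the kind of thing I would want a diff report to surface before training.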

If you have 30 seconds: reply with one concrete scenario (even a single sentence), and optionally A/B/C.
If you have 2 minutes: what QA metrics or diff signals would you want first?

Repo: https://github.com/uptonow/KomanSim

Thanks
