Hello people!
It has been an awesome three weeks of working on the AIC challenge.
I started with a similar approach to a lot of other folks: recording trajectories with the cheatcode policy in Gazebo, converting them to the LeRobot format, and training ACT and SmolVLA on them, but I could not get good results. The maximum I could score with a SmolVLA-based policy was 40.
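For anyone building a similar pipeline, the conversion step can look roughly like the sketch below. This is a minimal sketch, not my exact script: the `lerobot` dataset API has changed between versions, so treat the signatures as approximate, and `recorded_episodes` plus the feature shapes are placeholders.

```python
# Minimal sketch: packing recorded Gazebo trajectories into a LeRobot dataset.
# NOTE: the lerobot API differs across versions; signatures are approximate.
import numpy as np
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

features = {
    # Placeholder shapes; adjust to the actual robot and cameras.
    "observation.state": {"dtype": "float32", "shape": (7,), "names": None},
    "action": {"dtype": "float32", "shape": (7,), "names": None},
    "observation.images.cam": {"dtype": "video", "shape": (480, 640, 3),
                               "names": ["height", "width", "channels"]},
}

dataset = LeRobotDataset.create(repo_id="user/aic-episodes", fps=20, features=features)

recorded_episodes = []  # hypothetical: lists of (state, action, image) per episode

for episode in recorded_episodes:
    for state, action, image in episode:
        dataset.add_frame({
            "observation.state": np.asarray(state, dtype=np.float32),
            "action": np.asarray(action, dtype=np.float32),
            "observation.images.cam": image,
            "task": "aic challenge",  # free-form task description string
        })
    dataset.save_episode()

dataset.push_to_hub()
```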
Now that the qualification phase is over, and while a lot of great teams will focus on the next stage, I want to dig deep and understand what could have been done better to improve results: the data, the policy, the strategy, or anything else.
I'll share details of what I have been doing and would love to get insights from fellow participants. I have pushed the dataset I recorded to Hugging Face; it can be found here.
I had an Ubuntu 22 based system, so I created a Docker container for development on Ubuntu 24 and ROS 2 Kilted; the code for that can be found here.
It would also be great if the teams who actually scored well could share some insights, limited of course to things that won't affect their competition.
The questions I am looking for answers to:
- How much data is enough to get decent results? I had only ~550 episodes, generated with random configurations of everything that could be changed on the board. I feel that is too much diversity to capture in such a small dataset, but I am not sure what a good approach to covering it would be (see the sampling sketch after this list):
  - Vary the configuration slowly and record multiple episodes for each one?
  - Just generate enough random configurations, with no need to record multiple episodes for the same one?
  - Any other insights about data?
- How much did the smoothness of your recorded trajectories affect your results? Initially I was optimizing for speed, so the recorder script moved everything fast, which introduced jerks into the recorded trajectories, but I am not sure how much that affects performance. I tried a smoother version of the cheatcode policy but could not record many episodes with it, so I could not figure out whether it actually had any impact (a post-hoc filtering sketch is below the list).
- Another interesting thing I observed was that my policy was oscillating during inference, producing similar chunks on every inference call. I am not sure whether it was a bug somewhere on my side, whether other people observed it too, or whether it is a common failure mode of these policies under specific conditions (a chunk-ensembling sketch is below the list).
- I only tried ACT and SmolVLA. Did people try other approaches, and did anything work better than these two?
- Due to lack of time and resources I could not get into the Isaac Sim side of things, but has anyone gotten good results by recording and training in Isaac Sim and then transferring to Gazebo?
- What about the observation space? Initially I pushed everything in the observation message into the observation space of the policy, but that is probably not the best idea, since some of those values can be inferred from one another. I tried a bunch of different configurations but could not find anything conclusive, so if anyone has insights in that space, please do share.
- What about the action space? Both policies I worked with predicted the target pose of the robot. I am not sure whether target velocity would have been a better approach (a conversion sketch is below the list), so please share your thoughts on that.
- The rate at which to collect the training data: all of my datasets were recorded at 20 Hz, but I am not sure whether that rate affects how the policy performs (a resampling sketch is below the list).
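On the data-diversity question, the two options differ only in how episode configurations are sampled. A hypothetical sketch, where `FACTORS` and its fields are made-up stand-ins for whatever can actually vary on the board:

```python
# Hypothetical sketch of the two episode-sampling strategies from the list above.
import random

# Made-up stand-in for everything that can vary on the board.
FACTORS = {
    "piece_layout": list(range(10)),
    "board_pose": list(range(5)),
    "lighting": list(range(4)),
}

def random_config():
    return {name: random.choice(values) for name, values in FACTORS.items()}

def fully_random(n_episodes):
    """Option 2: a fresh random configuration for every episode (what I did)."""
    return [random_config() for _ in range(n_episodes)]

def slow_variation(n_configs, repeats):
    """Option 1: fewer distinct configurations, several episodes each."""
    configs = [random_config() for _ in range(n_configs)]
    return [cfg for cfg in configs for _ in range(repeats)]
```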
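On smoothness, one alternative to slowing the recorder down would be filtering the trajectories after recording. A minimal sketch using SciPy's Savitzky-Golay filter, assuming each trajectory is a `(T, D)` array of joint targets:

```python
# Sketch: post-hoc smoothing of jerky recorded trajectories before training.
import numpy as np
from scipy.signal import savgol_filter

def smooth_trajectory(traj: np.ndarray, window: int = 11, order: int = 3) -> np.ndarray:
    """traj: (T, D) joint targets; window must be odd and <= T."""
    return savgol_filter(traj, window_length=window, polyorder=order, axis=0)
```

Whether training on filtered data actually helps is exactly what I was unable to verify.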
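On the oscillation between chunks, one known mitigation is the temporal ensembling trick from the ACT paper: instead of executing each new chunk from scratch, average all overlapping predictions for the current timestep with exponentially decaying weights, where the oldest prediction gets the largest weight. A minimal sketch:

```python
# Sketch: ACT-style temporal ensembling over overlapping action chunks.
import numpy as np

class TemporalEnsembler:
    def __init__(self, chunk_size: int, m: float = 0.01):
        self.chunk_size = chunk_size
        self.m = m         # larger m discounts newer predictions more strongly
        self.buffer = []   # (start_step, chunk) pairs, oldest first

    def step(self, t: int, new_chunk: np.ndarray) -> np.ndarray:
        """new_chunk: (chunk_size, action_dim) array predicted at timestep t."""
        self.buffer.append((t, new_chunk))
        # Keep only chunks whose horizon still covers timestep t.
        self.buffer = [(s, c) for s, c in self.buffer if t < s + self.chunk_size]
        preds = np.stack([c[t - s] for s, c in self.buffer])
        weights = np.exp(-self.m * np.arange(len(preds)))  # w0 = oldest chunk
        return (weights[:, None] * preds).sum(axis=0) / weights.sum()
```

I do not know if this addresses my specific failure mode, but it seems to be the standard answer to chunk-boundary jitter.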
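On pose vs. velocity actions, a finite-difference conversion over the recorded target poses would let someone compare the two action spaces on the same dataset without re-recording. A minimal sketch (naive differencing is only sensible for position-like channels; orientations would need proper handling):

```python
# Sketch: deriving target-velocity actions from recorded target poses.
import numpy as np

def poses_to_velocities(poses: np.ndarray, fps: float = 20.0) -> np.ndarray:
    """poses: (T, D) target poses; returns (T, D) finite-difference velocities.
    Only sensible for position-like channels; quaternions need special care."""
    vel = np.diff(poses, axis=0) * fps   # (T-1, D)
    return np.vstack([vel, vel[-1:]])    # repeat the last row to keep length T
```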
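And on the recording rate, re-recording at several rates is expensive, but resampling the existing 20 Hz data is a cheap way to probe whether the rate matters. A minimal sketch using linear interpolation, again only for position-like channels:

```python
# Sketch: resampling a 20 Hz trajectory to another rate via linear interpolation.
import numpy as np

def resample(traj: np.ndarray, src_hz: float, dst_hz: float) -> np.ndarray:
    """traj: (T, D) samples at src_hz; returns samples at dst_hz."""
    t_src = np.arange(len(traj)) / src_hz
    t_dst = np.arange(0.0, t_src[-1], 1.0 / dst_hz)
    return np.stack([np.interp(t_dst, t_src, traj[:, d])
                     for d in range(traj.shape[1])], axis=1)
```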
I hope this gives people a chance to share whatever they have learned and help others.
Thank you.
