Hello people!
It has been an awesome three weeks of working on the AIC challenge.
I started with a similar approach to a lot of other folks: recording trajectories with the cheatcode policy in Gazebo, converting them to the LeRobot format, and training ACT and SmolVLA on them, but I could not get good results. The maximum I could score with a SmolVLA-based policy was 40.
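For anyone building a similar pipeline, the conversion step can look roughly like the sketch below. This is a minimal sketch, not my exact script: the `lerobot` dataset API has changed between versions, so treat the signatures as approximate, and `recorded_episodes` plus the feature shapes are placeholders.

```python
# Minimal sketch: packing recorded Gazebo trajectories into a LeRobot dataset.
# NOTE: the lerobot API differs across versions; signatures are approximate.
import numpy as np
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

features = {
    # Placeholder shapes; adjust to the actual robot and cameras.
    "observation.state": {"dtype": "float32", "shape": (7,), "names": None},
    "action": {"dtype": "float32", "shape": (7,), "names": None},
    "observation.images.cam": {"dtype": "video", "shape": (480, 640, 3),
                               "names": ["height", "width", "channels"]},
}

dataset = LeRobotDataset.create(repo_id="user/aic-episodes", fps=20, features=features)

recorded_episodes = []  # hypothetical: lists of (state, action, image) per episode

for episode in recorded_episodes:
    for state, action, image in episode:
        dataset.add_frame({
            "observation.state": np.asarray(state, dtype=np.float32),
            "action": np.asarray(action, dtype=np.float32),
            "observation.images.cam": image,
            "task": "aic challenge",  # free-form task description string
        })
    dataset.save_episode()

dataset.push_to_hub()
```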
Now that the qualification phase is over, and while a lot of great teams will focus on the next stage, I want to dig deep and understand what could have been done better to improve results: the data, the policy, the strategy, or anything else.
I'll share details of what I have been doing and would love to get insights from fellow participants. I have pushed the dataset I recorded to Hugging Face; it can be found here.
I had an Ubuntu 22 based system, so I created a Docker container for development on Ubuntu 24 and ROS 2 Kilted; the code for that can be found here.
It would also be great if the teams who actually scored well could share some insights, limited of course to things that won't affect their competition.
The questions I am looking for answers to:
- How much data is enough to get decent results? I had only ~550 episodes, generated with random configurations of everything that could be changed on the board. I feel that is too much diversity to capture in such a small dataset, but I am not sure what a good approach to covering it would be (see the sampling sketch after this list):
  - Vary the configuration slowly and record multiple episodes for each one?
  - Just generate enough random configurations, with no need to record multiple episodes for the same one?
  - Any other insights about data?
- How much did the smoothness of your recorded trajectories affect your results? Initially I was optimizing for speed, so the recorder script moved everything fast, which introduced jerks into the recorded trajectories, but I am not sure how much that affects performance. I tried a smoother version of the cheatcode policy but could not record many episodes with it, so I could not figure out whether it actually had any impact (a post-hoc filtering sketch is below the list).
- Another interesting thing I observed was that my policy was oscillating during inference, producing similar chunks on every inference call. I am not sure whether it was a bug somewhere on my side, whether other people observed it too, or whether it is a common failure mode of these policies under specific conditions (a chunk-ensembling sketch is below the list).
- I only tried ACT and SmolVLA. Did people try other approaches, and did anything work better than these two?
- Due to lack of time and resources I could not get into the Isaac Sim side of things, but has anyone gotten good results by recording and training in Isaac Sim and then transferring to Gazebo?
- What about the observation space? Initially I pushed everything in the observation message into the observation space of the policy, but that is probably not the best idea, since some of those values can be inferred from one another. I tried a bunch of different configurations but could not find anything conclusive, so if anyone has insights in that space, please do share.
- What about the action space? Both policies I worked with predicted the target pose of the robot. I am not sure whether target velocity would have been a better approach (a conversion sketch is below the list), so please share your thoughts on that.
- The rate at which to collect the training data: all of my datasets were recorded at 20 Hz, but I am not sure whether that rate affects how the policy performs (a resampling sketch is below the list).
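On the data-diversity question, the two options differ only in how episode configurations are sampled. A hypothetical sketch, where `FACTORS` and its fields are made-up stand-ins for whatever can actually vary on the board:

```python
# Hypothetical sketch of the two episode-sampling strategies from the list above.
import random

# Made-up stand-in for everything that can vary on the board.
FACTORS = {
    "piece_layout": list(range(10)),
    "board_pose": list(range(5)),
    "lighting": list(range(4)),
}

def random_config():
    return {name: random.choice(values) for name, values in FACTORS.items()}

def fully_random(n_episodes):
    """Option 2: a fresh random configuration for every episode (what I did)."""
    return [random_config() for _ in range(n_episodes)]

def slow_variation(n_configs, repeats):
    """Option 1: fewer distinct configurations, several episodes each."""
    configs = [random_config() for _ in range(n_configs)]
    return [cfg for cfg in configs for _ in range(repeats)]
```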
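On smoothness, one alternative to slowing the recorder down would be filtering the trajectories after recording. A minimal sketch using SciPy's Savitzky-Golay filter, assuming each trajectory is a `(T, D)` array of joint targets:

```python
# Sketch: post-hoc smoothing of jerky recorded trajectories before training.
import numpy as np
from scipy.signal import savgol_filter

def smooth_trajectory(traj: np.ndarray, window: int = 11, order: int = 3) -> np.ndarray:
    """traj: (T, D) joint targets; window must be odd and <= T."""
    return savgol_filter(traj, window_length=window, polyorder=order, axis=0)
```

Whether training on filtered data actually helps is exactly what I was unable to verify.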
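On the oscillation between chunks, one known mitigation is the temporal ensembling trick from the ACT paper: instead of executing each new chunk from scratch, average all overlapping predictions for the current timestep with exponentially decaying weights, where the oldest prediction gets the largest weight. A minimal sketch:

```python
# Sketch: ACT-style temporal ensembling over overlapping action chunks.
import numpy as np

class TemporalEnsembler:
    def __init__(self, chunk_size: int, m: float = 0.01):
        self.chunk_size = chunk_size
        self.m = m         # larger m discounts newer predictions more strongly
        self.buffer = []   # (start_step, chunk) pairs, oldest first

    def step(self, t: int, new_chunk: np.ndarray) -> np.ndarray:
        """new_chunk: (chunk_size, action_dim) array predicted at timestep t."""
        self.buffer.append((t, new_chunk))
        # Keep only chunks whose horizon still covers timestep t.
        self.buffer = [(s, c) for s, c in self.buffer if t < s + self.chunk_size]
        preds = np.stack([c[t - s] for s, c in self.buffer])
        weights = np.exp(-self.m * np.arange(len(preds)))  # w0 = oldest chunk
        return (weights[:, None] * preds).sum(axis=0) / weights.sum()
```

I do not know if this addresses my specific failure mode, but it seems to be the standard answer to chunk-boundary jitter.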
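On pose vs. velocity actions, a finite-difference conversion over the recorded target poses would let someone compare the two action spaces on the same dataset without re-recording. A minimal sketch (naive differencing is only sensible for position-like channels; orientations would need proper handling):

```python
# Sketch: deriving target-velocity actions from recorded target poses.
import numpy as np

def poses_to_velocities(poses: np.ndarray, fps: float = 20.0) -> np.ndarray:
    """poses: (T, D) target poses; returns (T, D) finite-difference velocities.
    Only sensible for position-like channels; quaternions need special care."""
    vel = np.diff(poses, axis=0) * fps   # (T-1, D)
    return np.vstack([vel, vel[-1:]])    # repeat the last row to keep length T
```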
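And on the recording rate, re-recording at several rates is expensive, but resampling the existing 20 Hz data is a cheap way to probe whether the rate matters. A minimal sketch using linear interpolation, again only for position-like channels:

```python
# Sketch: resampling a 20 Hz trajectory to another rate via linear interpolation.
import numpy as np

def resample(traj: np.ndarray, src_hz: float, dst_hz: float) -> np.ndarray:
    """traj: (T, D) samples at src_hz; returns samples at dst_hz."""
    t_src = np.arange(len(traj)) / src_hz
    t_dst = np.arange(0.0, t_src[-1], 1.0 / dst_hz)
    return np.stack([np.interp(t_dst, t_src, traj[:, d])
                     for d in range(traj.shape[1])], axis=1)
```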
I hope this gives people a chance to share whatever they have learned and help others.
Thank you.
