Submissions fail at ~637s with empty stdout — header only, no engine logs

Hi all,

Every submission fails at ~637s with empty stdout (just the eval header), empty stderr, and result.json: . Same pattern every time.

Tried everything from @bha51’s thread (Fixed submission failing on Portal): heavy imports inside __init__, sim time everywhere, no wall clock, init under 60s, rebuilt without buildx attestation, both URI formats, simplified entrypoint to match official aic_model, HF offline, no restricted namespaces, clean single-platform manifest via crane.

Local runs fine. Submissions go silent.

Strange part: we don’t even get aic_engine output in stdout, just the team/date header. Others reported missing only model logs but still got engine logs.

Anyone else seen this “header-only stdout” pattern?

Thanks!


Are you using a heavy policy by any chance?

Hi @Praneet_Kedari, thanks for the question!

It’s moderate: a ResNet18 regressor (~50 MB checkpoint), loaded in __init__ on CPU; full init takes ~5s locally. Not particularly heavy. The heavy bits (torch, torchvision, transforms3d, PIL) are all imported inside __init__, no top-level imports.
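
For anyone unfamiliar with the pattern, deferring heavy imports into the constructor looks roughly like the sketch below. The class name `DeferredImportPolicy` is hypothetical, and two stdlib modules stand in for torch, torchvision, transforms3d and PIL so that the snippet runs anywhere. It is not the actual aic_model interface.

```python
import importlib


class DeferredImportPolicy:
    """Hypothetical policy class: nothing heavy is imported at module load time."""

    # Stand-ins for the heavy dependencies named above (torch, torchvision, ...).
    # These stdlib modules are assumptions chosen so the sketch runs without extras.
    HEAVY_MODULES = ("decimal", "fractions")

    def __init__(self):
        # Import inside __init__ so that merely importing this file stays cheap.
        self.modules = {
            name: importlib.import_module(name) for name in self.HEAVY_MODULES
        }


policy = DeferredImportPolicy()
print(sorted(policy.modules))
```

The point of the design is only that the module itself imports quickly; the cost of the heavy imports still lands inside init and counts toward whatever init limit applies.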

Did you hit something similar with a heavier policy? What was the symptom and how did you work around it?

Same as yours. I hope logs become available after failures.

How did you ensure that the initialization time stayed under 60 seconds after submission?

It doesn’t look like we’ll be getting logs. Just trial and error. Note that timing on the instance can be pretty different. We eventually got it to work by moving all the loading to task time (insert_cable), and we are now gradually moving things back to __init__ as we optimize them. That way we can see what broke it. It’s not nice to “stab in the dark” but it seems that’s what we need to do…
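
A minimal sketch of the "load at task time" workaround described above, assuming a policy object with an `insert_cable` method. The class name `LazyPolicy`, the `loader` argument, and the method signature are all hypothetical; the real task callback may look different.

```python
class LazyPolicy:
    """Hypothetical policy that defers model loading to the first task call."""

    def __init__(self, loader):
        # Keep __init__ trivial: remember how to load, but load nothing yet.
        self._loader = loader
        self._model = None

    def _ensure_loaded(self):
        # Load exactly once, on first use.
        if self._model is None:
            self._model = self._loader()
        return self._model

    def insert_cable(self, observation):
        # Assumed signature, for illustration only.
        model = self._ensure_loaded()
        return model(observation)


load_calls = []


def fake_loader():
    # Stand-in for reading a checkpoint; records each call so we can count loads.
    load_calls.append(1)
    return lambda obs: [2 * x for x in obs]


policy = LazyPolicy(fake_loader)
first = policy.insert_cable([1, 2])
second = policy.insert_cable([3])
print(first, second, len(load_calls))
```

Moving one piece at a time from `_ensure_loaded` back into `__init__` and resubmitting is then the bisection step: the first move that makes the submission go silent again points at the culprit.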

If you’re loading policies and it takes more than 60s then you’ll have to optimize load time. I believe there’s no other way to “ensure” it stays under 60.
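
To find out which part of the load to optimize, one option is to time each stage locally. The sketch below is only for local profiling, not for control logic inside the policy (which should stay on sim time). The `timed` helper and the stage names are made up for illustration, the stage bodies are stand-ins, and the 60s figure is simply the limit quoted in this thread. Numbers measured locally may not match the evaluation instance.

```python
import time
from contextlib import contextmanager

INIT_BUDGET_S = 60.0  # init limit as quoted in this thread (assumption)

timings = {}


@contextmanager
def timed(stage):
    """Record how long the enclosed block takes, keyed by stage name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start


with timed("imports"):
    import json  # stand-in for torch / torchvision imports

with timed("load_checkpoint"):
    time.sleep(0.01)  # stand-in for reading the checkpoint from disk

total = sum(timings.values())
for stage, seconds in timings.items():
    # flush=True so the line is not lost if the process is killed early
    print(f"{stage}: {seconds:.3f}s", flush=True)
print(f"total: {total:.3f}s of {INIT_BUDGET_S:.0f}s budget", flush=True)
```

Running the same script inside the submission container with limited CPU gives a more pessimistic, and probably more realistic, estimate than a bare local run.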


Thank you for sharing!

Hey, same as yours, just about 30 minutes after your post.