Hi everyone,
I just joined the AIC challenge and am going through the setup steps. My setup fails at step 2 ("Inside the container, start the environment"):
/entrypoint.sh ground_truth:=false start_aic_engine:=true
I have also filed a GitHub issue with all the details. Has anybody else in the community with an RTX 5090 (Driver Version: 575.57.08, CUDA Version: 12.9) run into this? Are there any workarounds?
Thanks for any pointers!
opened 03:24AM - 18 Apr 26 UTC
### Description
Following the [getting_started.md](https://github.com/intrinsic… -dev/aic/blob/main/docs/getting_started.md#setup-docker) setup on a system with an RTX 5090 and NVIDIA driver 575.57.08, the eval container's Gazebo GUI (`gz sim -g`) and the `gzserver` Sensors subsystem both crash on startup with an X11 `NV-GLX BadValue` error.
The failure reproduces outside Gazebo as well — `glxinfo -B` inside the container fails with the same error — which points to the container's NVIDIA userspace GL/GLX libraries being unable to negotiate with the 575-series kernel driver required for Blackwell (RTX 50xx) GPUs.
The existing troubleshooting doc calls out PyTorch compatibility for RTX 50xx (#379), but not the GL/rendering side. This may share a root cause with #436, but the symptom pattern (explicit `NV-GLX BadValue` vs SIGSEGV) and setup path (eval container vs source build) are different enough that I'm filing separately.
### Setup method
Docker + distrobox (eval container, per `getting_started.md`)
### OS
Ubuntu (Linux 6.14.0-37-generic)
### System specs
- **GPU:** NVIDIA GeForce RTX 5090
- **NVIDIA driver:** 575.57.08 (CUDA 12.9)
- **Container image:** `ghcr.io/intrinsic-dev/aic/aic_eval:latest`
- **Image digest:** `sha256:9aa2ffdbb946d38edde1bac7b5f02a44cfbea26e3b04a9c74e09f14c97472923`
- **Container manager:** Docker, via distrobox with `--nvidia`
### Steps to reproduce
```bash
export DBX_CONTAINER_MANAGER=docker
docker pull ghcr.io/intrinsic-dev/aic/aic_eval:latest
distrobox create -r --nvidia -i ghcr.io/intrinsic-dev/aic/aic_eval:latest aic_eval
distrobox enter -r aic_eval
# Inside the container:
/entrypoint.sh ground_truth:=false start_aic_engine:=true
```
Additional verification inside the container:
1. `nvidia-smi` **works** and correctly reports the RTX 5090 with driver 575.57.08.
2. `glxinfo -B` **fails** before printing any output with the same `NV-GLX BadValue` error seen inside Gazebo.
### Relevant log output
**`glxinfo -B` inside the container (no Gazebo involved):**
```shell
X Error of failed request: BadValue (integer parameter out of range for operation)
Major opcode of failed request: 156 (NV-GLX)
Minor opcode of failed request: 6 ()
Value in failed request: 0x0
Serial number of failed request: 96
Current serial number in output stream: 96
name of display: :1
```
**Gazebo GUI (`gz-4`) aborts creating its Qt OpenGL context:**
```shell
[gz-4] Failed to create OpenGL context for format QSurfaceFormat(version 2.0, options QFlags<QSurfaceFormat::FormatOption>(), depthBufferSize 24, redBufferSize -1, greenBufferSize -1, blueBufferSize -1, alphaBufferSize -1, stencilBufferSize 8, samples -1, swapBehavior QSurfaceFormat::DoubleBuffer, swapInterval 1, colorSpace QSurfaceFormat::DefaultColorSpace, profile QSurfaceFormat::NoProfile)
[gz-4] Stack trace (most recent call last):
[gz-4] #4 ... QSGRenderLoop::handleContextCreationFailure(QQuickWindow*)
[gz-4] #3 ... QMessageLogger::fatal(char const*, ...) const
[gz-4] #2 ... abort
[gz-4] #1 ... gsignal
[gz-4] #0 ... pthread_kill
[gz-4] Aborted (Signal sent by tkill() 372059 1000)
[ERROR] [gz-4]: process has died [pid 372059, exit code -6, cmd 'gz sim -g'].
```
**`component_container-3` (`gzserver` with `gz-rendering-ogre2` for sensors) then crashes with the same X error and heap corruption:**
```shell
[component_container-3] (2026-04-17 19:59:31.439) [debug] [Sensors.cc:953] Initialization needed
[component_container-3] (2026-04-17 19:59:31.439) [debug] [Sensors.cc:349] Initializing render context
[component_container-3] (2026-04-17 19:59:31.439) [info] [RenderEngineManager.cc:514] Loading plugin [gz-rendering-ogre2]
[component_container-3] X Error of failed request: BadValue (integer parameter out of range for operation)
[component_container-3] Major opcode of failed request: 156 (NV-GLX)
[component_container-3] Minor opcode of failed request: 6 ()
[component_container-3] Value in failed request: 0x0
[component_container-3] double free or corruption (fasttop)
[ERROR] [component_container-3]: process has died [pid 372058, exit code -6, cmd '/opt/ros/kilted/lib/rclcpp_components/component_container --ros-args -r __node:=ros_gz_container -r __ns:=/'].
```
**Downstream, spawners time out waiting on the controller manager:**
```shell
[spawner-7] [WARN] [1776481201.548971111] [spawner_joint_state_broadcaster]: Failed getting a result from calling /controller_manager/list_controllers in 30.0. (Attempt 1 of 3.)
[spawner-7] [WARN] [1776481231.549630175] [spawner_joint_state_broadcaster]: Failed getting a result from calling /controller_manager/list_controllers in 30.0. (Attempt 2 of 3.)
```
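With several processes failing in sequence, it can help to pull just the "process has died" lines out of the launch output to see which process went down first. The snippet below is a minimal sketch that assumes the launch output has been saved to a file; `died_processes` and `launch.log` are hypothetical names, and the pattern is written only against the `[ERROR] [...]: process has died [...]` lines shown above.

```bash
#!/usr/bin/env bash
# Sketch: summarise "process has died" lines from saved launch output.
# Reads the log on stdin and prints one line per dead process:
#   <process-name> pid=<pid> exit=<exit code>
died_processes() {
  sed -n 's/^\[ERROR] \[\([^]]*\)]: process has died \[pid \([0-9]*\), exit code \(-\{0,1\}[0-9]*\),.*/\1 pid=\2 exit=\3/p'
}

# Usage (launch.log is a hypothetical file holding the launch output):
#   died_processes < launch.log
```

In the logs above both `gz-4` and `component_container-3` report exit code -6, which is consistent with the `abort` in the stack trace (SIGABRT is signal 6 on Linux).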
### Expected behavior
`/entrypoint.sh ground_truth:=false start_aic_engine:=true` brings up Gazebo with a working OpenGL context on RTX 50xx GPUs running NVIDIA driver 575.x.
### Likely root cause (according to Claude)
The container's NVIDIA userspace libraries (`libGLX_nvidia.so.0`, `libGL.so.1`, etc.) appear to predate driver 575 and may be incompatible with the newer kernel module. The fact that `NV-GLX BadValue` fires on a plain `glxinfo -B`, with no Gazebo or OGRE in the loop, points at the container's GL/GLX userspace stack rather than anything Gazebo is doing.
Any input is highly appreciated!
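One way to test this hypothesis is to compare the driver version reported by `nvidia-smi` with the version suffix of the `libGLX_nvidia.so.*` file the container actually sees. The sketch below is a minimal check, not an official diagnostic: the library search paths are assumptions based on a typical Ubuntu layout, and the helper names (`lib_version`, `versions_match`, `check_glx_userspace`) are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: compare the kernel driver version with the NVIDIA GLX userspace
# library version visible inside the container.

# Print the version suffix of a name such as "libGLX_nvidia.so.575.57.08".
# Prints nothing for unversioned names like "libGLX_nvidia.so.0".
lib_version() {
  local name
  name=$(basename "$1")
  case "$name" in
    libGLX_nvidia.so.*.*) printf '%s\n' "${name#libGLX_nvidia.so.}" ;;
  esac
}

# Succeed only if both versions are non-empty and identical.
versions_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

check_glx_userspace() {
  local driver lib found=""
  driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n1)
  # Assumed library locations; adjust for the image's actual layout.
  for lib in /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.* /usr/lib64/libGLX_nvidia.so.*; do
    [ -e "$lib" ] || continue
    # Resolve symlinks such as libGLX_nvidia.so.0 to the versioned file.
    found=$(lib_version "$(readlink -f "$lib")")
    [ -n "$found" ] && break
  done
  echo "kernel driver: ${driver:-unknown}"
  echo "GLX userspace: ${found:-not found}"
  if versions_match "$driver" "$found"; then
    echo "versions match"
  else
    echo "versions differ or could not be determined"
  fi
}

# Only run the check where nvidia-smi is available (e.g. inside the container).
if command -v nvidia-smi >/dev/null 2>&1; then
  check_glx_userspace
fi
```

A mismatch would support the explanation above; matching versions would suggest looking elsewhere.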