Update: TensorRT Optimization, ~7x Speedup Over the Previous PyTorch Release!
Great news for everyone following this project! We’ve successfully implemented TensorRT 10.3 acceleration, and the results are significant:
Performance Improvement
| Metric | Before (PyTorch) | After (TensorRT) | Improvement |
|---|---|---|---|
| FPS | 6.35 | 43+ | 6.8x faster |
| Inference Time | 153ms | ~23ms | 6.6x faster |
| GPU Utilization | 35-69% | 85%+ | More efficient |
Test Platform: Jetson Orin NX 16GB (Seeed reComputer J4012), JetPack 6.2, TensorRT 10.3
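The speedup figures in the table follow directly from the raw numbers; a quick arithmetic check (the ratios, not new measurements):

```python
# Sanity-check the speedup ratios reported in the table above.
pytorch_fps, trt_fps = 6.35, 43.0    # frames per second
pytorch_ms, trt_ms = 153.0, 23.0     # per-frame latency in milliseconds

fps_speedup = trt_fps / pytorch_fps      # throughput ratio
latency_speedup = pytorch_ms / trt_ms    # latency ratio

print(f"FPS speedup: {fps_speedup:.1f}x, latency speedup: {latency_speedup:.1f}x")
```

Note the latency ratio is computed from the rounded ~23 ms figure, so it lands slightly above the 6.6x quoted in the table.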
Key Technical Achievement: Host-Container Split Architecture
We solved a significant Jetson deployment challenge: TensorRT Python bindings are broken in the current Jetson container images (dusty-nv/jetson-containers#714). Our solution:
HOST (JetPack 6.x)
+--------------------------------------------------+
| TRT Inference Service (trt_inference_shm.py) |
| - TensorRT 10.3, ~15ms inference |
+--------------------------------------------------+
↑
| /dev/shm/da3 (shared memory, ~8ms IPC)
↓
+--------------------------------------------------+
| Docker Container (ROS2 Humble) |
| - Camera drivers, depth publisher |
+--------------------------------------------------+
This architecture enables real-time TensorRT inference while keeping ROS2 in a clean container environment.
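The handoff over `/dev/shm` can be sketched with Python's standard `multiprocessing.shared_memory` module. This is a minimal illustration of the pattern, not the wrapper's actual protocol: the segment name, the 8-byte frame-id header, and the tiny resolution are all placeholders.

```python
import struct
from multiprocessing import shared_memory

# Host side (TRT service): create the segment and write one frame.
# Layout (illustrative): 8-byte little-endian frame id, then float32 depths.
H, W = 4, 4  # tiny stand-in resolution
shm = shared_memory.SharedMemory(name="da3_demo", create=True, size=8 + H * W * 4)
struct.pack_into("<q", shm.buf, 0, 1)  # frame id
for i in range(H * W):
    struct.pack_into("<f", shm.buf, 8 + i * 4, 0.5)  # depth in metres

# Container side (ROS2 node): attach to the same segment by name and read.
reader = shared_memory.SharedMemory(name="da3_demo")
frame_id = struct.unpack_from("<q", reader.buf, 0)[0]
depth = [struct.unpack_from("<f", reader.buf, 8 + i * 4)[0] for i in range(H * W)]

reader.close()
shm.close()
shm.unlink()
print(frame_id, depth[0])
```

In the real deployment the two sides run in separate processes (host service and container node), with the container bind-mounting `/dev/shm` so both see the same segment.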
One-Click Demo
```shell
git clone https://github.com/GerdsenAI/GerdsenAI-Depth-Anything-3-ROS2-Wrapper.git
cd GerdsenAI-Depth-Anything-3-ROS2-Wrapper
./run.sh
```
First run takes ~15-20 minutes (Docker build + TensorRT engine build). Subsequent runs start in ~10 seconds.
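The cold-start/warm-start difference comes down to caching the built TensorRT engine. A minimal sketch of that idea, with hypothetical paths and a stand-in build step (not the wrapper's actual `run.sh` logic):

```python
import os
import tempfile
from pathlib import Path

def ensure_engine(engine_path: str, build) -> bool:
    """Run the expensive `build` step only if no cached engine exists.

    Returns True when a build happened (cold start), False when the
    cached engine was reused (warm start).
    """
    p = Path(engine_path)
    if p.exists():
        return False   # warm start: reuse cached engine (~10 s startup)
    build(p)           # cold start: one-time engine build (~15-20 min)
    return True

# Demonstrate with a stand-in build step that just creates the file:
path = os.path.join(tempfile.gettempdir(), "da3_demo.engine")
Path(path).unlink(missing_ok=True)
first = ensure_engine(path, lambda p: p.write_bytes(b""))   # builds
second = ensure_engine(path, lambda p: p.write_bytes(b""))  # cached
print(first, second)
```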
Compared to Other Implementations
We’re aware of ika-rwth-aachen/ros2-depth-anything-v3-trt, which achieves 50 FPS on a desktop RTX 6000. Our focus is different:
- Embedded-first: Optimized for Jetson deployment challenges
- Container-friendly: Works around broken TRT bindings in Jetson images
- Production-ready: One-click deployment, auto-dependency installation
Call for Contributors
We’re looking for help with:
- Test coverage for SharedMemory/TensorRT code paths
- Validation on other Jetson platforms (AGX Orin, Orin Nano)
- Point cloud generation (currently depth-only)
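For anyone interested in the point-cloud item: the core of it is back-projecting each depth pixel through a pinhole camera model. A minimal sketch, with placeholder intrinsics (`fx`, `fy`, `cx`, `cy` would come from the camera's `CameraInfo` in the real node):

```python
# Back-project a metric depth image to 3D points under a pinhole model.
# Intrinsics here are illustrative placeholders, not the wrapper's values.
def depth_to_points(depth, fx, fy, cx, cy):
    """depth: 2D list of metric depths; returns a list of (X, Y, Z) tuples."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:                 # skip invalid / missing depth pixels
                continue
            x = (u - cx) * z / fx      # horizontal offset scaled by depth
            y = (v - cy) * z / fy      # vertical offset scaled by depth
            points.append((x, y, z))
    return points

# A 2x2 depth image with the principal point at the image centre;
# the zero-depth pixel is dropped.
pts = depth_to_points([[1.0, 1.0], [1.0, 0.0]], fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(pts)
```

Packing the resulting points into a `sensor_msgs/PointCloud2` message would be the remaining ROS2-side work.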
Repo: https://github.com/GerdsenAI/GerdsenAI-Depth-Anything-3-ROS2-Wrapper (ROS2 wrapper for Depth Anything 3: https://github.com/ByteDance-Seed/Depth-Anything-3)
License: MIT
@Phocidae @AljazJus - the TensorRT optimization should help significantly with your projects! Let me know if you run into any issues.