Hello everyone,
I would like to share a research-oriented project on autonomous mobile robot navigation using deep reinforcement learning, developed and evaluated in a ROS2 Foxy and Gazebo Classic simulation pipeline, and fully containerized with Docker for reproducibility and consistent deployment.
Overview
This work formulates navigation as a continuous-control, sequential decision-making problem in which a mobile robot learns a navigation policy directly through interaction with the environment, rather than relying on classical model-based planners, predefined cost maps, or handcrafted heuristics.
The learning framework is implemented using the Soft Actor-Critic (SAC) algorithm, an off-policy, entropy-regularized actor–critic method designed for stability and robustness in continuous action spaces.
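For context, SAC maximizes an entropy-regularized return, with the temperature α trading off reward against policy entropy; the twin Q-networks listed below are trained toward the clipped double-Q target (standard SAC formulation, Haarnoja et al., 2018):

$$
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\!\left[\, r(s_t, a_t) + \alpha\, \mathcal{H}\!\left(\pi(\cdot \mid s_t)\right) \right]
$$

$$
y = r + \gamma \left( \min_{i=1,2} Q_{\bar{\theta}_i}(s', a') - \alpha \log \pi_\phi(a' \mid s') \right), \qquad a' \sim \pi_\phi(\cdot \mid s')
$$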
System Architecture
- Algorithm: Soft Actor-Critic (SAC) with twin Q-networks and automatic entropy tuning
- Simulation: Gazebo Classic
- Middleware: ROS2 Foxy
- Platform: TurtleBot3
- Sensors: 2D LiDAR, IMU, wheel encoders
- State Representation (see the sketch after this list):
  - Compressed LiDAR scan bins
  - Goal-relative distance and orientation (cos/sin encoding)
  - Previous action history
- Control Outputs: Continuous linear and angular velocity commands
- Deployment: Fully Dockerized training and evaluation pipeline
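To make the state/action interface concrete, here is a minimal sketch of how such a state vector can be assembled. The bin count, normalization range, and field ordering are illustrative assumptions, not the project's exact values, and compress_scan/build_state are hypothetical helper names:

```python
import numpy as np

# Illustrative sketch of the state vector described above. The bin count,
# max range, and field ordering are assumptions, not the project's exact
# values; compress_scan/build_state are hypothetical helper names.

N_BINS = 24          # number of compressed LiDAR sectors (assumed)
MAX_RANGE = 3.5      # LDS-01 LiDAR max range on TurtleBot3, in meters

def compress_scan(ranges, n_bins=N_BINS):
    """Min-pool a raw scan into n_bins sectors, normalized to [0, 1]."""
    r = np.nan_to_num(np.asarray(ranges, dtype=np.float64),
                      nan=MAX_RANGE, posinf=MAX_RANGE)
    r = np.clip(r, 0.0, MAX_RANGE)
    sectors = np.array_split(r, n_bins)
    return np.array([s.min() for s in sectors]) / MAX_RANGE

def build_state(ranges, robot_xy, robot_yaw, goal_xy, prev_action):
    """LiDAR bins + goal-relative polar features + previous action."""
    dx, dy = goal_xy[0] - robot_xy[0], goal_xy[1] - robot_xy[1]
    dist = np.hypot(dx, dy)
    # cos/sin encoding of the heading error avoids the +/-pi wrap-around
    heading_err = np.arctan2(dy, dx) - robot_yaw
    goal_feats = [dist, np.cos(heading_err), np.sin(heading_err)]
    return np.concatenate([compress_scan(ranges),
                           goal_feats,
                           np.asarray(prev_action)]).astype(np.float32)

# The policy's two continuous outputs map onto a geometry_msgs/Twist:
#   twist.linear.x  = v      (m/s)
#   twist.angular.z = omega  (rad/s)
```

Min-pooling each sector keeps the closest return per direction, which is the signal that matters most for collision avoidance while keeping the observation compact.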
Training and Evaluation
- Off-policy training with experience replay and Bellman-consistent value updates
- Reward formulation balances (a reward sketch follows this list):
  - Goal-reaching efficiency
  - Collision avoidance
  - Trajectory smoothness
- Environments randomized across episodes to improve generalization and robustness
- Metrics logged using TensorBoard (see the logging snippet after this list), including:
  - Batch reward and averaged reward
  - Actor entropy and entropy temperature (α)
  - Actor and critic loss trends
  - Goal success and collision rates during evaluation episodes
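As a concrete illustration of the reward terms and the logged metrics, a minimal sketch follows. All weights, thresholds, and tag names are placeholder assumptions, and compute_reward/log_step are hypothetical helpers; only SummaryWriter is the actual TensorBoard API:

```python
import numpy as np
from torch.utils.tensorboard import SummaryWriter

# Hypothetical shaping consistent with the three reward terms above;
# all weights and thresholds are placeholders, not the tuned values.
W_PROGRESS, W_SMOOTH = 1.0, 0.1
R_GOAL, R_COLLISION = 100.0, -100.0
GOAL_RADIUS, COLLISION_DIST = 0.30, 0.15   # meters

def compute_reward(prev_dist, dist, min_range, action, prev_action):
    """Returns (reward, done) for one step."""
    if dist < GOAL_RADIUS:
        return R_GOAL, True                 # terminal: goal reached
    if min_range < COLLISION_DIST:
        return R_COLLISION, True            # terminal: collision
    progress = W_PROGRESS * (prev_dist - dist)   # goal-reaching efficiency
    jerk = np.sum(np.abs(np.asarray(action) - np.asarray(prev_action)))
    return progress - W_SMOOTH * jerk, False     # penalize abrupt commands

# Logging the listed metrics (tag names are illustrative):
writer = SummaryWriter("runs/sac_nav")

def log_step(step, batch_reward, avg_reward, entropy, alpha,
             actor_loss, critic_loss):
    writer.add_scalar("reward/batch", batch_reward, step)
    writer.add_scalar("reward/average", avg_reward, step)
    writer.add_scalar("policy/entropy", entropy, step)
    writer.add_scalar("policy/alpha", alpha, step)
    writer.add_scalar("loss/actor", actor_loss, step)
    writer.add_scalar("loss/critic", critic_loss, step)
```

Rewarding progress (the decrease in goal distance per step) rather than raw distance keeps the per-step reward scale bounded and avoids penalizing long but necessary detours around obstacles.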
Results
- The learned policy demonstrates consistent goal-reaching behavior across structured and previously unseen obstacle configurations
- Collision rates decrease over training, indicating effective obstacle avoidance
- The navigation policy produces smooth, dynamically feasible trajectories without oscillatory or locally trapped behavior
- Entropy and α convergence indicate a stable exploration–exploitation balance and robust policy optimization (a minimal sketch of the temperature update follows)
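For readers curious about the α convergence mentioned above, this is the standard automatic entropy-tuning update used in SAC, sketched in PyTorch; the learning rate and variable names are illustrative, not the project's settings:

```python
import torch

# Standard automatic entropy tuning (Haarnoja et al., 2018): alpha is
# adjusted so the policy entropy tracks a target, conventionally
# -dim(action_space); here 2 (linear + angular velocity).
action_dim = 2
target_entropy = -float(action_dim)
log_alpha = torch.zeros(1, requires_grad=True)
alpha_opt = torch.optim.Adam([log_alpha], lr=3e-4)  # lr is illustrative

def update_alpha(log_probs: torch.Tensor) -> float:
    """One temperature step; log_probs = log pi(a|s) for a sampled batch."""
    alpha_loss = -(log_alpha * (log_probs + target_entropy).detach()).mean()
    alpha_opt.zero_grad()
    alpha_loss.backward()
    alpha_opt.step()
    return log_alpha.exp().item()   # alpha used in actor/critic losses
```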
Motivation for Sharing
I am interested in discussing:
- Best practices for stabilizing SAC in ROS-based navigation tasks
- Strategies for sim-to-real transfer in continuous control navigation
- Integration of multi-robot or decentralized RL frameworks within ROS2
