Autonomous Path Planning Using Deep Reinforcement Learning (ROS2 Foxy + Gazebo Classic)

Hello everyone,

I would like to share a research-oriented project focused on autonomous mobile robot navigation using deep reinforcement learning, developed and evaluated within a ROS2 Foxy and Gazebo Classic simulation pipeline, fully containerized using Docker to ensure reproducibility and deployment consistency.

Overview

This work formulates navigation as a continuous-control, sequential decision-making problem in which a mobile robot learns a control policy directly through interaction with the environment, rather than relying on classical model-based planners, predefined cost maps, or handcrafted heuristics.

The learning framework is implemented using the Soft Actor-Critic (SAC) algorithm, an off-policy, entropy-regularized actor–critic method designed for stability and robustness in continuous action spaces.
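
For context, the standard SAC objective (from the original Haarnoja et al. formulation, not anything project-specific) augments the expected return with a policy-entropy term weighted by the temperature α that is tuned automatically, as listed below:

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
\Big[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big]
```

A larger α encourages exploration; automatic tuning adjusts it so that policy entropy stays near a target value.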

System Architecture

  • Algorithm: Soft Actor-Critic (SAC) with twin Q-networks and automatic entropy tuning

  • Simulation: Gazebo Classic

  • Middleware: ROS2 Foxy

  • Platform: TurtleBot3

  • Sensors: 2D LiDAR, IMU, wheel encoders

  • State Representation (see the assembly sketch after this list):

    • Compressed LiDAR scan bins
    • Goal-relative distance and orientation (cos/sin encoding)
    • Previous action history
  • Control Outputs: Continuous linear and angular velocity commands

  • Deployment: Fully Dockerized training and evaluation pipeline
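
To make the state and action interface concrete, here is a minimal rclpy-style sketch of how such an observation vector and Twist command could be wired together. The bin count, normalization, and topic names are illustrative assumptions, not the project's actual values:

```python
import numpy as np
from rclpy.node import Node
from sensor_msgs.msg import LaserScan
from geometry_msgs.msg import Twist

NUM_BINS = 20    # assumed compression: e.g. 360 beams -> 20 bins
MAX_RANGE = 3.5  # TurtleBot3 LDS-01 max range in meters

class ObsActionBridge(Node):
    """Builds the RL observation and publishes continuous velocity commands."""

    def __init__(self):
        super().__init__('obs_action_bridge')
        self.scan_bins = np.full(NUM_BINS, MAX_RANGE, dtype=np.float32)
        self.prev_action = np.zeros(2, dtype=np.float32)  # [linear, angular]
        self.create_subscription(LaserScan, 'scan', self.on_scan, 10)
        self.cmd_pub = self.create_publisher(Twist, 'cmd_vel', 10)

    def on_scan(self, msg: LaserScan):
        # Clip inf/NaN returns, then keep the minimum of each angular sector
        # (assumes the beam count is divisible by NUM_BINS, e.g. 360 / 20).
        ranges = np.nan_to_num(np.asarray(msg.ranges, dtype=np.float32),
                               nan=MAX_RANGE, posinf=MAX_RANGE)
        ranges = np.clip(ranges, 0.0, MAX_RANGE)
        self.scan_bins = ranges.reshape(NUM_BINS, -1).min(axis=1)

    def build_state(self, goal_dist: float, goal_heading: float) -> np.ndarray:
        # Compressed LiDAR bins + goal-relative distance and cos/sin heading
        # + previous action, matching the state representation listed above.
        return np.concatenate([
            self.scan_bins / MAX_RANGE,
            [goal_dist, np.cos(goal_heading), np.sin(goal_heading)],
            self.prev_action,
        ]).astype(np.float32)

    def apply_action(self, action: np.ndarray) -> None:
        # Continuous [linear, angular] velocity command from the policy.
        cmd = Twist()
        cmd.linear.x = float(action[0])
        cmd.angular.z = float(action[1])
        self.cmd_pub.publish(cmd)
        self.prev_action = np.asarray(action, dtype=np.float32)
```

Normalizing the bins by MAX_RANGE keeps the observation roughly in [0, 1], which tends to help critic conditioning.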

Training and Evaluation

  • Off-policy training with experience replay and Bellman-consistent value updates (critic update sketched after this list)
  • Reward formulation (sketched below) balances:
    • Goal-reaching efficiency
    • Collision avoidance
    • Trajectory smoothness
  • Environments randomized across episodes to improve generalization and robustness (see the reset sketch below)
  • Metrics logged using TensorBoard (logging sketch below), including:
    • Batch reward and averaged reward
    • Actor entropy and entropy temperature (α)
    • Actor and critic loss trends
    • Goal success and collision rates during evaluation episodes
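
A minimal sketch of the Bellman-consistent twin-Q update, assuming a PyTorch implementation in which `actor.sample` returns an action with its log-probability and the replay batch is a tuple of tensors; all names and the discount factor are illustrative:

```python
import torch
import torch.nn.functional as F

GAMMA = 0.99  # illustrative discount factor

def critic_update(batch, actor, q1, q2, q1_target, q2_target,
                  q_optimizer, log_alpha):
    """One Bellman-consistent update of the twin Q-networks (SAC style)."""
    state, action, reward, next_state, done = batch
    alpha = log_alpha.exp().detach()  # automatically tuned temperature

    with torch.no_grad():
        # Next action and log-prob come from the current policy: off-policy
        # in the replayed data, on-policy in the bootstrap target.
        next_action, next_log_prob = actor.sample(next_state)
        # Twin-Q trick: bootstrap from the smaller target estimate,
        # with the entropy bonus folded into the soft value.
        target_q = torch.min(q1_target(next_state, next_action),
                             q2_target(next_state, next_action))
        y = reward + GAMMA * (1.0 - done) * (target_q - alpha * next_log_prob)

    # Regress both critics toward the shared soft Bellman target.
    q_loss = F.mse_loss(q1(state, action), y) + F.mse_loss(q2(state, action), y)
    q_optimizer.zero_grad()
    q_loss.backward()
    q_optimizer.step()
    return q_loss.item()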
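
The three reward terms above could be combined along these lines; the weights, thresholds, and terminal bonuses here are illustrative assumptions, not the project's tuned values:

```python
import numpy as np

# Illustrative weights and thresholds.
W_PROGRESS, W_SMOOTH = 10.0, 0.1
GOAL_RADIUS, COLLISION_DIST = 0.25, 0.15
R_GOAL, R_COLLISION = 100.0, -100.0

def compute_reward(goal_dist, prev_goal_dist, min_scan, action, prev_action):
    """Balances goal-reaching, collision avoidance, and trajectory smoothness."""
    if goal_dist < GOAL_RADIUS:
        return R_GOAL, True          # terminal: goal reached
    if min_scan < COLLISION_DIST:
        return R_COLLISION, True     # terminal: collision
    progress = W_PROGRESS * (prev_goal_dist - goal_dist)   # reward approach
    smoothness = -W_SMOOTH * float(np.linalg.norm(
        np.asarray(action) - np.asarray(prev_action)))     # penalize jerky commands
    return progress + smoothness, False
```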
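
Episode resets with randomized obstacle placement can be done through Gazebo Classic's gazebo_ros_state world plugin, assuming it is loaded; the service name depends on the plugin's configured namespace, and the model name and pose bounds here are illustrative:

```python
import random
import rclpy
from rclpy.node import Node
from gazebo_msgs.srv import SetEntityState

class EpisodeRandomizer(Node):
    """Teleports a named model to a random pose at each episode reset."""

    def __init__(self):
        super().__init__('episode_randomizer')
        # Advertised by the gazebo_ros_state plugin; the exact service name
        # depends on the namespace set in the world file.
        self.cli = self.create_client(SetEntityState, '/gazebo/set_entity_state')

    def randomize(self, model_name: str = 'obstacle_0') -> bool:
        self.cli.wait_for_service(timeout_sec=5.0)
        req = SetEntityState.Request()
        req.state.name = model_name
        req.state.pose.position.x = random.uniform(-2.0, 2.0)  # illustrative bounds
        req.state.pose.position.y = random.uniform(-2.0, 2.0)
        future = self.cli.call_async(req)
        rclpy.spin_until_future_complete(self, future)
        return future.result().success
```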
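
The metrics above map naturally onto torch.utils.tensorboard scalar logging; the tags and log directory are illustrative:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter('runs/sac_nav')  # illustrative log directory

def log_step(step, batch_reward, avg_reward, entropy, alpha,
             actor_loss, critic_loss):
    writer.add_scalar('reward/batch', batch_reward, step)
    writer.add_scalar('reward/average', avg_reward, step)
    writer.add_scalar('policy/entropy', entropy, step)
    writer.add_scalar('policy/alpha', alpha, step)
    writer.add_scalar('loss/actor', actor_loss, step)
    writer.add_scalar('loss/critic', critic_loss, step)

def log_eval(episode, success_rate, collision_rate):
    writer.add_scalar('eval/goal_success_rate', success_rate, episode)
    writer.add_scalar('eval/collision_rate', collision_rate, episode)
```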

Results

  • The learned policy demonstrates consistent goal-reaching behavior across structured and previously unseen obstacle configurations
  • Collision rates decrease over training, indicating effective obstacle avoidance
  • The navigation policy produces smooth, dynamically feasible trajectories without oscillatory or locally trapped behavior
  • Entropy and α convergence indicate a stable exploration–exploitation balance and well-behaved policy optimization

Motivation for Sharing

I am interested in discussing:

  • Best practices for stabilizing SAC in ROS-based navigation tasks
  • Strategies for sim-to-real transfer in continuous control navigation
  • Integration of multi-robot or decentralized RL frameworks within ROS2