Classical visual servoing

Why does visual servoing matter for aerial robotics?

For a drone to land on a moving platform, grasp an object, or inspect a turbine blade up close, GPS alone is not enough. The solution lies in visual servoing, a technique that uses camera feedback to directly control a robot’s motion. It closes the loop from pixels to propellers, enabling precision maneuvers that are impossible with open-loop control.

The current landscape

Classical visual servoing methods, such as Image-Based Visual Servoing (IBVS) and Position-Based Visual Servoing (PBVS), have been extensively studied for decades, primarily in robotic manipulators and ground vehicles. More recently, deep learning-based approaches have emerged, enhancing robustness through learned features and adaptive policies.

However, most introductory materials and tutorials still focus on ground robots or fixed-base arms. This creates a significant gap for aerial platforms, which operate under unique constraints: limited payload, high-speed dynamics, underactuation, and safety-critical flight requirements.

While Vision-Language-Action (VLA) models represent exciting modern advancements, they fall beyond the scope of classical visual servoing techniques.

Comparative Analysis of IBVS and PBVS

The table below systematically compares IBVS and PBVS across several critical axes.

| Feature | Image-Based Visual Servoing (IBVS) | Position-Based Visual Servoing (PBVS) |
| --- | --- | --- |
| Control Basis | The controller’s error signal is calculated in the 2D image plane using pixel coordinates and derived features. | The controller’s error signal is calculated in 3D Cartesian space, requiring the target’s full 3D pose. |
| Methodology | A control law directly maps the error between current and desired image features to robot velocity commands, bypassing explicit pose estimation. | A two-step process: 1) Estimates the 3D pose of the target from 2D image features. 2) Sends this 3D pose error to the robot’s controller. |
| Pros & Strengths | Calibration Robustness: Tolerant to camera calibration errors, as it does not need metric 3D information.<br>Computational Efficiency: Skips 3D reconstruction, allowing for potentially faster control loops. | Intuitive Control: Robot motion in 3D space is natural to design and plan.<br>Simplified Design: The controller design is more straightforward for Cartesian-space objectives. |
| Cons & Weaknesses | Image Singularities: Image Jacobian can become singular, causing unpredictable motion.<br>Trajectory Issues: Can lead to unexpected robot trajectories, including the counter-intuitive “camera retreat”.<br>FoV Constraint: Limited field of view is a significant real-world challenge, risking target loss in dynamic maneuvers. | Pose Estimation: Highly dependent on accurate pose estimation, which can be a bottleneck.<br>Error Sensitivity: Very sensitive to camera calibration and image noise, which degrades final positioning accuracy. |
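
To make the IBVS row concrete, here is a minimal sketch of the classical point-feature control law, v = -gain * pinv(L) * e, where L is the stacked interaction matrix (image Jacobian) and e is the feature error. It assumes normalized image coordinates and a known or roughly estimated depth Z per point. The function names and the default gain are illustrative choices, not taken from any particular library.

```python
import numpy as np

def interaction_matrix(x, y, Z):
    """2x6 interaction matrix of one normalized image point at depth Z.

    Columns map the camera twist (vx, vy, vz, wx, wy, wz) to (x_dot, y_dot).
    """
    return np.array([
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x],
    ])

def ibvs_velocity(points, desired, depths, gain=0.5):
    """Camera twist from the classical law v = -gain * pinv(L) @ e."""
    points = np.asarray(points, dtype=float)
    desired = np.asarray(desired, dtype=float)
    error = (points - desired).reshape(-1)  # e = s - s*
    L = np.vstack([interaction_matrix(x, y, Z)
                   for (x, y), Z in zip(points, depths)])
    return -gain * np.linalg.pinv(L) @ error

# Four corners of a square that appear larger than desired (camera too close).
current = [(0.2, 0.2), (-0.2, 0.2), (-0.2, -0.2), (0.2, -0.2)]
target = [(0.1, 0.1), (-0.1, 0.1), (-0.1, -0.1), (0.1, -0.1)]
twist = ibvs_velocity(current, target, depths=[1.0] * 4)
print(np.round(twist, 3))  # the only non-zero component is a negative vz
```

The resulting twist is expressed in the camera frame. On a quadrotor it still has to be mapped to the commands the vehicle can actually follow (for example body velocities and yaw rate), which is where the underactuation issues discussed further down come in.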

Problems encountered in visual servoing

Visual Perception

| Common Simulator Assumption (Perfect World) | Real-World Challenge (The Messy Reality) | Why It Breaks Visual Servoing |
| --- | --- | --- |
| Ideal camera with uniform response across the entire image. | Lens Shading (Vignetting): Real cameras suffer from optical and mechanical vignetting, where light intensity falls off toward the image periphery. Lenses are also rarely tuned perfectly for the environment, and small errors in gain, exposure, or focus can bias where features are detected relative to where they are expected. | The core error signal of any visual servoing controller is derived from pixel coordinates. Lens shading and miscalibration can bias these coordinates systematically, introducing a steady-state error that prevents precise positioning. They also make it harder for CV-based detection models to detect, identify, and track the target. |
| Perfect, noise-free, high-frame-rate images; unlimited field of view (FoV). | Limited FoV & Feature Loss: Onboard cameras have a constrained field of view. During agile maneuvers, especially with an underactuated drone that cannot translate without rotating, the target can be lost mid-flight. | Feature loss breaks the control loop entirely. If the visual error signal cannot be computed, the controller fails. More advanced schemes, such as MPC with visual penalty terms or online trajectory replanning, are often needed to maintain visibility. |
| Vibration-free platform; instantaneous shutter. | Vibrations & Motion Blur: Propeller imbalance, motor noise, and aerodynamic forces induce high-frequency vibrations. These cause motion blur and rolling shutter artifacts, degrading feature detection and tracking. Even with autofocus cameras, detection and tracking become difficult. | Blurred images lead to imprecise or incorrect feature extraction by CV models. The controller then acts on erroneous visual information, causing oscillations or divergence that can end in a crash. |
| Static, well-lit environment with uniform, predictable lighting. | Dynamic Lighting & Shadows: Real-world lighting changes constantly (sun angle, cloud cover). Shadows move and can be incorrectly identified as visual features, causing the controller to chase spurious targets. | Feature tracking algorithms can fail under dynamic illumination. The apparent motion of shadows can be interpreted as target motion, corrupting the error signal and causing the drone to drift or oscillate. |
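
Since feature loss breaks the loop, the controller needs a defined behaviour for frames with no detection. The sketch below is one simple option, not the only one: it fades the last velocity command to zero over a short window and then falls back to hover. The class name, the state labels, the `(vx, vy, vz, yaw_rate)` command layout, and the 0.25 s window are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Twist = Tuple[float, float, float, float]  # assumed layout: (vx, vy, vz, yaw_rate)
ZERO: Twist = (0.0, 0.0, 0.0, 0.0)

@dataclass
class FeatureLossWatchdog:
    fade_s: float = 0.25               # assumed fade-out window after target loss
    last_seen: Optional[float] = None  # time of the last valid detection
    last_cmd: Twist = ZERO

    def update(self, now: float, cmd: Optional[Twist]) -> Tuple[str, Twist]:
        """Pass `cmd=None` when the detector found no target in this frame."""
        if cmd is not None:
            self.last_seen = now
            self.last_cmd = cmd
            return "TRACK", cmd
        if self.last_seen is not None:
            elapsed = now - self.last_seen
            if elapsed < self.fade_s:
                # Scale the last command linearly down to zero.
                scale = 1.0 - elapsed / self.fade_s
                return "FADE", tuple(scale * c for c in self.last_cmd)
        return "HOVER", ZERO

wd = FeatureLossWatchdog()
print(wd.update(1.0, (1.0, 0.0, -0.5, 0.0)))  # target visible: command passes through
print(wd.update(1.125, None))                 # target lost: command is scaled down
print(wd.update(2.0, None))                   # still lost: hover
```

In a real system the hover state would more likely trigger a search pattern or hand control back to a position controller.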

Environment

| Common Simulator Assumption (Perfect World) | Real-World Challenge (The Messy Reality) | Why It Breaks Visual Servoing |
| --- | --- | --- |
| Calm, static atmospheric conditions. | Wind & Gusts: Unpredictable wind disturbances affect both flight dynamics and image quality. Strong gusts push the drone off its intended trajectory and may tilt the camera, causing abrupt feature jumps in the image plane. | Visual servoing controllers rely on precise image feature velocities. Wind-induced motion adds an unmodeled disturbance, causing the controller to lag or react incorrectly. |
| Pristine visual conditions; no precipitation or obscurants. | Suboptimal Weather: Rain, fog, mist, and smoke directly affect visual data quality by attenuating, scattering, or obscuring light. | Feature detection becomes unreliable or impossible in degraded visual conditions. Many algorithms that work in clear simulated environments fail outright in light rain or haze. |
| Static environment with no unexpected obstacles. | Dynamic Objects & Obstacles: Real-world environments contain moving people, vehicles, and other dynamic obstacles that can occlude the target or enter the field of view unexpectedly. | The controller may lock onto the wrong object, or collision avoidance behaviors may conflict with visual servoing commands. |
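
One common guard against locking onto the wrong object is to gate detections around where the target is expected to be. The sketch below keeps the detection nearest to a predicted pixel position and rejects everything outside a fixed radius. The function name and the 40 px gate are hypothetical. The prediction itself (for example the last accepted position or a filter output) is assumed to come from elsewhere.

```python
import math
from typing import Optional, Sequence, Tuple

Pixel = Tuple[float, float]

def gate_detection(predicted: Pixel,
                   detections: Sequence[Pixel],
                   gate_px: float = 40.0) -> Optional[Pixel]:
    """Return the detection closest to `predicted`, or None if all are outside the gate."""
    best, best_dist = None, gate_px
    for det in detections:
        dist = math.hypot(det[0] - predicted[0], det[1] - predicted[1])
        if dist <= best_dist:
            best, best_dist = det, dist
    return best

# A distractor far from the prediction is ignored; the nearby detection is kept.
print(gate_detection((320.0, 240.0), [(500.0, 100.0), (330.0, 250.0)]))
# If only the distractor is visible, the frame is treated as "no target".
print(gate_detection((320.0, 240.0), [(500.0, 100.0)]))
```

A fixed pixel gate is crude: it should widen with target speed and time since the last accepted detection, and a rejected frame should be handled like any other feature loss.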

Robot Dynamics & Control

| Common Simulator Assumption (Perfect World) | Real-World Challenge (The Messy Reality) | Why It Breaks Visual Servoing |
| --- | --- | --- |
| Ideal 6-DOF motion; decoupled translation and rotation. | Underactuation & Coupled Dynamics: A quadrotor is underactuated: it has only 4 actuators to control 6 degrees of freedom. Horizontal translation is coupled to attitude: moving forward requires pitching and moving sideways requires rolling, and any tilt also reduces the vertical component of thrust. These cannot be changed independently; they must be coordinated smoothly for stable flight. | The controller’s desired motion in the image plane may not be directly achievable. For example, lateral motion requires tilting the drone, which rotates the camera and changes the visual features in ways the controller may not anticipate. |
| Perfect actuator response; no delays. | Sensor & Actuator Latency: Real-world systems have unavoidable delays: camera exposure, image transmission, feature extraction, control computation, and motor response. These delays accumulate. | The visual servoing loop can become unstable if the delay is not accounted for. The controller acts on outdated visual information, leading to oscillations or divergence. |
| Simple, decoupled dynamics; no unmodeled effects. | Aerodynamic & Ground Effects: Near the ground, propeller downwash creates complex airflow patterns (ground effect). In confined spaces, walls and ceilings induce additional aerodynamic disturbances. | The system behaves differently from the model used to design the controller. Visual servoing for landing or close-quarters inspection can fail unless the controller is robust to these effects. |
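
The latency row can be illustrated with a toy model. The sketch below simulates a scalar error driven by a proportional law that only sees a delayed measurement, e_dot(t) = -gain * e(t - delay). For this idealised model the continuous-time stability limit is gain * delay < pi/2. The gain and delay values are illustrative assumptions, not measurements from a real vehicle.

```python
def simulate_delayed_loop(gain, delay_s, dt=0.01, duration_s=10.0, e0=1.0):
    """Forward-Euler simulation of e_dot(t) = -gain * e(t - delay_s)."""
    delay_steps = int(round(delay_s / dt))
    steps = int(round(duration_s / dt))
    history = [e0] * (delay_steps + 1)  # error assumed constant before t = 0
    for _ in range(steps):
        delayed = history[-1 - delay_steps]  # what the controller actually sees
        history.append(history[-1] - gain * dt * delayed)
    return history[delay_steps:]  # samples from t = 0 onward

# Same gain, two different end-to-end delays.
short_delay = simulate_delayed_loop(gain=8.0, delay_s=0.05)  # gain * delay = 0.4
long_delay = simulate_delayed_loop(gain=8.0, delay_s=0.30)   # gain * delay = 2.4
print(abs(short_delay[-1]))                    # decays toward zero
print(max(abs(e) for e in long_delay[-200:]))  # growing oscillation
```

The practical takeaway is that the usable gain shrinks as the total pixel-to-motor delay grows, so the delay has to be measured and either compensated (for example by predicting the features forward in time) or respected when tuning.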

A really helpful GitHub repository, written with MAVROS, for getting started with image-based visual servoing.

Could you clarify how much of your post is human work and how much is AI?

Users of this forum have agreed that we don’t want undisclosed AI posts where AI produced the majority of the content without human added value.

Thanks for raising this. I respect the forum’s stance on undisclosed AI content.

To be transparent: yes, I used AI to help write and structure the post. But the substance comes from my own hands‑on work. I recently developed an interception program as well as a precision landing program on moving platforms. I ran into a set of real, non‑obvious problems that only become apparent when one is actually building and testing such a system.

The core insight and practical value are mine. Having all the problems in one place helps others realise what such a project entails once they start building it.
