The Illusion of “Working”
At first glance, everything looked fine.
- The cameras detected targets
- The robot moved toward them
- The system appeared to succeed
But underneath that…
- Pixel centers drifted
- Bounding boxes shifted
- Depth assumptions broke down
- Small errors compounded into failed insertions
Nothing was obviously wrong.
It was just… off.
Step 1 — Stop Treating Pixels Like Positions
Originally, I was doing what most systems do:
Take pixel offsets → scale them → move the robot
That works… until it doesn’t.
So I rebuilt the entire pipeline using a pinhole camera model.
Instead of treating pixels as distances, I treated them as angles:
# Normalize pixel offsets into [-1, 1] relative to the image center
dx_norm = (center_px[0] - (img_w / 2.0)) / (img_w / 2.0)
dy_norm = (center_px[1] - (img_h / 2.0)) / (img_h / 2.0)

# Map normalized offsets onto the camera's field of view as viewing angles
angle_x = dx_norm * (HFOV_RAD / 2.0)
angle_y = dy_norm * (VFOV_RAD / 2.0)
Now the system wasn’t guessing anymore. It was projecting.
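To make that concrete, here is one detection worked through. The 640x480 resolution and 90-degree HFOV are illustrative values, not my actual camera config:

import math

img_w, HFOV_RAD = 640, math.radians(90.0)              # illustrative camera parameters
center_x = 480.0                                       # detection lands right of center
dx_norm = (center_x - (img_w / 2.0)) / (img_w / 2.0)   # -> 0.5
angle_x = dx_norm * (HFOV_RAD / 2.0)                   # -> 0.3927 rad (22.5 deg)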
Step 2 — Turn Angles Into Real-World Coordinates
Once I had angles, I mapped them into physical space:
# Total downward viewing angle: the camera's mounting pitch adjusted by
# the target's vertical angle in the image
total_pitch_angle = PITCH_RAD - angle_y
# Forward offset: project the view ray onto the ground plane
off_y_m = HEIGHT_M * math.tan(total_pitch_angle)

# Lateral offset: scale the horizontal angle along the slant range
slant_range = HEIGHT_M / math.cos(PITCH_RAD)
off_x_m = slant_range * math.tan(angle_x)
This gave me:
- Real-world X/Y offsets
- Based on camera height
- Using actual geometry
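Continuing the worked example, with a camera 1.0 m above the surface and a 45-degree mounting pitch (again, illustrative numbers):

import math

HEIGHT_M, PITCH_RAD = 1.0, math.radians(45.0)   # illustrative geometry
angle_x, angle_y = math.radians(22.5), 0.0      # target on the horizontal centerline

off_y_m = HEIGHT_M * math.tan(PITCH_RAD - angle_y)   # -> 1.000 m forward
slant_range = HEIGHT_M / math.cos(PITCH_RAD)         # -> 1.414 m along the view ray
off_x_m = slant_range * math.tan(angle_x)            # -> 0.586 m lateral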
Step 3 — Accept That Simulation Lies
Even with correct math… things were still off.
Because simulation isn’t perfect.
So instead of hacking each camera individually, I introduced a dynamic correction:
# One correction scalar, derived from observed behavior at this camera height
gazebo_scalar = self.get_intelligent_scalar(HEIGHT_M)
off_y_m *= gazebo_scalar
One adjustment layer.
No per-camera hacks.
No magic numbers everywhere.
Just controlled correction based on observed behavior.
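The get_intelligent_scalar implementation isn't shown in this post, so here is a minimal sketch of the idea, assuming the scalar is linearly interpolated from a small calibration table of (camera height, observed correction) pairs. The class name and table values below are placeholders:

import bisect

class GazeboCorrector:
    # Hypothetical calibration table: (camera height in m, observed scalar).
    # Real values would come from measuring projection error at known heights.
    CAL_TABLE = [(0.2, 1.08), (0.5, 1.03), (1.0, 1.01)]

    def get_intelligent_scalar(self, height_m):
        """Linearly interpolate a correction scalar for this camera height."""
        heights = [h for h, _ in self.CAL_TABLE]
        idx = bisect.bisect_left(heights, height_m)
        if idx == 0:
            return self.CAL_TABLE[0][1]      # below the table: clamp
        if idx == len(self.CAL_TABLE):
            return self.CAL_TABLE[-1][1]     # above the table: clamp
        (h0, s0), (h1, s1) = self.CAL_TABLE[idx - 1], self.CAL_TABLE[idx]
        t = (height_m - h0) / (h1 - h0)
        return s0 + t * (s1 - s0)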
Step 4 — Don’t Trust a Camera — Score It
This was one of the biggest shifts.
Instead of asking:
“Which camera sees the target?”
I started asking:
“Which camera should I trust?”
I built a multi-factor confidence score:
# base: the raw visual-alignment score in [0, 1]; the 1.5 exponent
# rewards well-centered detections disproportionately
visual_score = base ** 1.5

# Allow a 10% tolerance on the meters-per-pixel ratio, then penalize
# the squared excess
error = max(0.0, abs(1.0 - mpp_ratio) - 0.1)
geo_score = max(0.0, 1.0 - (error ** 2))

# Detections far from the vertical center are seen at a steep angle
tilt_penalty = 0.5 if abs(dy_norm) > 0.5 else 1.0

confidence_score = visual_score * geo_score * tilt_penalty
Each camera is evaluated on:
- Visual alignment (how centered it is)
- Geometric sanity (meters-per-pixel consistency)
- Viewing angle (tilt penalty)
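Once every camera reports a confidence_score, choosing between them stops being a rule and becomes a comparison. A sketch, assuming each camera's estimate is a dict shaped like the return value shown below:

# Pick the estimate from the camera the system trusts most
best = max(estimates, key=lambda e: e["confidence_score"])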
Now the system doesn’t just see.
It judges what it sees.
And it backs every judgment with a massive log:
return {
    "x": calc_x,
    "y": calc_y,
    "visual_score": visual_score,
    "geo_score": geo_score,
    "tilt_penalty": tilt_penalty,
    "mpp_ratio": mpp_ratio,
    "confidence_score": confidence_score,
    "method": method_label,
    "debug": {
        "center_px": center_px,
        "img_size": [img_w, img_h],
        "dx_norm": dx_norm,
        "dy_norm": dy_norm,
        "vfov_deg": math.degrees(VFOV_RAD),
        "hfov_deg": math.degrees(HFOV_RAD),
        "pitch_deg": cfg.get("pitch_deg", 0.0),
        "yaw_deg": cfg.get("yaw_deg", 0.0),
        "tcp_yaw_deg": math.degrees(tcp_yaw),
        "angle_x_deg": math.degrees(angle_x),
        "angle_y_deg": math.degrees(angle_y),
        "height_m": HEIGHT_M,
        "slant_range": slant_range,
        "off_x_m": off_x_m,
        "off_y_m": off_y_m,
        "cam_base_xyz": [cam_base_x, cam_base_y, cam_base_z],
        "sc_obs": [sc_obs_x, sc_obs_y],
        "obs_err": [obs_err_x, obs_err_y],
        "delta": [delta_x, delta_y],
        "tcp_xyz": [tcp_x, tcp_y, tcp_z],
        "cam_offset_xyz": [cfg.get("off_x", 0.0), cfg.get("off_y", 0.0), cfg.get("off_z", 0.0)],
        "navigator_truth_xyz": [
            nav_pos.get("x", 0.0),
            nav_pos.get("y", 0.0),
            nav_pos.get("z", 0.0),
        ],
        "target_truth_xyz": [
            goal_pos.get("x", 0.0),
            goal_pos.get("y", 0.0),
            goal_pos.get("z", 0.0),
        ],
        "pitch_plus_angle_y_deg": math.degrees(PITCH_RAD - angle_y),
        "tan_pitch_plus_angle_y": math.tan(PITCH_RAD - angle_y),
        "total_yaw_deg": math.degrees(total_yaw),
        "rotated_offset_xy": [rot_off_x, rot_off_y],
        "rotated_projection_xy": [rot_x, rot_y],
        "dy_px": dy_px,
        "dx_px": dx_px,
        "meters_per_pixel_y": meters_per_pixel_y,
        "meters_per_pixel_x": meters_per_pixel_x,
        "gazebo_y_scalar": gazebo_scalar,
        "navigator_id": navigator_id,
    },
}
Step 5 — Reconstruct the Target (Don’t Chase It)
Once the system detects the port, it doesn’t aim directly at it.
It reconstructs the real target using known offsets:
# Known offset between the navigator marker and the actual goal
delta_x = goal_pos["x"] - nav_pos["x"]
delta_y = goal_pos["y"] - nav_pos["y"]

# Apply that offset to the observed position to reconstruct the target
calc_x = sc_obs_x + delta_x
calc_y = sc_obs_y + delta_y
That’s a subtle but critical difference:
I’m not chasing pixels. I’m rebuilding the world.
Step 6 — Movement Isn’t Enough — You Need Convergence
At this point, I could compute a target.
But the robot still didn’t land perfectly.
Because real systems don’t work in one step.
So I built a closed-loop hover system:
Every iteration:
- Capture a new observation
- Compute error
- Apply a correction
- Move
- Repeat
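A minimal sketch of that loop. The observe, compute_error, and move_by callables are placeholders standing in for the real capture, scoring, and motion calls; the iteration budget, tolerance, and gain are assumed values:

import math

def hover_until_converged(observe, compute_error, move_by,
                          max_iters=20, tol_m=0.002, gain=0.8):
    """Closed-loop hover: observe, compute error, correct, repeat."""
    for _ in range(max_iters):
        obs = observe()                        # capture a new observation
        err_x, err_y = compute_error(obs)      # compute error vs. the target
        if math.hypot(err_x, err_y) < tol_m:
            return True                        # converged: stop correcting
        move_by(-err_x * gain, -err_y * gain)  # apply a damped correction and move
    return False                               # iteration budget exhausted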
Measure Both Pixel and Truth Error
# Pixel error: normalized distance of the detection from the image center
pixel_dist = math.sqrt(dx_norm**2 + dy_norm**2)
# Truth error: distance in meters between the estimate and ground truth
truth_dist = math.sqrt((calc_x - tx)**2 + (calc_y - ty)**2)
Pixel error tells you what the camera sees. Truth error tells you what actually happened.
You need both.
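One way to wire them together (a sketch only; PIXEL_TOL and TRUTH_TOL are assumed thresholds, and truth error is only available in simulation):

converged = pixel_dist < PIXEL_TOL    # control signal: what the camera sees
if converged and truth_dist > TRUTH_TOL:
    # sim-only check: the camera thinks it's centered but ground truth disagrees
    print(f"WARNING: converged visually but truth error is {truth_dist:.3f} m")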
Track the Best Attempt
Even if the system doesn’t fully converge, it keeps the best result:
if score > best_score:
    best_score = score   # remember the new high-water mark
    best_lock = {...}    # snapshot the best estimate seen so far
So failure becomes:
“Use the closest known good position”
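A sketch of that fallback, assuming best_lock stores the winning estimate under "x"/"y" keys (named to match the result dict above):

if not converged and best_lock is not None:
    # Land on the closest known good position instead of giving up
    target_x, target_y = best_lock["x"], best_lock["y"]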
Control the Movement (This Was Critical)
Originally, I had a bug where the robot kept making tiny fixed movements and never converged.
The fix was simple, but powerful:
Dynamically adjust how the move is executed:
# Corrections larger than 1 cm are split into 10 sub-steps for smooth
# motion; smaller corrections execute as a single precise move
if requested_xy > 0.010:
    xy_steps = 10
else:
    xy_steps = 1
Big corrections → smooth motion
Small corrections → precision control
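A sketch of the execution side, with move_relative() as a placeholder for the real motion command and requested_dx/requested_dy standing in for the correction components:

# Spread the correction across xy_steps equal sub-moves
step_x = requested_dx / xy_steps
step_y = requested_dy / xy_steps
for _ in range(xy_steps):
    move_relative(step_x, step_y)    # placeholder for the actual motion call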
What This Isn’t
Is this Machine Learning?
No.
Is this Reinforcement Learning?
Also no.
There’s no training loop.
No reward function.
What This Actually Is
This is:
- Geometry
- Calibration
- Feedback control
- Systems engineering