Two Weeks of Camera Calibration: Turning Pixels Into Trust

The Illusion of “Working”

At first glance, everything looked fine.

  • The cameras detected targets

  • The robot moved toward them

  • The system appeared to succeed

But underneath that…

  • Pixel centers drifted

  • Bounding boxes shifted

  • Depth assumptions broke down

  • Small errors compounded into failed insertions

Nothing was obviously wrong.

It was just… off.

Step 1 — Stop Treating Pixels Like Positions

Originally, I was doing what most systems do:

Take pixel offsets → scale them → move the robot

That works… until it doesn’t.

So I rebuilt the entire pipeline using a pinhole camera model.

Instead of treating pixels as distances, I treated them as angles:

# Normalize the detection center to [-1, 1] relative to the image center
dx_norm = (center_px[0] - (img_w / 2.0)) / (img_w / 2.0)
dy_norm = (center_px[1] - (img_h / 2.0)) / (img_h / 2.0)

# Map normalized offsets to view angles across each field of view
angle_x = dx_norm * (HFOV_RAD / 2.0)
angle_y = dy_norm * (VFOV_RAD / 2.0)

Now the system wasn’t guessing anymore. It was projecting.
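For reference, here is that mapping as a self-contained function. This is a minimal sketch; the name pixel_to_angles and the example numbers are mine, not from the actual pipeline:

import math

def pixel_to_angles(center_px, img_w, img_h, hfov_rad, vfov_rad):
    """Convert a detection center in pixels to view angles in radians."""
    # Normalize to [-1, 1] about the image center
    dx_norm = (center_px[0] - img_w / 2.0) / (img_w / 2.0)
    dy_norm = (center_px[1] - img_h / 2.0) / (img_h / 2.0)
    # Linear mapping from normalized offset to angle across the field of view
    return dx_norm * (hfov_rad / 2.0), dy_norm * (vfov_rad / 2.0)

# Example: 640x480 image, 60° horizontal / 47° vertical FOV
ax, ay = pixel_to_angles((400, 300), 640, 480,
                         math.radians(60.0), math.radians(47.0))
# ax ≈ 0.131 rad (7.5°), ay ≈ 0.103 rad (5.9°)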

Step 2 — Turn Angles Into Real-World Coordinates

Once I had angles, I mapped them into physical space:

# Vertical: combine the camera's fixed pitch with the view angle,
# then project from the camera height down to the ground plane
total_pitch_angle = PITCH_RAD - angle_y
off_y_m = HEIGHT_M * math.tan(total_pitch_angle)

# Horizontal: scale by the slant range along the camera's optical axis
slant_range = HEIGHT_M / math.cos(PITCH_RAD)
off_x_m = slant_range * math.tan(angle_x)

This gave me:

  • Real-world X/Y offsets

  • Derived from camera height and pitch

  • Computed with actual geometry, not scaled pixels
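To make the geometry concrete, here is a small worked example. All numbers are illustrative, not from my actual rig:

import math

HEIGHT_M = 0.50                   # camera height above the surface (illustrative)
PITCH_RAD = math.radians(30.0)    # camera pitched 30° down (illustrative)
angle_x = math.radians(2.0)       # horizontal view angle from Step 1
angle_y = math.radians(-1.5)      # vertical view angle from Step 1

total_pitch_angle = PITCH_RAD - angle_y           # 31.5° total look-down angle
off_y_m = HEIGHT_M * math.tan(total_pitch_angle)  # ≈ 0.31 m forward

slant_range = HEIGHT_M / math.cos(PITCH_RAD)      # ≈ 0.58 m along the axis
off_x_m = slant_range * math.tan(angle_x)         # ≈ 0.02 m sideways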

Step 3 — Accept That Simulation Lies

Even with correct math… things were still off.

Because simulation isn’t perfect.

So instead of hacking each camera individually, I introduced a dynamic correction:

# Height-dependent correction factor, derived from observed behavior
gazebo_scalar = self.get_intelligent_scalar(HEIGHT_M)
off_y_m *= gazebo_scalar

One adjustment layer.

No per-camera hacks.
No magic numbers everywhere.

Just controlled correction based on observed behavior.
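The interesting work happens inside get_intelligent_scalar. One way to implement a correction like this is a small calibration table interpolated by height. A minimal sketch, not the exact implementation, with invented table values:

# Hypothetical calibration samples: (height_m, observed correction factor)
CALIBRATION = [(0.3, 1.08), (0.5, 1.05), (0.8, 1.02)]

def get_intelligent_scalar(height_m):
    """Linearly interpolate a correction scalar between calibration samples."""
    pts = sorted(CALIBRATION)
    if height_m <= pts[0][0]:
        return pts[0][1]           # clamp below the lowest sample
    if height_m >= pts[-1][0]:
        return pts[-1][1]          # clamp above the highest sample
    for (h0, s0), (h1, s1) in zip(pts, pts[1:]):
        if h0 <= height_m <= h1:
            t = (height_m - h0) / (h1 - h0)
            return s0 + t * (s1 - s0)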

Step 4 — Don’t Trust a Camera — Score It

This was one of the biggest shifts.

Instead of asking:

“Which camera sees the target?”

I started asking:

“Which camera should I trust?”

I built a multi-factor confidence score:

# Visual alignment: 'base' measures how centered the detection is;
# the exponent penalizes off-center views more than linearly
visual_score = base ** 1.5

# Geometric sanity: penalize meters-per-pixel ratios far from 1.0,
# with a 0.1 dead zone before the quadratic penalty kicks in
error = max(0.0, abs(1.0 - mpp_ratio) - 0.1)
geo_score = max(0.0, 1.0 - (error ** 2))

# Viewing angle: halve confidence when the target sits far off-axis vertically
tilt_penalty = 0.5 if abs(dy_norm) > 0.5 else 1.0

confidence_score = visual_score * geo_score * tilt_penalty

Each camera is evaluated on:

  • Visual alignment (how centered it is)

  • Geometric sanity (meters-per-pixel consistency)

  • Viewing angle (tilt penalty)

Now the system doesn’t just see.

It judges what it sees, and it logs every input that went into the judgment:

        return {
            "x": calc_x,
            "y": calc_y,
            "visual_score": visual_score,
            "geo_score": geo_score,
            "tilt_penalty": tilt_penalty,
            "mpp_ratio": mpp_ratio,
            "confidence_score": confidence_score,
            "method": method_label,
            "debug": {
                "center_px": center_px,
                "img_size": [img_w, img_h],
                "dx_norm": dx_norm,
                "dy_norm": dy_norm,
                "vfov_deg": math.degrees(VFOV_RAD),
                "hfov_deg": math.degrees(HFOV_RAD),
                "pitch_deg": cfg.get("pitch_deg", 0.0),
                "yaw_deg": cfg.get("yaw_deg", 0.0),
                "tcp_yaw_deg": math.degrees(tcp_yaw),
                "angle_x_deg": math.degrees(angle_x),
                "angle_y_deg": math.degrees(angle_y),
                "height_m": HEIGHT_M,
                "slant_range": slant_range,
                "off_x_m": off_x_m,
                "off_y_m": off_y_m,
                "cam_base_xyz": [cam_base_x, cam_base_y, cam_base_z],
                "sc_obs": [sc_obs_x, sc_obs_y],
                "obs_err": [obs_err_x, obs_err_y],
                "delta": [delta_x, delta_y],
                "tcp_xyz": [tcp_x, tcp_y, tcp_z],
                "cam_offset_xyz": [cfg.get("off_x", 0.0), cfg.get("off_y", 0.0), cfg.get("off_z", 0.0)],
                "navigator_truth_xyz": [
                    nav_pos.get("x", 0.0),
                    nav_pos.get("y", 0.0),
                    nav_pos.get("z", 0.0),
                ],
                "target_truth_xyz": [
                    goal_pos.get("x", 0.0),
                    goal_pos.get("y", 0.0),
                    goal_pos.get("z", 0.0),
                ],
                "pitch_plus_angle_y_deg": math.degrees(PITCH_RAD - angle_y),
                "tan_pitch_plus_angle_y": math.tan(PITCH_RAD - angle_y),
                "total_yaw_deg": math.degrees(total_yaw),
                "rotated_offset_xy": [rot_off_x, rot_off_y],
                "rotated_projection_xy": [rot_x, rot_y],
                "dy_px": dy_px,
                "dx_px": dx_px,
                "meters_per_pixel_y": meters_per_pixel_y,
                "meters_per_pixel_x": meters_per_pixel_x,
                "gazebo_y_scalar": gazebo_scalar,
                "navigator_id": navigator_id,
            }
        }
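With every camera producing a result like that, picking which camera to act on becomes a simple ranking. A minimal sketch (evaluate_camera, cameras, and move_toward are placeholder names, not the real API):

# Score every camera that currently sees the target, then trust the best one
results = [evaluate_camera(cam) for cam in cameras if cam.has_detection()]
best = max(results, key=lambda r: r["confidence_score"], default=None)

if best is not None:
    move_toward(best["x"], best["y"])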

Step 5 — Reconstruct the Target (Don’t Chase It)

Once the system detects the port, it doesn’t aim directly at it.

It reconstructs the real target using known offsets:

# Known offset between the navigator marker and the goal port (from truth data)
delta_x = goal_pos["x"] - nav_pos["x"]
delta_y = goal_pos["y"] - nav_pos["y"]

# Reconstruct the target from the *observed* navigator position
calc_x = sc_obs_x + delta_x
calc_y = sc_obs_y + delta_y

That’s a subtle but critical difference:

I’m not chasing pixels. I’m rebuilding the world.
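A quick illustration with made-up numbers: even if the port itself is never cleanly detected, the target still falls out of the observed navigator plus the known offset:

# Illustrative numbers only
sc_obs_x, sc_obs_y = 0.42, 0.13   # observed navigator position (m)
delta_x, delta_y = 0.05, 0.00     # known navigator-to-port offset (m)
calc_x, calc_y = sc_obs_x + delta_x, sc_obs_y + delta_y   # (0.47, 0.13)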

Step 6 — Movement Isn’t Enough — You Need Convergence

At this point, I could compute a target.

But the robot still didn’t land perfectly.

Because real systems don’t work in one step.

So I built a closed-loop hover system (sketched in code after the list below):

Every iteration:

  1. Capture a new observation

  2. Compute error

  3. Apply a correction

  4. Move

  5. Repeat
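A minimal sketch of that loop; observe(), compute_target(), and move_to() are hypothetical helpers standing in for the real pipeline:

import math

def hover_converge(max_iters=20, tol_m=0.002):
    """Observe, compute the error, correct, and repeat until locked."""
    for _ in range(max_iters):
        obs = observe()                            # 1. capture a new observation
        tx, ty = compute_target(obs)               # 2. reconstruct the target
        err = math.hypot(tx - obs.x, ty - obs.y)   # current position error
        if err < tol_m:                            # close enough: converged
            return True
        move_to(tx, ty)                            # 3-4. apply correction, move
    return False                                   # loop budget exhausted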

Measure Both Pixel and Truth Error


# Pixel-space error: how far the detection sits from the image center
pixel_dist = math.sqrt(dx_norm**2 + dy_norm**2)

# World-space error: how far the computed target is from truth (tx, ty)
truth_dist = math.sqrt((calc_x - tx)**2 + (calc_y - ty)**2)

Pixel error tells you what the camera sees. Truth error tells you what actually happened.

You need both.

Track the Best Attempt

Even if the system doesn’t fully converge, it keeps the best result:

if score > best_score:
    best_score = score   # remember the new best so later attempts must beat it
    best_lock = {...}

So failure becomes:

“Use the closest known good position”

Control the Movement (This Was Critical)

Originally, I had a bug where the robot kept making tiny fixed movements and never converged.

The fix was simple but powerful:

Dynamically adjust how the move is executed:

# More than 10 mm requested: interpolate into 10 sub-moves for smooth motion.
# Otherwise execute a single precise move.
if requested_xy > 0.010:
    xy_steps = 10
else:
    xy_steps = 1

Big corrections → smooth motion
Small corrections → precision control
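A minimal sketch of the stepped execution; split_move and the linear interpolation are my illustration of the idea, not the exact implementation:

def split_move(x0, y0, x1, y1, steps):
    """Break one XY correction into evenly spaced waypoints."""
    for i in range(1, steps + 1):
        t = i / steps
        yield (x0 + t * (x1 - x0), y0 + t * (y1 - y0))

# A 12 mm correction becomes 10 smooth sub-moves
for wx, wy in split_move(0.0, 0.0, 0.012, 0.0, steps=10):
    move_to(wx, wy)   # hypothetical motion command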

What This Isn’t

Is this Machine Learning?
No.

Is this Reinforcement Learning?
Also no.

There’s no training loop.
No reward function.


What This Actually Is

This is:

  • Geometry

  • Calibration

  • Feedback control

  • Systems engineering


Really expanding the CV model! Went from Discovery → Targeting → Verification

Did you have to calibrate the cameras? Or are there default params for a Gazebo camera we can use?


Yes, I created a direct hover-over-SC-port function, ran a 3x3 grid movement, and captured a tremendous amount of data from each cam to debug.

Ahh okay. Did you have to simulate a checkerboard to estimate extrinsics/intrinsics? I’m guessing these should be pretty accurate though, right?

Yes, but you’d be surprised how badly the cams lie due to offsets. I designed it not just for motion, but for targeting a safe hover inspection per task.