Has anyone tried conformal prediction for sensor gating in a nav stack?

This is a really interesting thread.

I agree that the painful part is not only “is the noise Gaussian?”, but also what score we use to decide whether a measurement or state transition is atypical.

In many EKF-based stacks, the natural nonconformity score for conformal prediction would be the innovation Mahalanobis distance. That is convenient, but it also inherits the estimator’s covariance assumptions. If the measurement noise covariance R or the state covariance P is already mismatched because of multipath, canopy effects, VIO drift, delayed measurements, or linearization error, then the conformal layer ends up calibrating on a biased score.
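
For concreteness, here is a minimal Python sketch of that classical score for a 2-D position innovation. The function name, the toy covariances, and the use of the 95% chi-squared gate for 2 degrees of freedom (5.991) are illustrative choices, not taken from any particular stack. It shows the same innovation passing the gate under one R and failing when R is understated.

```python
def mahalanobis_sq_2d(nu, S):
    """Squared Mahalanobis distance nu^T S^-1 nu for a 2-D innovation."""
    (a, b), (c, d) = S
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("innovation covariance must have a positive determinant")
    x, y = nu
    # S^-1 = (1 / det) * [[d, -b], [-c, a]]
    return (x * (d * x - b * y) + y * (-c * x + a * y)) / det


CHI2_95_2DOF = 5.991  # 95% quantile of chi-squared with 2 degrees of freedom

nu = (2.0, 2.0)                             # innovation z - h(x), toy values
S_nominal = ((2.0, 0.0), (0.0, 2.0))        # S = H P H^T + R with H = P = R = I
S_overconfident = ((1.1, 0.0), (0.0, 1.1))  # same P, but R understated as 0.1 * I

d2_nominal = mahalanobis_sq_2d(nu, S_nominal)
d2_overconfident = mahalanobis_sq_2d(nu, S_overconfident)

# The gate decision flips purely because of the assumed R.
print(d2_nominal <= CHI2_95_2DOF, d2_overconfident <= CHI2_95_2DOF)  # True False
```

The measurement is identical in both cases, so any calibration built on this score alone would carry the covariance error along with it.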

So my intuition is that conformal prediction could be very useful, but the nonconformity score should probably not be only the classical chi-squared / Mahalanobis gate.

In a related PX4/ROS project, I have been experimenting with this idea at the Offboard control boundary. I open-sourced a passive observer called AFIO (I am currently renaming it to Autonomy Flight Integrity Observer) that watches these topics:

/fmu/in/trajectory_setpoint
/fmu/in/offboard_control_mode
/fmu/out/vehicle_odometry

It computes a deterministic residual bundle:

setpointAgeMs
setpointJitterMs
staleStreams
positionTrackingResidual
velocityTrackingResidual
flightResidual

The idea is not to replace the EKF, and it is not an AI detector. It is a NARH-inspired boundary consistency score: does the incoming intent stream remain fresh, temporally consistent, and physically reflected in the vehicle response?
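
To make the timing terms concrete, here is a simplified Python sketch of how an age and a jitter residual could be computed from setpoint arrival timestamps. The definitions used here (age as time since the newest message, jitter as the standard deviation of inter-arrival intervals) and the function name are illustrative assumptions, not necessarily the exact formulas in the AFIO repo.

```python
from statistics import pstdev


def timing_residuals(stamps_ms, now_ms):
    """Age of the newest setpoint and jitter of inter-arrival intervals, in ms.

    Assumed definitions for illustration only.
    """
    if len(stamps_ms) < 2:
        raise ValueError("need at least two timestamps")
    intervals = [b - a for a, b in zip(stamps_ms, stamps_ms[1:])]
    return {
        "setpoint_age_ms": now_ms - stamps_ms[-1],
        "setpoint_jitter_ms": pstdev(intervals),
    }


# A steady 50 Hz stream versus one with a delayed message (toy timestamps).
steady = timing_residuals([0.0, 20.0, 40.0, 60.0], now_ms=65.0)
bursty = timing_residuals([0.0, 20.0, 60.0, 80.0], now_ms=85.0)
```

Both streams have the same age at the moment of evaluation, but only the second one shows non-zero jitter, which is the kind of degradation a pure tracking-error check would not see.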

In controlled PX4/Gazebo SITL latency-injection tests, the residual stayed quiet for 0–80 ms injected delay, produced a consistent SETPOINT_JITTER warning around 150 ms, and reached STALE_STREAM / RESYNCING at 300 ms. The useful part was that the score captured timing / execution degradation even when simple spatial tracking error still looked fairly benign.

I think a conformal layer could sit on top of this kind of residual score:

calibration flight / nominal bag
    → collect residual scores
    → choose conformal quantile
    → online flag when new residual exceeds calibrated threshold
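
The pipeline above can be sketched in a few lines of Python, using the standard split-conformal quantile with the finite-sample (n + 1) correction. The function names and the toy calibration scores are made up for illustration.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration samples to support this alpha: never flag.
        return math.inf
    return sorted(cal_scores)[k - 1]


# Toy residual scores from a nominal bag (made-up values): 1.0 ... 19.0
cal_scores = [float(i) for i in range(1, 20)]
threshold = conformal_threshold(cal_scores, alpha=0.1)  # 18.0 for these scores


def is_flagged(new_score):
    """Online check: flag when the new residual exceeds the calibrated threshold."""
    return new_score > threshold
```

If the calibration and online scores are exchangeable, a nominal score exceeds this threshold with probability at most alpha. With too few calibration samples the threshold becomes infinite, which is the honest answer: there is not enough data to flag at that level.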

For GPS / localization specifically, a similar approach might combine innovation residuals with additional nonconformity terms such as measurement age, covariance consistency, inter-sensor disagreement, temporal burstiness, and odometry/IMU continuity checks.
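
One simple way to fold several such terms into a single scalar score is to normalize each term by a nominal scale and take the maximum. This is only a sketch: the term names, scales, and values below are hypothetical, and the scales are assumed to be fixed before calibration (for example from a separate nominal dataset) so that the score function itself does not depend on the calibration data.

```python
def combined_score(terms, scales):
    """Largest per-term residual after dividing each term by its nominal scale."""
    return max(abs(terms[name]) / scales[name] for name in scales)


# Hypothetical nominal scales, chosen before calibration.
scales = {
    "innovation_nis": 5.991,        # 95% chi-squared gate for 2 DOF
    "measurement_age_ms": 100.0,
    "gnss_vio_disagreement_m": 1.0,
}

# Hypothetical residual terms for one measurement.
terms = {
    "innovation_nis": 4.0,
    "measurement_age_ms": 30.0,
    "gnss_vio_disagreement_m": 0.5,
}

score = combined_score(terms, scales)  # dominated by the innovation term here
```

Split conformal calibration accepts any fixed scalar score, so the choice between max, weighted sum, or something learned is about interpretability and sensitivity rather than validity. The max has the advantage that the flagged term is easy to identify.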

One caveat: the coverage guarantee of conformal prediction (CP) still depends on the calibration and test data being sufficiently exchangeable. For outdoor robotics, that probably means rolling or adaptive conformal calibration rather than one static threshold for all environments.
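
As a rough illustration of the rolling variant, here is a Python sketch that recomputes the conformal quantile over a sliding window of recent scores. The class name and window size are hypothetical. A sliding window is a heuristic for slow drift and does not by itself restore the exchangeability-based guarantee.

```python
import math
from collections import deque


class RollingConformalGate:
    """Conformal threshold recomputed over the most recent `window` scores."""

    def __init__(self, window, alpha):
        self.scores = deque(maxlen=window)
        self.alpha = alpha

    def threshold(self):
        n = len(self.scores)
        k = math.ceil((n + 1) * (1 - self.alpha))
        if k > n:
            return math.inf  # warm-up: not enough scores to flag at this alpha
        return sorted(self.scores)[k - 1]

    def update(self, score):
        """Return True if `score` exceeds the current threshold, then store it."""
        flagged = score > self.threshold()
        # Design choice: flagged scores are stored too, so the gate can adapt
        # to a new environment instead of locking onto the old one.
        self.scores.append(score)
        return flagged


gate = RollingConformalGate(window=19, alpha=0.1)
warmup_flags = [gate.update(float(s)) for s in range(19, 0, -1)]  # toy scores
```

Whether flagged scores should enter the window is a real trade-off: keeping them lets the threshold follow a genuine environment change, while excluding them protects the calibration set from being contaminated by faults.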

I’d be very interested to hear whether anyone here has tried alternative nonconformity scores beyond EKF innovation Mahalanobis distance — especially for GPS multipath, canopy, or delayed VIO measurements.

AFIO release thread, for reference: https://discourse.openrobotics.org/t/release-ai-flight-integrity-observer-measuring-px4-offboard-degradation-under-controlled-ai-inference-lag/55530?u=zc_liu
AFIO repo, if useful for reference: https://github.com/ZC502/ai_flight_integrity_observer.git
