Hey!! thanks for taking the time to go through this so carefully. You clearly know the robot_localization codebase inside out, and I would much rather have corrections like this out in the open than leave something inaccurate sitting in a pinned post. I will go through each point as honestly as I can.
On covariance matrices
You are right. I went back to the source and found this in gpsFixCallback:
for (size_t i = 0; i < POSITION_SIZE; i++) {
for (size_t j = 0; j < POSITION_SIZE; j++) {
latest_cartesian_covariance_(i, j) =
msg->position_covariance[POSITION_SIZE * i + j];
}
}
So, RL is reading the GPS reported covariance and rotating it into the world frame before the Kalman update. My original claim was just wrong. I have updated the README and docs to remove that.
On ECEF
I want to be a bit careful here because I think there is a real distinction.
I looked at the PR that got merged. It adds a broadcast_earth_transform parameter that publishes an earth to world TF frame from navsat_transform. That is genuinely useful for REP 105 compliance and I think it is a solid addition.
But it is not quite what I meant when I said FusionCore handles GPS in ECEF. In FusionCore, the UKF state itself runs in ECEF coordinates. The lat lon alt from NavSatFix gets converted to ECEF XYZ once, and then everything happens there. The innovation, the covariance update, the state correction, all of it. There is no UTM projection anywhere in the pipeline.
In RL navsat_transform, even with that PR, the GPS measurement still goes through GeographicLib UTMUPS Forward or gps_local_cartesian Forward before it ever reaches the filter. The earth frame gets published at the end, not used internally for fusion. If I am misunderstanding that part of the code I am very open to being corrected.
Also worth mentioning, that PR is only on rolling devel right now and has not been ported to Jazzy yet, which is where most of my users are.
On the NaN divergences
This is the part I am most interested in discussing.
The RL UKF results I saw in my benchmark were not a config issue. I used the default configuration on the same data and got NaN on all six sequences. I want to explain what I think might be going on, because it seems tied to a deeper architectural difference.
Looking at navsat_transform, I do not see any chi squared gating. RL takes the GPS covariance and feeds it into the Kalman update, which then weights the measurement. That is correct Kalman behavior. But even if something gets down weighted, it can still be an outlier. It is not rejected, just blended in.
FusionCore runs a chi squared test on every sensor update before it touches the filter state. If the Mahalanobis distance is too large for that sensor, the update just gets dropped.
On the November 2012 sequence I mentioned earlier, this actually backfired. GPS degraded for a long stretch, so FusionCore kept rejecting updates and the state drifted. But on the other five sequences, that same gating is what kept bad fixes from corrupting the filter.
My current hypothesis for the RL UKF NaN is something like this. A bad measurement gets fused, even if it is low weight. That nudges the state. The next measurement now looks like a bigger innovation, it also gets fused, and the errors start to stack. Eventually the UKF runs into a numerical issue and you get a singular matrix somewhere along the line.
I would be really interested to hear if you have seen something like that before or if you think there is a different root cause.
What I think actually differs
Based on your corrections, this is my updated view of what FusionCore does differently, not better or worse.
The filter has 22 states. Gyro bias and accelerometer bias are estimated continuously as part of the state, not handled afterward. Over long runs with temperature changes, that makes a difference.
There is per sensor chi squared gating. IMU, wheel odometry, and GPS each have their own thresholds based on their degrees of freedom. RL uses a single global Mahalanobis threshold.
Noise covariance is adapted online from the innovation sequence. In RL it comes from YAML config. In FusionCore it gets updated based on what the sensors are actually doing.
There is ZUPT and an inertial coast mode. As far as I know those are not in RL.
It is a single node setup, no navsat_transform needed. navsat_transform subscribes to odometry filtered to initialize its UTM to world transform, which means the filter has to be running before GPS can come in. The startup ordering issues people hit seem to come from that design rather than misconfiguration.
What robot_localization has that FusionCore does not
Multi robot support, differential and absolute modes, a much bigger user base, and a lot of real world validation over time. And honestly, people like you who know the code well enough to call out mistakes publicly. That part is genuinely valuable.
Thanks again for engaging with this seriously. If anything above is factually wrong I want to know.