variance_measure_add: skip non-finite input instead of asserting #351
Merged
bl4ckb0ne merged 1 commit into collabora:master on May 8, 2026
bl4ckb0ne reviewed on Mar 13, 2026
An assert(isfinite(d[i])) crash here corrupts the on-disk libsurvive config because the process dies mid-write. Corrupt optical angles (e.g. bad FPGA timestamps during USB disturbances) can produce NaN or Inf values that reach this function; crashing is strictly worse than dropping one sample, which has negligible effect on the variance estimate. Replace the assert with an early-return guard that logs to stderr and leaves the accumulator unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
00b698c to 1b7da16
bl4ckb0ne approved these changes on May 8, 2026
Summary
variance_measure_add contains an assert(isfinite(d[i])) inside the accumulation loop. When corrupt optical angle data reaches this function, the assert fires, crashes the process with SIGABRT, and corrupts the on-disk config.json because the file is left in a partially written state. The next launch then fails to parse the config and crashes again, producing a boot loop.
Dropping one non-finite sample has negligible effect on the variance estimate. Crashing and corrupting the config is far more costly.
Demonstration
Hardware trigger: a USB disturbance (cable flex, port brown-out) causes the FPGA to emit bad timestamps. The driver produces a non-finite optical angle. That angle propagates into variance_tracker_add → variance_measure_add:
BEFORE fix:
d[0] = NaN
assert(isfinite(NaN)) → fires
Process: SIGABRT
config.json: partially written, unparseable on next launch
Result: crash loop until config.json is manually deleted
AFTER fix:
d[0] = NaN
isfinite(NaN) == false → early return, measurement skipped
stderr: "[libsurvive] variance_measure_add: non-finite d[0]=nan, dropping measurement"
Process: continues normally
Result: one sample dropped, variance estimate unaffected
The assert was presumably added as a correctness guard during development. In production, it turns a transient hardware glitch into a persistent failure requiring manual intervention.
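The guard described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of redist/variance.h: the struct sketch_measure, the SKETCH_MAX_DIM constant, the size field, the return value, and the helper sketch_measure_variance are all hypothetical names introduced here. Only the accumulator fields n, sum, and sumSq and the log message format are taken from this description. The sketch checks every component before touching the accumulator, so a sample whose later component is NaN cannot leave the sums half-updated.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_MAX_DIM 3 /* hypothetical fixed dimension for this sketch */

/* Simplified stand-in for the accumulator; not the real libsurvive struct. */
struct sketch_measure {
    size_t size;                  /* components per sample (assumed field) */
    size_t n;                     /* samples accumulated so far */
    double sum[SKETCH_MAX_DIM];   /* per-component running sum */
    double sumSq[SKETCH_MAX_DIM]; /* per-component running sum of squares */
};

/* Returns 1 if the sample was accumulated, 0 if it was dropped. */
static int sketch_measure_add(struct sketch_measure *meas, const double *d) {
    /* A programming error (bad size) is still an invariant worth asserting;
     * bad *data* is handled by the guard below instead. */
    assert(meas->size > 0 && meas->size <= SKETCH_MAX_DIM);

    /* Pre-check the whole sample before modifying any field. */
    for (size_t i = 0; i < meas->size; i++) {
        if (!isfinite(d[i])) {
            fprintf(stderr,
                    "[libsurvive] variance_measure_add: non-finite d[%zu]=%f, "
                    "dropping measurement\n",
                    i, d[i]);
            return 0; /* accumulator left unchanged */
        }
    }

    for (size_t i = 0; i < meas->size; i++) {
        meas->sum[i] += d[i];
        meas->sumSq[i] += d[i] * d[i];
    }
    meas->n++;
    return 1;
}

/* Population variance of component i, computed as E[x^2] - E[x]^2. */
static double sketch_measure_variance(const struct sketch_measure *meas, size_t i) {
    if (meas->n == 0)
        return 0.0;
    double mean = meas->sum[i] / (double)meas->n;
    return meas->sumSq[i] / (double)meas->n - mean * mean;
}
```

Feeding this sketch a sample such as {3.0, NAN} between two finite samples leaves n, sum, and sumSq exactly as they were before the bad sample. Had the NaN been added instead, every later value of sum[1] and sumSq[1] would also be NaN, since NaN propagates through addition.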
Impact
Any caller passing data derived from optical angles is exposed. variance_tracker_add in redist/variance.h is the primary path; it is called from lighthouse calibration and pose confidence tracking. On hardware that experiences any USB instability, this assert is reachable.
Change
One file, redist/variance.h. The assert is replaced with a pre-check that returns early and logs to stderr. The accumulator (meas->n, meas->sum, meas->sumSq) is left unchanged, which is correct: including a non-finite value in any of those fields would corrupt all subsequent variance calculations derived from this accumulator.
Found via
Observed in production: a tracker running on embedded hardware (Raspberry Pi, USB bus under load) would enter a crash loop after a cable disturbance. coredumpctl showed SIGABRT inside variance_measure_add. Deleting config.json recovered the system; the underlying assert was the cause.