Explore faster local JSONL scanning for cost history #1016

Related to #1013 and #1014, but intentionally separate from the correctness fix in #1014.

While debugging the long `turn_context` attribution bug, one broader question came up: should CodexBar's local cost-history scanner eventually have a faster path for large JSONL histories?

The current bug is not primarily a raw throughput issue. In #1013, the failure was that the scanner dropped model state from a long Codex `turn_context` row and later fell back to `gpt-5` for model-less `token_count` rows. #1014 keeps that fix small and in Swift: retain truncated prefixes, recover only the Codex turn-context model from that prefix, keep normal JSON parsing away from partial rows, and invalidate the Codex cost cache.

This issue is only for the follow-up performance/design question:

- A Rust/SIMD helper could plausibly make sense if local histories grow large enough. Libraries such as `simdjson`, Rust `simd-json`, or byte-search primitives like `memchr`/`memmem` are a good fit for NDJSON-style scanning and selective parsing.
- A pure Swift optimization may be enough first: mmap-style reads, fewer allocations, SIMD-friendly byte searches, and selective JSON parsing only for rows containing relevant markers.
- Metal/GPU is probably a much weaker fit for this specific workload unless profiling proves otherwise. JSONL scanning has irregular control flow, string escaping, row boundaries, and I/O/memory bandwidth constraints. GPU JSON parsers usually need a structural-index pipeline, which would add a lot of complexity to a menu-bar app.
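To make the pure Swift option concrete, here is a minimal sketch of marker-gated selective parsing. It is not CodexBar's current scanner: `scanRelevantRows` and its parameters are hypothetical names, and it assumes rows are newline-delimited and that a cheap byte search for markers such as `token_count` or `turn_context` is enough to decide whether a row deserves a full JSON parse.

```swift
import Foundation

/// Hypothetical helper (not existing CodexBar code): walks a JSONL buffer row by row
/// and runs full JSON parsing only on rows whose raw bytes contain one of `markers`.
func scanRelevantRows(in data: Data, markers: [String]) -> [[String: Any]] {
    let markerBytes = markers.map { Data($0.utf8) }
    let newline = UInt8(ascii: "\n")
    var results: [[String: Any]] = []
    var lineStart = data.startIndex

    while lineStart < data.endIndex {
        // Find the end of the current row without copying the buffer.
        let lineEnd = data[lineStart...].firstIndex(of: newline) ?? data.endIndex
        let row = data[lineStart..<lineEnd]
        lineStart = lineEnd < data.endIndex ? data.index(after: lineEnd) : data.endIndex

        // Cheap byte-level filter: skip rows that mention none of the markers.
        guard !row.isEmpty,
              markerBytes.contains(where: { row.range(of: $0) != nil }) else { continue }

        // Only matching rows pay for a copy and a full parse; malformed rows are skipped.
        if let object = try? JSONSerialization.jsonObject(with: Data(row)) as? [String: Any] {
            results.append(object)
        }
    }
    return results
}
```

For large files, the buffer could be loaded with `Data(contentsOf: url, options: .mappedIfSafe)` so the scan reads from a memory-mapped file where the system allows it. Whether this filter actually helps depends on how many rows contain a marker, which is something a benchmark would have to show.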

Open questions for maintainers:

- Is scanner throughput currently a real user-facing issue, or should this remain speculative until there is profiling data?
- If it becomes a problem, would you prefer keeping the scanner pure Swift, or would an optional/native helper in Rust be acceptable for this project?
- Should we add a small benchmark fixture around Codex/Claude JSONL scanning before considering any non-Swift implementation?
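As a starting point for the benchmark question, a fixture could be as small as the sketch below. Everything in it is an assumption for illustration: `makeFixture` and `measureThroughput` are hypothetical names, the synthetic row shape is made up rather than taken from real Codex or Claude logs, and a real fixture would probably want anonymized rows with realistic lengths (including very long `turn_context` rows).

```swift
import Foundation

/// Hypothetical fixture builder: the row shape is invented for illustration only.
/// Every `interval`-th row carries the `token_count` marker; the rest are filler.
func makeFixture(rowCount: Int, relevantEvery interval: Int) -> Data {
    var data = Data()
    for i in 0..<rowCount {
        let kind = i % interval == 0 ? "token_count" : "other"
        data.append(Data("{\"type\":\"\(kind)\",\"n\":\(i)}\n".utf8))
    }
    return data
}

/// Runs `scan` over `fixture` several times and reports the best wall-clock time
/// together with the implied throughput in megabytes per second.
func measureThroughput(
    fixture: Data,
    iterations: Int,
    scan: (Data) -> Int
) -> (seconds: Double, megabytesPerSecond: Double) {
    var best = Double.infinity
    for _ in 0..<iterations {
        let start = ProcessInfo.processInfo.systemUptime
        _ = scan(fixture)
        best = min(best, ProcessInfo.processInfo.systemUptime - start)
    }
    let megabytes = Double(fixture.count) / 1_000_000
    return (best, best > 0 ? megabytes / best : .infinity)
}

// Usage: a trivial baseline that only counts newline bytes.
let fixture = makeFixture(rowCount: 100_000, relevantEvery: 10)
let baseline = measureThroughput(fixture: fixture, iterations: 5) { data in
    data.reduce(0) { $0 + ($1 == UInt8(ascii: "\n") ? 1 : 0) }
}
print("baseline: \(baseline.seconds) s, \(baseline.megabytesPerSecond) MB/s")
```

Passing the scanner as a closure means the current Swift scanner and any later candidate (optimized Swift or a native helper) could be timed against the same fixture, with the newline-counting baseline as a rough upper bound on what a single pass over the bytes can achieve.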

My bias would be: land the correctness fix in #1014 first, keep this issue as a design/profiling placeholder, and only consider Rust/SIMD after a benchmark shows the Swift scanner is actually the bottleneck.

