
fix(phlower): partition invocations by day, drop instead of delete#26

Merged
webjunkie merged 2 commits into main from fix/sqlite-daily-partitions on May 2, 2026

Conversation

@webjunkie
Contributor

Why

Hourly purge runs DELETE FROM invocations WHERE finished_at < ? against a 26 GB table. The DELETE holds the SQLite write lock long enough that the flush loop can't drain its in-memory buffer. New events keep arriving, buffer grows, each subsequent flush is bigger, takes longer, falls further behind. RSS runs away, liveness probe kills the pod.

PR #25 made this gentler (batched purge with yields between batches), but it didn't fix the root cause — it just moved the threshold. Same death spiral fired again on May 2: mega-flushes of 200K → 678K → 2.55M → 1.47M records, then SIGKILL at 03:15 UTC.

What

Move invocations to daily-partitioned tables (invocations_YYYYMMDD, invocation_details_YYYYMMDD). Retention is now DROP TABLE — a metadata operation. The hourly purge stops contending with the flush loop entirely.
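The retention change can be sketched roughly as follows. This is a minimal stand-alone version, not the PR's actual code: `partition_name` and `purge_expired` are hypothetical names, and the real store presumably tracks its partitions rather than scanning `sqlite_master`. The point is that retention becomes a table scan over names, never over rows:

```python
import datetime
import sqlite3


def partition_name(day: datetime.date) -> str:
    """Table name for a given UTC day, e.g. invocations_20260502."""
    return f"invocations_{day.strftime('%Y%m%d')}"


def purge_expired(conn: sqlite3.Connection, retention_days: int,
                  today: datetime.date) -> list[str]:
    """Drop whole daily partitions older than the retention window.

    DROP TABLE is a metadata operation, so this never scans rows and
    never holds the write lock long enough to stall the flush loop.
    """
    cutoff = today - datetime.timedelta(days=retention_days)
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name GLOB 'invocations_[0-9]*'"
    ).fetchall()
    dropped = []
    for (name,) in rows:
        day = datetime.datetime.strptime(name.rsplit("_", 1)[1], "%Y%m%d").date()
        if day < cutoff:
            conn.execute(f"DROP TABLE {name}")
            dropped.append(name)
    return dropped
```

The `[0-9]*` in the GLOB keeps `invocations_legacy` out of the date-parsing path; the legacy table is handled separately during migration.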

Pre-partition databases get migrated on first boot: ALTER TABLE invocations RENAME TO invocations_legacy. Rename is metadata-only, instant even on the 26 GB prod-us DB. Reads UNION across the legacy table until its data ages past retention, then it gets dropped wholesale.
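In outline, the migration and the legacy-aware read path look something like this (a sketch under assumptions: `migrate_pre_partition` and `union_read_sql` are illustrative names, and the real reads select explicit columns rather than `*`):

```python
import sqlite3

LEGACY = "invocations_legacy"


def migrate_pre_partition(conn: sqlite3.Connection) -> bool:
    """Rename the old monolithic table on first boot, if present.

    ALTER TABLE ... RENAME TO is metadata-only in SQLite, so it is
    effectively instant regardless of table size.
    """
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type='table' AND name='invocations'"
    ).fetchone()
    if row is None:
        return False  # fresh DB or already migrated
    conn.execute(f"ALTER TABLE invocations RENAME TO {LEGACY}")
    return True


def union_read_sql(partitions: list[str]) -> str:
    """SELECT spanning the live daily partitions plus the legacy table."""
    tables = list(partitions) + [LEGACY]
    return " UNION ALL ".join(f"SELECT * FROM {t}" for t in tables)
```

Once every row in `invocations_legacy` is older than retention, the whole table goes away via the same DROP path as any other partition.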

Also caps the in-memory write-behind buffer (SQLITE_PENDING_BUFFER_CAP, default 200k records) with drop-oldest semantics. Bounded RAM matters more than capturing every record during overload — the alternative is OOM and losing all of them.
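The drop-oldest semantics amount to something like the following minimal sketch (assumed shape only; `PendingBuffer` is a hypothetical name, and the real buffer holds record batches and logs a warning when it sheds):

```python
from collections import deque


class PendingBuffer:
    """Write-behind buffer with a hard cap and drop-oldest semantics.

    cap defaults to 200_000, matching the assumed
    SQLITE_PENDING_BUFFER_CAP default described above.
    """

    def __init__(self, cap: int = 200_000):
        self.cap = cap
        self.records = deque()
        self.dropped = 0  # surfaced via healthz in the real store

    def append(self, record) -> None:
        if len(self.records) >= self.cap:
            # Shed a chunk rather than one record at a time, so sustained
            # overload costs O(1) amortized work per incoming record.
            # max(1, ...) keeps the chunk size sane for tiny caps.
            drop_n = max(1, self.cap // 10)
            for _ in range(min(drop_n, len(self.records))):
                self.records.popleft()
                self.dropped += 1
        self.records.append(record)
```

Trading the oldest pending records for bounded memory is the deliberate choice here: under overload, shedding 10% of the buffer beats the pod being SIGKILLed and losing everything in it.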

Notes

  • refresh_cached_stats uses MAX(rowid) on the legacy table instead of COUNT(*) — index-fast, accurate enough for healthz during the transition window.
  • Recovery still chunks per-partition by 4-hour windows so a multi-GB legacy table doesn't hold the read snapshot for minutes.
  • Today's and tomorrow's partitions get pre-created in the purge loop so midnight-UTC rollover doesn't race with the first flush of the day.
  • Disk-pressure path simplified: emergency mode just drops more partitions until under the cap. No more fallback DELETE.

Test plan

  • Smoke test: write across two days, query, purge — verified partition rotation
  • Migration test: pre-partition DB → renamed to _legacy, reads find both, recovery works (1000 rows replayed)
  • Docker build + boot: clean fresh-DB and migrated DB scenarios
  • Deploy to prod-us, monitor RSS over the next purge cycle (~01:00 UTC May 3)
  • Verify dropped-invocations counter stays at 0 in healthz logs

Hourly purge stalls flushes when retention DELETE has millions of rows
to scan. Buffer grows unbounded, RSS climbs, liveness probe kills the
pod. Same death spiral fired again on May 2 even with the batched purge
from #25 — it just took longer to get there. Mega-flushes of 200K → 678K
→ 2.55M → 1.47M records, then SIGKILL.

Switch invocations + invocation_details to daily-partitioned tables
(invocations_YYYYMMDD). Retention enforcement is now DROP TABLE — a
metadata operation that takes milliseconds. The hourly purge no longer
contends with the flush loop.

On first boot against a pre-partition database, the existing tables get
renamed to invocations_legacy / invocation_details_legacy. Reads UNION
across them; once their data ages past retention, they're dropped
wholesale. Rename is metadata-only — instant even on the 26 GB prod-us
DB.

Also cap the in-memory write-behind buffer (SQLITE_PENDING_BUFFER_CAP,
default 200k records). If the flush loop ever falls behind, we drop
oldest in chunks and log a warning. Bounded RAM matters more than
capturing every record during overload — the alternative is OOM and
losing all of them.

Recovery still chunks per-partition by 4-hour windows so a multi-GB
legacy table doesn't hold the read snapshot for minutes. healthz uses
MAX(rowid) on the legacy table instead of COUNT(*) — accurate enough,
and gone after retention anyway.

Refs: prod-us 1 restart on 2026-05-02 03:15 UTC; second RSS spike at
14:25 UTC self-recovered without restart.
@webjunkie webjunkie marked this pull request as ready for review May 2, 2026 20:27
@webjunkie
Contributor Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6e2cc061e2

Comment thread: src/phlower/sqlite_store.py (outdated)
Comment thread: src/phlower/store.py (outdated)
Comment thread: src/phlower/sqlite_store.py
- Migration: re-check `invocation_details` existence after the
  column-split path runs. On pre-split databases (the very old schema
  with args/kwargs inline on `invocations`), the split-produced
  details table was left orphaned because the rename branch keyed off
  a flag captured before the split.

- Backpressure: guard the warning modulus when the buffer cap is small.
  Tiny caps made `cap // 10` evaluate to 0 and crashed the warning
  gate with ZeroDivisionError on the first overflow. Use `drop_n`
  (already clamped to >=1) as the modulus.

- Reads: dedupe by task_id across partitions in `list_by_task` and
  `search`. A task whose lifecycle straddles midnight UTC can have
  rows in two partitions (RETRY yesterday, SUCCESS today). Rows
  arrive newest-first, so first-occurrence-wins gives the latest
  state. Aggregate recovery still slightly inflates counts for these
  edge cases — bounded by the tiny fraction of tasks crossing
  midnight, accepted as a follow-up.
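The cross-partition dedupe in the last bullet can be sketched as follows (a minimal stand-alone version; the real code operates on the store's row objects, and `dedupe_by_task_id` is a hypothetical name):

```python
def dedupe_by_task_id(rows: list[dict]) -> list[dict]:
    """Keep one row per task_id across partitions.

    Rows arrive newest-first, so first-occurrence-wins reports the
    latest state for a task whose lifecycle straddles midnight UTC
    (e.g. RETRY in yesterday's partition, SUCCESS in today's).
    """
    seen = set()
    out = []
    for row in rows:
        if row["task_id"] not in seen:
            seen.add(row["task_id"])
            out.append(row)
    return out
```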
@webjunkie webjunkie merged commit 799144b into main May 2, 2026
10 checks passed
@webjunkie webjunkie deleted the fix/sqlite-daily-partitions branch May 2, 2026 20:43
