## Summary

When a sink accumulates a large number of persisted events in the database (e.g., due to a delivery error), all events are loaded into memory unconditionally on startup/rehydration via `put_persisted_messages`, bypassing the `max_memory_mb` and `setting_max_messages` limits. This can cause OOM kills on the pod.
## Reproduction scenario

We experienced this in production with an Elasticsearch (OpenSearch) sink (`jobs_sink`) that had a `strict_dynamic_mapping_exception` delivery error. While the sink was erroring:

- 140,416 events accumulated in `sequin_streams.consumer_events` (partition 68)
- Each enriched Job message was ~18 KB in memory (enrichment joins ~12 related tables)
- When the delivery error was fixed, the SlotMessageStore loaded all persisted messages at once
- Memory jumped to 1,509 MB despite `max_memory_mb` being set to 512 MB
- On an earlier run with `max_memory_mb=128`, the store oscillated between filling (hitting the limit for new WAL messages) and draining, but persisted messages were always loaded without limit
## Root cause

In `lib/sequin/runtime/slot_message_store_state.ex`:

```elixir
# Lines 132-143
def put_persisted_messages(%State{} = state, messages) do
  persisted_message_groups =
    Enum.reduce(messages, state.persisted_message_groups, fn msg, acc ->
      Multiset.put(acc, group_id(msg), {msg.commit_lsn, msg.commit_idx})
    end)

  # This cannot fail because we `skip_limit_check?`
  {:ok, state} =
    put_messages(%{state | persisted_message_groups: persisted_message_groups}, messages, skip_limit_check?: true)

  state
end
```

Both `put_persisted_messages` (line 141) and `put_table_reader_batch` (line 160) call `put_messages` with `skip_limit_check?: true`, meaning:

- `max_memory_mb` is not enforced
- `setting_max_messages` (default 50,000) is not enforced
- There is no upper bound on memory consumption during rehydration
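To make the failure mode concrete, here is a minimal, self-contained sketch of the `skip_limit_check?` pattern described above. This is not Sequin's actual implementation: the state shape, field names, and limits below are invented for illustration. The point is that when the flag is set, neither guard is ever consulted, so memory growth during rehydration is unbounded.

```elixir
# Illustrative sketch only -- not Sequin's real put_messages/3.
defmodule LimitSketch do
  @max_memory_bytes 512 * 1024 * 1024
  @max_messages 50_000

  def put_messages(state, messages, opts \\ []) do
    skip? = Keyword.get(opts, :skip_limit_check?, false)

    new_bytes = state.memory_bytes + Enum.sum(Enum.map(messages, & &1.payload_size_bytes))
    new_count = state.message_count + length(messages)

    cond do
      skip? ->
        # Rehydration path: limits are bypassed entirely, so new_bytes is unbounded.
        {:ok, %{state | memory_bytes: new_bytes, message_count: new_count}}

      new_bytes > @max_memory_bytes or new_count > @max_messages ->
        {:error, :limit_exceeded}

      true ->
        {:ok, %{state | memory_bytes: new_bytes, message_count: new_count}}
    end
  end
end
```

With `skip_limit_check?: true`, a batch that would normally return `{:error, :limit_exceeded}` is accepted unconditionally, which mirrors what happens when all 140k persisted events are loaded in one call.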
## Impact

- **OOM risk:** A pod with 32 GB RAM could be killed if enough events accumulate. In our case, 140k events at ~18 KB each ≈ 2.5 GB. A table with larger payloads or more accumulated events could easily exceed pod memory limits.
- **No recovery path:** Once events are persisted, every restart will attempt to load them all, potentially causing repeated OOM crashes.
- **Silent accumulation:** Events pile up silently when a sink has delivery errors. There is no alert or limit on how many events can accumulate in the database.
## Environment

- Sequin self-hosted on Kubernetes (EKS)
- Pod resources: 8 CPU / 32 GB RAM
- 25 Elasticsearch (OpenSearch) sinks, all using enrichment queries
- Source database: Aurora PostgreSQL 12
- Sink consumer config: `batch_size=1`, `max_memory_mb=128` (later increased to 512), `message_grouping=true`
## Suggested improvements

- **Batch-load persisted messages:** Instead of loading all persisted messages at once, load them in configurable batches that respect `max_memory_mb`, processing each batch before loading the next.
- **Enforce a hard memory ceiling:** Even for persisted messages, don't exceed a configurable absolute maximum (e.g., `max_memory_mb * 2` or a separate `hard_max_memory_mb` setting).
- **Add a persisted event count limit or alert:** When events accumulate beyond a threshold in the database (e.g., due to sustained delivery errors), surface a health check warning or automatically pause the sink before the backlog becomes dangerous.
- **Consider on-demand loading:** Instead of eagerly loading all persisted messages into memory, load them on demand as the sink has capacity to deliver, similar to how the ConsumerProducer pulls from the store.
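The first suggestion (batch-loading with backpressure) could look roughly like the sketch below. Everything here is hypothetical: `load_batch/3`, `store_bytes/0`, and `deliver/1` are invented placeholders standing in for "fetch a page of persisted events from Postgres", "current in-memory store size", and "hand the batch to the store", and are not real Sequin functions. The shape of the loop is the point: never fetch the next batch until the store has drained below the memory limit.

```elixir
# Hypothetical sketch of batched rehydration with a memory-based backpressure
# check; helper functions are illustration-only stubs, not Sequin APIs.
defmodule BatchedRehydration do
  @batch_size 1_000

  def rehydrate(sink_id, max_memory_bytes, offset \\ 0) do
    case load_batch(sink_id, offset, @batch_size) do
      [] ->
        :done

      batch ->
        # Backpressure: block until the in-memory store is under the limit
        # before admitting the next batch.
        wait_for_capacity(max_memory_bytes)
        deliver(batch)
        rehydrate(sink_id, max_memory_bytes, offset + length(batch))
    end
  end

  defp wait_for_capacity(max_memory_bytes) do
    if store_bytes() >= max_memory_bytes do
      Process.sleep(100)
      wait_for_capacity(max_memory_bytes)
    end
  end

  # Placeholder stubs so the sketch compiles; a real implementation would
  # query sequin_streams.consumer_events and the SlotMessageStore.
  defp load_batch(_sink_id, _offset, _limit), do: []
  defp store_bytes, do: 0
  defp deliver(_batch), do: :ok
end
```

A bounded loop like this keeps peak memory at roughly one batch above `max_memory_mb` regardless of backlog size, instead of scaling with the number of accumulated events.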
## Workarounds

Currently the only options to prevent OOM are:

- Fix delivery errors quickly, before events accumulate
- Manually delete stuck events from `sequin_streams.consumer_events`
- Ensure pod memory limits are very generous relative to potential event backlogs