docs(self-hosted): introduce troubleshooting snuba page #17782
Merged
+93
−0
9 commits
d936b2c
docs(self-hosted): introduce troubleshooting snuba page
aldy505 47e8493
docs(self-hosted): cleanup text
aldy505 bde3bb0
Merge branch 'master' into aldy505/self-hosted/troubleshooting-snuba
aldy505 7e2da12
Merge branch 'master' into aldy505/self-hosted/troubleshooting-snuba
sfanahata 8ac0c91
Merge branch 'master' into aldy505/self-hosted/troubleshooting-snuba
sfanahata 93d30ea
fix: sidebar_order conflict, grammar, and formatting
c3e6e07
fix: add docker compose exec prefix to Kafka CLI commands
528e872
fix: add missing --all-groups flag to kafka-consumer-groups --describe
ca5b35d
fix: correct consumer group name in kafka-consumer-groups command
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,93 @@ | ||
| --- | ||
| title: Troubleshooting Snuba | ||
| sidebar_title: Snuba | ||
| sidebar_order: 6 | ||
| --- | ||
|
|
||
| Snuba is the service that handles Sentry's search and analytics. It is split into two parts: a consumer that ingests data from Kafka into ClickHouse, and a querier that runs queries against ClickHouse and returns the results to Sentry. | ||
|
|
||
| ## What Snuba subscription consumers are responsible for | ||
|
|
||
| Snuba subscriptions implement Sentry's alert rules: they are periodic queries that run on a schedule and send their results back to Sentry. Each dataset (events, transactions, metrics, generic-metrics, eap-items) has two roles: | ||
|
|
||
| - **Subscription scheduler** — decides *when* to run each subscribed query. It does **not** consume event data. Instead, it tails a small "clock" topic (the commit log) and emits one Kafka message per scheduled query. | ||
| - **Subscription executor** — picks up those scheduled queries, runs them against ClickHouse, and produces the answers to a results topic that Sentry consumes. | ||
|
|
||
| There is also a combined `subscriptions-scheduler-executor` binary that fuses both stages in one process (this is what self-hosted typically runs for `events` / `transactions` / `metrics`). | ||
|
|
||
| ### Which topics they consume from — and why this matters | ||
|
|
||
| | Role | Reads from | Writes to | Where defined | | ||
| |---|---|---|---| | ||
| | Scheduler | `snuba-commit-log` (events) / `snuba-transactions-commit-log` / `snuba-metrics-commit-log` / `snuba-generic-metrics-*-commit-log` | `scheduled-subscriptions-<entity>` | `commit_log_topic` + `subscription_scheduled_topic` in each storage YAML | | ||
| | Executor | `scheduled-subscriptions-<entity>` | `<entity>-subscription-results` | `subscription_scheduled_topic` + `subscription_result_topic` | | ||
|
|
||
| Critically: the scheduler does not consume from `events` / `transactions` / `snuba-metrics`. It consumes the *commit log* of those topics. The commit log is written by the main ingest consumer once per commit (i.e. periodically batched), so it has dramatically lower throughput than the data topic itself. | ||
|
|
||
| The scheduler reads the `orig_message_ts` header from each commit-log message and uses that as a clock to decide which subscriptions are due. See `subscriptions_scheduler.py` for the design — tick consumer → tick buffer → commit-strategy step → query producer. | ||
|
|
||
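To make the clock idea concrete, here is a minimal Python sketch. It is not taken from Snuba or from the page under review. It assumes a simplified model in which every subscription has a fixed resolution in seconds and each commit-log timestamp closes a half-open interval that started at the previous timestamp. `Subscription` and `due_subscriptions` are hypothetical names; the real design in `subscriptions_scheduler.py` is more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    # Hypothetical, simplified stand-in for a subscribed query.
    name: str
    resolution_s: int  # how often the query should run, in seconds


def due_subscriptions(subs, prev_ts, ts):
    """Return (name, run_at) pairs that fall in the tick interval [prev_ts, ts).

    prev_ts and ts are integer Unix timestamps taken from two consecutive
    commit-log messages (an assumption of this sketch).
    """
    due = []
    for sub in subs:
        # First multiple of the resolution at or after prev_ts (ceiling division).
        first = -(-prev_ts // sub.resolution_s) * sub.resolution_s
        for run_at in range(first, ts, sub.resolution_s):
            due.append((sub.name, run_at))
    # Order by scheduled time, then by name, for a stable result.
    return sorted(due, key=lambda pair: (pair[1], pair[0]))
```

If the clock does not advance (`prev_ts == ts`), the interval is empty and nothing is scheduled, which matches the static offsets seen on a quiet instance.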
| ### Why offsets may appear static or end offsets missing | ||
|
|
||
| This is **expected** in most healthy self-hosted deployments. Reasons, in order of likelihood: | ||
|
|
||
| 1. **Low traffic on the source topic.** The scheduler reads the *commit log*, | ||
| which only gets a record when the upstream events/transactions consumer | ||
| flushes a batch (typically every few seconds, and only if there's data). | ||
| On a quiet self-hosted instance, that's a tiny trickle. | ||
| 2. **No active subscriptions.** Sentry alert rules are what create | ||
| subscription rows; if no alerts are configured for a dataset, the | ||
| scheduler emits nothing to `scheduled-subscriptions-*`, so the executor's | ||
| input topic stays empty and its committed offset never moves. | ||
| The end offset then equals the last committed offset, so nothing appears to change. | ||
| 3. **GLOBAL watermark mode buffers ticks.** Most entities (transactions, | ||
| metrics, eap-items, generic-metrics) set `subscription_scheduler_mode: global`. | ||
| In that mode the scheduler waits until *every* partition of the source topic has | ||
| advanced past a timestamp before scheduling. On a single-partition | ||
| self-hosted topic it advances the moment a commit-log message arrives, but | ||
| on a multi-partition setup a single stalled partition can stall | ||
| scheduling without that being a bug. | ||
| 4. **kafka-ui showing "no end offset."** kafka-ui sometimes can't render an | ||
| end offset for a topic with zero recent produces; this is a UI artifact, | ||
| not a consumer problem. `docker compose exec kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --all-groups` will show | ||
| the real `LOG-END-OFFSET`. | ||
| 5. **At-least-once commit policy.** The scheduler only commits the *earliest* | ||
| commit-log offset whose tick has been fully scheduled and produced. It | ||
| deliberately holds the committed offset back relative to the read | ||
| position; small static lag is normal. | ||
|
|
||
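The global watermark behaviour in reason 3 can be sketched in a few lines of Python. This is an illustration only: `global_watermark` is a hypothetical helper, and treating a partition with no timestamp yet as blocking is an assumption of the sketch, not something the page states.

```python
def global_watermark(partition_ts):
    """Return the timestamp that every partition has reached, or None.

    partition_ts maps partition id -> latest timestamp seen on that
    partition (None if nothing has been seen yet). Scheduling can only
    proceed up to the slowest partition.
    """
    if not partition_ts:
        return None
    values = list(partition_ts.values())
    if any(ts is None for ts in values):
        # Assumption: an unseen partition holds the watermark back entirely.
        return None
    return min(values)
```

With a single partition the watermark is simply that partition's latest timestamp. With two partitions at 100 and 40, the watermark stays at 40 no matter how far the first partition advances.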
| ### What "healthy" looks like | ||
|
|
||
| - **Scheduler:** committed offset on `snuba-*-commit-log` advances roughly in | ||
| lockstep with the upstream data consumer's commits (that is, as fast as | ||
| events arrive, in batches). Lag should stay in the low single-digit seconds. | ||
| Zero movement is fine **if** the source dataset isn't currently | ||
| ingesting. | ||
| - **Executor:** committed offset on `scheduled-subscriptions-<entity>` | ||
| advances every time a scheduled query is executed. With *no* configured Sentry alerts for | ||
| that dataset, expect the offset to never move; that's not a bug, there is | ||
| simply no work. | ||
| - **Result topic:** `<entity>-subscription-results` should receive one | ||
| message per executed query; Sentry's own consumer reads these. If alerts | ||
| in Sentry's UI are firing correctly, this loop is healthy regardless of | ||
| how kafka-ui renders the numbers. | ||
|
|
||
| Quick sanity checks from the host: | ||
|
|
||
| - `docker compose exec kafka kafka-console-consumer --bootstrap-server kafka:9092 --topic scheduled-subscriptions-events --from-beginning --max-messages 5` | ||
| — if you see scheduled queries, the scheduler is producing. | ||
| - `docker compose exec kafka kafka-console-consumer --bootstrap-server kafka:9092 --topic events-subscription-results --from-beginning --max-messages 5` | ||
| — if you see results, the executor is running and Sentry is being fed. | ||
| - `docker compose exec kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group snuba-events-subscriptions-consumers` | ||
| — authoritative `LAG` view. | ||
|
|
||
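When the `--describe` output is long, a small script can total the lag per topic. The following Python sketch assumes the tabular layout printed by recent Kafka versions, with a header row that includes `TOPIC` and `LAG` columns and `-` where no value is available. The sample text, group name, and topic names are made up for illustration, and `lag_by_topic` is a hypothetical helper.

```python
# Made-up sample in the assumed layout of `kafka-consumer-groups --describe`.
SAMPLE = """\
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
example-group topic-a 0 120 123 3 - - -
example-group topic-a 1 50 52 2 - - -
example-group topic-b 0 - 0 - - - -
"""


def lag_by_topic(output):
    """Sum the LAG column per (group, topic), skipping rows without a numeric lag."""
    header = None
    totals = {}
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "TOPIC" in fields and "LAG" in fields:
            header = fields  # a new header starts a new block of rows
            continue
        if header is None or len(fields) < len(header):
            continue  # warnings and other non-tabular lines
        row = dict(zip(header, fields))
        if not row["LAG"].isdigit():
            continue  # "-" means there is no committed offset to compare against
        key = (row.get("GROUP", ""), row["TOPIC"])
        totals[key] = totals.get(key, 0) + int(row["LAG"])
    return totals
```

Calling `lag_by_topic(SAMPLE)` totals the two `topic-a` partitions and omits `topic-b`, whose lag is not numeric.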
| ### Differences vs. the events / regular Snuba consumers | ||
|
|
||
| | | Ingest consumer (e.g. errors, transactions) | Subscription scheduler | Subscription executor | | ||
| |---|---|---|---| | ||
| | Source topic | High-volume data topic (`events`, `transactions`, `snuba-items`…) | Low-volume **commit log** of that data topic | Internal `scheduled-subscriptions-*` topic | | ||
| | Driven by | Event throughput from Sentry/Relay | Commit cadence of the ingest consumer + configured alert rules | Existence of configured alert rules | | ||
| | What it writes | ClickHouse INSERTs + commit-log entry | Scheduled-query Kafka messages | Query results to Kafka (it only reads from ClickHouse, via SELECTs) | | ||
| | Expected lag pattern | Continuous offset advance under normal load | Periodic, batched advance — long flat stretches between commits are normal | Static if no alerts are configured; otherwise advances per alert tick | | ||
| | If offset is static | Likely a problem (ingestion stalled) | Probably fine (no upstream commits or no alerts) | Probably fine (no Sentry alert rules for this dataset) | | ||
|
|
||
| The headline point: **a static offset on a subscription consumer is only a problem if the corresponding feature in Sentry is broken.** Verify by checking whether (a) ingestion is working and (b) any alert rules exist for the dataset — not by staring at offsets in isolation. | ||
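The triage rule in the comparison table and the headline point can be written down as a small decision function. This is one reading of that table, not an official diagnostic: `static_offset_verdict` and its return strings are hypothetical, and the two boolean inputs correspond to checks (a) and (b) above.

```python
def static_offset_verdict(role, events_arriving, has_alert_rules):
    """Classify a static committed offset for one consumer role.

    role: "ingest", "scheduler", or "executor".
    events_arriving: whether the dataset is currently receiving events.
    has_alert_rules: whether any Sentry alert rules exist for the dataset.
    """
    if role == "ingest":
        # A static offset while events are arriving suggests stalled ingestion.
        return "investigate" if events_arriving else "expected"
    if role in ("scheduler", "executor"):
        # No upstream commits or no alert rules means there is no work to do.
        if not events_arriving or not has_alert_rules:
            return "expected"
        return "investigate"
    raise ValueError(f"unknown role: {role!r}")
```

For example, `static_offset_verdict("executor", True, False)` returns `"expected"`, because an executor with no alert rules has nothing to consume.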