Hi! Two small bugs in the Claude Code session-end ingest path that affect what PostHog sees on $ai_generation events. Both reproduce on the current main (a004aa2). Happy to PR if useful.
1. Cache token properties missing the $ai_ prefix
posthog_llma/events.py → build_ai_generation emits:
    properties = {
        ...
        "cache_read_input_tokens": cache_read_tokens,
        "cache_creation_input_tokens": cache_creation_tokens,
    }
PostHog's LLM Analytics cost pipeline only reads the $ai_* namespace (docs call out $ai_cache_read_input_tokens / $ai_cache_creation_input_tokens), so cache reads/writes are dropped on the floor and $ai_total_cost_usd is computed from input+output only.
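A minimal sketch of what the fix could look like, assuming `build_ai_generation` assembles a plain dict. The helper name `cache_token_properties` is hypothetical and not from the repo; the property names are the `$ai_`-prefixed ones from the docs mentioned above.

```python
def cache_token_properties(cache_read_tokens: int, cache_creation_tokens: int) -> dict:
    """Return cache token counts under the $ai_-prefixed property names.

    Hypothetical helper for illustration; not part of posthog_llma.
    """
    return {
        "$ai_cache_read_input_tokens": cache_read_tokens,
        "$ai_cache_creation_input_tokens": cache_creation_tokens,
    }


# Example values only; the real function would pass its own token counts.
properties = {
    "$ai_input_tokens": 4,
    "$ai_output_tokens": 926,
    **cache_token_properties(150_109, 75_729),
}
```

Keeping the two keys in one helper would make it harder for an unprefixed name to slip back in, but renaming the keys in place would work just as well.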
For Anthropic prompt-cached workloads the cache buckets dominate the bill, so reported cost can be ~25x lower than the actual spend. Concrete example from a real session: cache_read=150,109, cache_write=75,729, input=4, output=926 — PostHog showed ~$0.014, actual cost ~$0.343.
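The session figures can be checked with a quick calculation. The per-million-token rates below are an assumption (they are not stated anywhere in the repo or this issue): $3 input, $15 output, $0.30 cache read, $3.75 cache write. With those rates the numbers above reproduce.

```python
# Assumed USD rates per million tokens; chosen for illustration, not read from the repo.
RATES = {"input": 3.00, "output": 15.00, "cache_read": 0.30, "cache_write": 3.75}

# Token counts from the session described above.
usage = {"input": 4, "output": 926, "cache_read": 150_109, "cache_write": 75_729}


def cost(usage: dict, rates: dict, buckets: list) -> float:
    """Sum the USD cost of the selected token buckets."""
    return sum(usage[b] * rates[b] / 1_000_000 for b in buckets)


reported = cost(usage, RATES, ["input", "output"])  # what is counted without the cache buckets
actual = cost(usage, RATES, list(usage))            # all four buckets

print(f"reported ~${reported:.3f}, actual ~${actual:.3f}, ratio ~{actual / reported:.1f}x")
# reported ~$0.014, actual ~$0.343, ratio ~24.7x
```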
2. Extended thinking is emitted as {"type":"text"} instead of {"type":"thinking"}
posthog_llma/parser.py → _finalize_generation concatenates thinking and text blocks into a single output_text string, and event_builder.py → build_events wraps that as one text block:
    content_blocks = []
    if gen["output_text"]:
        content_blocks.append({"type": "text", "text": gen["output_text"]})
So when a Claude Code generation has both thinking and text blocks (common with extended thinking enabled), thinking content appears as plain assistant text in the PostHog UI instead of a dedicated thinking block — losing the visual distinction and breaking property queries that filter on block type. The streaming-merge work from #86 already tracks blocks by type in state["blocks_by_type"], so the typing info is right there in _finalize_generation, just collapsed before emission.
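One possible shape for the fix, as a sketch only. It assumes `state["blocks_by_type"]` maps a block type to its accumulated text, which may not match the real structure. The function name `build_content_blocks` is hypothetical, and the `"thinking"` payload key mirrors Anthropic's content block shape rather than anything confirmed for the PostHog UI.

```python
from __future__ import annotations


def build_content_blocks(blocks_by_type: dict[str, str]) -> list[dict]:
    """Emit one typed block per non-empty entry, thinking before text.

    Hypothetical helper; assumes blocks_by_type looks like
    {"thinking": "...", "text": "..."}.
    """
    content_blocks = []

    thinking = blocks_by_type.get("thinking", "")
    if thinking:
        # Keep thinking as its own block instead of folding it into output_text.
        content_blocks.append({"type": "thinking", "thinking": thinking})

    text = blocks_by_type.get("text", "")
    if text:
        content_blocks.append({"type": "text", "text": text})

    return content_blocks
```

For example, `build_content_blocks({"thinking": "plan", "text": "answer"})` would yield a thinking block followed by a text block, while a generation without thinking would still produce the single text block it does today.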
Thanks!