Prompt cache cost estimation #4
Pull request overview
This PR implements prompt cache simulation to estimate cost savings when using prompt caching across multi-step agent interactions. The implementation introduces a token cache tracking system that models how prompts would be cached and reused across conversation steps.
Key changes:
- New TokenCache class to track cached tokens across conversation steps with a growing prefix model (see the sketch after this list)
- New simulateCacheSavings function that estimates costs when prompt caching is enabled
- Updated pricing module to support cache creation and cache read costs separately
- Enhanced HTML report to display estimated cache savings
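To make the growing prefix model concrete, here is a minimal TypeScript sketch of how such a cache tracker could work. Only the TokenCache name comes from the changed files; the StepUsage shape, the field names, and the recordStep method are illustrative assumptions, not the actual implementation in lib/token-cache.ts.

```ts
// Hypothetical sketch of a growing-prefix token cache tracker.
interface StepUsage {
  promptTokens: number;     // total tokens in this step's prompt
  completionTokens: number; // tokens generated in this step
}

class TokenCache {
  // Tokens already written to the cache (the shared, growing prefix).
  private cachedPrefixTokens = 0;

  // Record one agent step and return how its prompt tokens split between
  // cache reads and cache writes under the growing-prefix assumption.
  recordStep(step: StepUsage): { cacheRead: number; cacheWrite: number } {
    const cacheRead = Math.min(step.promptTokens, this.cachedPrefixTokens);
    const cacheWrite = step.promptTokens - cacheRead;
    // The whole prompt becomes the cached prefix available to the next step.
    this.cachedPrefixTokens = step.promptTokens;
    return { cacheRead, cacheWrite };
  }
}
```

The key assumption is that each step's prompt starts with the previous step's full prompt, so previously seen tokens are read from the cache and only the new suffix is written.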
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| lib/token-cache.ts | New class implementing token cache tracking with cost calculation |
| lib/utils.ts | Added cache simulation logic and moved buildAgentPrompt from test-discovery |
| lib/utils.test.ts | Comprehensive test coverage for TokenCache and simulateCacheSavings |
| lib/pricing.ts | Refactored to use inline types and added cacheCreationInputTokenCost support |
| lib/pricing.test.ts | Updated tests with proper type usage |
| lib/report.ts | Added cacheSimulation field to metadata |
| lib/report-template.ts | Enhanced UI to display cache simulation results |
| index.ts | Integrated cache simulation into main flow with console output |
| results/result-2025-12-12-23-02-22-anthropic-claude-haiku-4.5.json | Example result file with cache simulation data |
| AGENTS.md | Updated test command from test:self to bun test |
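To illustrate the pricing side, the following hypothetical per-step cost comparison shows how cache creation and cache read rates could feed into the savings estimate. Only cacheCreationInputTokenCost is named in the file summary above; the other Pricing fields and the estimateStepCosts helper are assumptions for this sketch.

```ts
// Illustrative per-step cost comparison between cached and uncached pricing.
interface Pricing {
  inputTokenCost: number;              // $ per uncached input token
  cacheCreationInputTokenCost: number; // $ per token written to the cache
  cacheReadInputTokenCost: number;     // $ per token read from the cache
}

function estimateStepCosts(
  pricing: Pricing,
  split: { cacheRead: number; cacheWrite: number },
): { withCache: number; withoutCache: number } {
  // Without caching, every prompt token is billed at the regular input rate.
  const withoutCache =
    (split.cacheRead + split.cacheWrite) * pricing.inputTokenCost;
  // With caching, the reused prefix is billed at the cache-read rate and the
  // new suffix at the cache-creation rate.
  const withCache =
    split.cacheRead * pricing.cacheReadInputTokenCost +
    split.cacheWrite * pricing.cacheCreationInputTokenCost;
  return { withCache, withoutCache };
}
```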
When I started the PR I initially thought we'd let the user choose whether to enable prompt caching for a run, but there is no central toggle for it in the AI SDK, so supporting it for every provider is a bit of a pain. Since prompt cache "logic" is fairly basic (the system prompt plus previous prompts are cached), we can quite easily make a "good enough" estimation that works for any provider where the AI SDK exposes cache creation and cache read token costs.
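As a rough usage sketch building on the TokenCache and estimateStepCosts examples above, the whole estimation can be driven by replaying the recorded steps of a run. The simulateRun name and loop below are illustrative, not the actual simulateCacheSavings implementation.

```ts
// Hypothetical driver: replay a run's steps and total the cost with and
// without caching, under the growing-prefix assumption described above.
function simulateRun(pricing: Pricing, steps: StepUsage[]) {
  const cache = new TokenCache();
  let withCache = 0;
  let withoutCache = 0;
  for (const step of steps) {
    const split = cache.recordStep(step); // cache read/write split for this step
    const cost = estimateStepCosts(pricing, split);
    withCache += cost.withCache;
    withoutCache += cost.withoutCache;
  }
  return { withCache, withoutCache, estimatedSavings: withoutCache - withCache };
}
```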