feat(ci): add Bedrock integration tests with record/replay #4292
leseb merged 12 commits into ogx-ai:main
Conversation
This pull request has merge conflicts that must be resolved before it can be merged. @skamenan7 please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
cdoern
left a comment
Did a sweep on this one, LGTM. I think this would be a valuable addition.
- Create dedicated bedrock suite with 3 compatible test functions
- Add run-bedrock.yaml stack config for CI
- Enable config resolution for distro::file.yaml format in library mode
- Add test recordings for streaming, non-streaming, and inference store
- Skip tool-calling tests for Bedrock (not supported by AWS)
- Add recording guide documentation for contributors

Tested with GPT-OSS model on us-west-2 region.
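The `distro::file.yaml` resolution mentioned above can be illustrated with a short sketch. This is a hypothetical reconstruction, not the actual library-mode resolver: the function name, the `distro_root` default, and the directory layout are all assumptions for illustration.

```python
# Hypothetical sketch: resolving a "distro::file.yaml" config specifier
# into a filesystem path, as library-mode config resolution might do.
# The distro_root default is illustrative, not the real repository layout.
from pathlib import Path


def resolve_config(spec: str, distro_root: Path = Path("llama_stack/distributions")) -> Path:
    """Resolve 'ci-tests::run.yaml' to distro_root/ci-tests/run.yaml.

    Plain paths (no '::') are returned unchanged.
    """
    if "::" not in spec:
        return Path(spec)
    distro, _, filename = spec.partition("::")
    if not distro or not filename:
        raise ValueError(f"invalid config specifier: {spec!r}")
    return distro_root / distro / filename


print(resolve_config("ci-tests::run.yaml"))
```

The `::` separator keeps the specifier unambiguous: anything without it is treated as an ordinary file path, so existing configs keep working.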
The bedrock suite uses specific test function paths like "test_file.py::test_function" in its roots. The pytest_ignore_collect hook was treating these as filesystem paths, causing 0 tests to be collected.

Changes:
- Strip "::test_function" suffix when checking file paths
- Add pytest_collection_modifyitems to filter to specific tests

Without this fix, cleanup_recordings.py marks bedrock recordings as unused and deletes them in CI.
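The suffix-stripping logic described in that commit can be sketched as follows. This is a simplified stand-in for the real conftest.py hook: the helper names and the membership check are assumptions, but the core idea, comparing only the file portion of a `file.py::test_fn` specifier, matches the description above.

```python
# Hypothetical sketch of the suite-root fix: strip a trailing
# "::test_function" specifier before comparing against filesystem paths,
# so roots like "test_file.py::test_fn" do not cause 0 tests to collect.

def path_part(root: str) -> str:
    """Return only the file portion of a pytest specifier."""
    return root.split("::", 1)[0]


def should_ignore(collection_path: str, suite_roots: list[str]) -> bool:
    """Ignore a collected file only if it matches none of the suite roots."""
    files = {path_part(r) for r in suite_roots}
    return collection_path not in files


roots = ["test_openai_completion.py::test_streaming", "test_basic.py"]
print(should_ignore("test_openai_completion.py", roots))  # False: kept for collection
```

Filtering down to the individual test functions then happens in a second pass (pytest_collection_modifyitems), after pytest has collected the files.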
- Remove separate run-bedrock.yaml in favor of modifying templates
- Update bedrock provider config: dummy API key for replay mode, us-west-2 region
- Pre-register bedrock model in ci-tests template (Bedrock /v1/models returns empty)
- Update ci_matrix.json to use default stack config
Consolidated config file usage (ci-tests::run.yaml instead of run-bedrock.yaml), added Stack Configuration Choices section, fixed default region to us-west-2, and updated pytest commands to use full paths with clarification about test count.
Changed default region from us-east-2 to us-west-2 to reflect where the GPT-OSS model is available. Updated the test expectation to match.
- Replace non-existent run-bedrock.yaml with config.yaml in recording guide
- Replace non-existent run.yaml with config.yaml in record-replay.mdx
- Fix test count from "4 parametrized test cases" to "6 parametrized tests"

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove bedrock_recording_guide.md since the content already exists in docs/docs/contributing/testing/record-replay.mdx (Provider-Specific: AWS Bedrock section).

Note: The replay-mode-dummy-key default is intentional and required for CI replay tests to work - the OpenAI client requires an API key to be instantiated even though actual HTTP calls are intercepted by the recording/replay mechanism.
- Remove replay-mode-dummy-key default from bedrock config
- Set AWS_BEARER_TOKEN_BEDROCK env var in CI workflow instead
- Remove reference to non-existent bedrock-test.yaml in docs
- Improve comment explaining Bedrock model pre-registration requirement
- Regenerate distribution configs and provider docs
- Exclude stream_options from request hash in api_recorder to fix Bedrock replay
- Fix mixed test specifier handling in conftest.py (allow mixing file paths and ::test_name)
The previous change excluded stream_options from request hashes for all providers, breaking existing recordings for GPT/OpenAI tests. Now only excludes stream_options for Bedrock URLs where the field varies between runs.
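The provider-scoped exclusion described above can be sketched like this. It is a hedged reconstruction, not the actual api_recorder code: the function name, the URL check, and the hashing scheme are illustrative assumptions; only the idea (drop the volatile stream_options field before hashing, and only for Bedrock endpoints) comes from the commit messages.

```python
# Sketch: compute a request hash that ignores stream_options, but only
# for Bedrock URLs where the field varies between runs. Hashing details
# are illustrative; the real api_recorder may normalize differently.
import hashlib
import json


def request_hash(url: str, body: dict) -> str:
    normalized = dict(body)
    if "bedrock" in url:
        # stream_options varies between runs on Bedrock, so exclude it
        # from the hash to keep recordings replayable.
        normalized.pop("stream_options", None)
    payload = json.dumps({"url": url, "body": normalized}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


a = request_hash("https://bedrock-runtime.us-west-2.amazonaws.com/v1/chat",
                 {"model": "m", "stream_options": {"include_usage": True}})
b = request_hash("https://bedrock-runtime.us-west-2.amazonaws.com/v1/chat",
                 {"model": "m"})
print(a == b)  # True: stream_options does not affect Bedrock hashes
```

Scoping the exclusion to Bedrock URLs is what keeps the existing GPT/OpenAI recordings valid: for any other provider the field still participates in the hash exactly as before.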
Hi @leseb and @derekhiggins, CI is green and all requested changes are addressed in a11796f. All threads are resolved, so could you take another look when you have a moment? Thanks.
CI is green except for one flaky test (docker, gpt, responses) that's timing out on file search tests. This is unrelated to the Bedrock changes - same test also fails intermittently on main branch (I checked run 21192035693). All Bedrock-specific tests pass across library, server, and docker modes. Thanks.
leseb
left a comment
My final request is to revert to us-east-2, thanks
```diff
 class BedrockConfig(RemoteInferenceProviderConfig):
     region_name: str = Field(
-        default_factory=lambda: os.getenv("AWS_DEFAULT_REGION", "us-east-2"),
+        default_factory=lambda: os.getenv("AWS_DEFAULT_REGION", "us-west-2"),
```
Can we only do this in ci-test then?
CI sets AWS_DEFAULT_REGION=us-west-2 for GPT-OSS test model. The region change shouldn't have been in the general config.
Hi @leseb, I reverted the default region to us-east-2 to match main. CI sets AWS_DEFAULT_REGION=us-west-2 via env var for the GPT-OSS test model. Thanks.
What does this PR do?
Adds Bedrock integration tests to CI using a record/replay mechanism. Tests run against pre-recorded API responses, without the need for AWS credentials in CI.
The main challenge was that Bedrock's OpenAI-compatible API doesn't support everything - no tool calling, no embeddings, no dynamic model listing. So instead of running the full base suite (which would fail on ~40 tests), I created a dedicated bedrock suite with just the tests that actually work.
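The record/replay idea itself can be summarized in a few lines. This is a toy illustration of the concept, not the project's actual recorder: real recordings live on disk and key on more request fields, but the shape is the same, in record mode store the live response under a request hash, in replay mode look it up so no AWS credentials are needed.

```python
# Toy record/replay store: responses are keyed by a hash of the request,
# so identical requests replay identical canned responses. Illustrative
# only; the real mechanism persists recordings and intercepts HTTP calls.
import hashlib
import json


class ReplayStore:
    def __init__(self) -> None:
        self._recordings: dict[str, dict] = {}

    def _key(self, request: dict) -> str:
        return hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()

    def record(self, request: dict, response: dict) -> None:
        self._recordings[self._key(request)] = response

    def replay(self, request: dict) -> dict:
        # KeyError here is the replay-mode equivalent of "no recording
        # exists for this request" - the cue to re-record.
        return self._recordings[self._key(request)]


store = ReplayStore()
store.record({"model": "gpt-oss", "messages": ["hello"]}, {"output": "recorded reply"})
print(store.replay({"model": "gpt-oss", "messages": ["hello"]}))
```

This is also why request hashing has to be stable across runs: any field that varies between record and replay (like stream_options on Bedrock) breaks the lookup.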
Changes:
- Add `docs/source/providers/inference/bedrock_recording_guide.md` so contributors with AWS access can re-record tests when needed

Closes #4095
Test Plan
Run from tests/integration/inference:
Expected: 6 passed