feat(ci): add Bedrock integration tests with record/replay #4292
leseb merged 12 commits into ogx-ai:main
Conversation
This pull request has merge conflicts that must be resolved before it can be merged. @skamenan7 please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
cdoern
left a comment
Did a sweep on this one, LGTM. I think this would be a valuable addition.
- Create dedicated bedrock suite with 3 compatible test functions
- Add run-bedrock.yaml stack config for CI
- Enable config resolution for distro::file.yaml format in library mode
- Add test recordings for streaming, non-streaming, and inference store
- Skip tool-calling tests for Bedrock (not supported by AWS)
- Add recording guide documentation for contributors

Tested with GPT-OSS model on us-west-2 region.
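The `distro::file.yaml` resolution mentioned above can be illustrated with a short sketch. This is a hypothetical reconstruction, not the actual library-mode resolver: the function name, the `distro_root` default, and the directory layout are all assumptions for illustration.

```python
# Hypothetical sketch: resolving a "distro::file.yaml" config specifier
# into a filesystem path, as library-mode config resolution might do.
# The distro_root default is illustrative, not the real repository layout.
from pathlib import Path


def resolve_config(spec: str, distro_root: Path = Path("llama_stack/distributions")) -> Path:
    """Resolve 'ci-tests::run.yaml' to distro_root/ci-tests/run.yaml.

    Plain paths (no '::') are returned unchanged.
    """
    if "::" not in spec:
        return Path(spec)
    distro, _, filename = spec.partition("::")
    if not distro or not filename:
        raise ValueError(f"invalid config specifier: {spec!r}")
    return distro_root / distro / filename


print(resolve_config("ci-tests::run.yaml"))
```

The `::` separator keeps the specifier unambiguous: anything without it is treated as an ordinary file path, so existing configs keep working.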
The bedrock suite uses specific test function paths like "test_file.py::test_function" in its roots. The pytest_ignore_collect hook was treating these as filesystem paths, causing 0 tests to be collected.

Changes:
- Strip "::test_function" suffix when checking file paths
- Add pytest_collection_modifyitems to filter to specific tests

Without this fix, cleanup_recordings.py marks bedrock recordings as unused and deletes them in CI.
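The suffix-stripping logic described in that commit can be sketched as follows. This is a simplified stand-in for the real conftest.py hook: the helper names and the membership check are assumptions, but the core idea, comparing only the file portion of a `file.py::test_fn` specifier, matches the description above.

```python
# Hypothetical sketch of the suite-root fix: strip a trailing
# "::test_function" specifier before comparing against filesystem paths,
# so roots like "test_file.py::test_fn" do not cause 0 tests to collect.

def path_part(root: str) -> str:
    """Return only the file portion of a pytest specifier."""
    return root.split("::", 1)[0]


def should_ignore(collection_path: str, suite_roots: list[str]) -> bool:
    """Ignore a collected file only if it matches none of the suite roots."""
    files = {path_part(r) for r in suite_roots}
    return collection_path not in files


roots = ["test_openai_completion.py::test_streaming", "test_basic.py"]
print(should_ignore("test_openai_completion.py", roots))  # False: kept for collection
```

Filtering down to the individual test functions then happens in a second pass (pytest_collection_modifyitems), after pytest has collected the files.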
- Remove separate run-bedrock.yaml in favor of modifying templates
- Update bedrock provider config: dummy API key for replay mode, us-west-2 region
- Pre-register bedrock model in ci-tests template (Bedrock /v1/models returns empty)
- Update ci_matrix.json to use default stack config
Consolidated config file usage (ci-tests::run.yaml instead of run-bedrock.yaml), added Stack Configuration Choices section, fixed default region to us-west-2, and updated pytest commands to use full paths with clarification about test count.
Changed default region from us-east-2 to us-west-2 to reflect where the GPT-OSS model is available. Updated the test expectation to match.
- Replace non-existent run-bedrock.yaml with config.yaml in recording guide
- Replace non-existent run.yaml with config.yaml in record-replay.mdx
- Fix test count from "4 parametrized test cases" to "6 parametrized tests"

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove bedrock_recording_guide.md since the content already exists in docs/docs/contributing/testing/record-replay.mdx (Provider-Specific: AWS Bedrock section).

Note: The replay-mode-dummy-key default is intentional and required for CI replay tests to work - the OpenAI client requires an API key to be instantiated even though actual HTTP calls are intercepted by the recording/replay mechanism.
- Remove replay-mode-dummy-key default from bedrock config
- Set AWS_BEARER_TOKEN_BEDROCK env var in CI workflow instead
- Remove reference to non-existent bedrock-test.yaml in docs
- Improve comment explaining Bedrock model pre-registration requirement
- Regenerate distribution configs and provider docs
- Exclude stream_options from request hash in api_recorder to fix Bedrock replay
- Fix mixed test specifier handling in conftest.py (allow mixing file paths and ::test_name)
The previous change excluded stream_options from request hashes for all providers, breaking existing recordings for GPT/OpenAI tests. Now only excludes stream_options for Bedrock URLs where the field varies between runs.
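The provider-scoped exclusion described above can be sketched like this. It is a hedged reconstruction, not the actual api_recorder code: the function name, the URL check, and the hashing scheme are illustrative assumptions; only the idea (drop the volatile stream_options field before hashing, and only for Bedrock endpoints) comes from the commit messages.

```python
# Sketch: compute a request hash that ignores stream_options, but only
# for Bedrock URLs where the field varies between runs. Hashing details
# are illustrative; the real api_recorder may normalize differently.
import hashlib
import json


def request_hash(url: str, body: dict) -> str:
    normalized = dict(body)
    if "bedrock" in url:
        # stream_options varies between runs on Bedrock, so exclude it
        # from the hash to keep recordings replayable.
        normalized.pop("stream_options", None)
    payload = json.dumps({"url": url, "body": normalized}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


a = request_hash("https://bedrock-runtime.us-west-2.amazonaws.com/v1/chat",
                 {"model": "m", "stream_options": {"include_usage": True}})
b = request_hash("https://bedrock-runtime.us-west-2.amazonaws.com/v1/chat",
                 {"model": "m"})
print(a == b)  # True: stream_options does not affect Bedrock hashes
```

Scoping the exclusion to Bedrock URLs is what keeps the existing GPT/OpenAI recordings valid: for any other provider the field still participates in the hash exactly as before.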
Hi @leseb and @derekhiggins, CI is green and all requested changes are addressed in a11796f. All threads are resolved, so could you take another look when you have a moment? Thanks.
CI is green except for one flaky test (docker, gpt, responses) that's timing out on file search tests. This is unrelated to the Bedrock changes - same test also fails intermittently on main branch (I checked run 21192035693). All Bedrock-specific tests pass across library, server, and docker modes. Thanks.
leseb
left a comment
My final request is to revert to us-east-2, thanks
```diff
 class BedrockConfig(RemoteInferenceProviderConfig):
     region_name: str = Field(
-        default_factory=lambda: os.getenv("AWS_DEFAULT_REGION", "us-east-2"),
+        default_factory=lambda: os.getenv("AWS_DEFAULT_REGION", "us-west-2"),
```
Can we only do this in ci-test then?
CI sets AWS_DEFAULT_REGION=us-west-2 for GPT-OSS test model. The region change shouldn't have been in the general config.
Hi @leseb, I reverted the default region to us-east-2 to match main. CI sets AWS_DEFAULT_REGION=us-west-2 via env var for the GPT-OSS test model. Thanks.
What does this PR do?
Adds Bedrock integration tests to CI using a record/replay mechanism. Tests run against pre-recorded API responses, without the need for AWS credentials in CI.
The main challenge was that Bedrock's OpenAI-compatible API doesn't support everything - no tool calling, no embeddings, no dynamic model listing. So instead of running the full base suite (which would fail on ~40 tests), I created a dedicated bedrock suite with just the tests that actually work.
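The record/replay idea itself can be summarized in a few lines. This is a toy illustration of the concept, not the project's actual recorder: real recordings live on disk and key on more request fields, but the shape is the same, in record mode store the live response under a request hash, in replay mode look it up so no AWS credentials are needed.

```python
# Toy record/replay store: responses are keyed by a hash of the request,
# so identical requests replay identical canned responses. Illustrative
# only; the real mechanism persists recordings and intercepts HTTP calls.
import hashlib
import json


class ReplayStore:
    def __init__(self) -> None:
        self._recordings: dict[str, dict] = {}

    def _key(self, request: dict) -> str:
        return hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()

    def record(self, request: dict, response: dict) -> None:
        self._recordings[self._key(request)] = response

    def replay(self, request: dict) -> dict:
        # KeyError here is the replay-mode equivalent of "no recording
        # exists for this request" - the cue to re-record.
        return self._recordings[self._key(request)]


store = ReplayStore()
store.record({"model": "gpt-oss", "messages": ["hello"]}, {"output": "recorded reply"})
print(store.replay({"model": "gpt-oss", "messages": ["hello"]}))
```

This is also why request hashing has to be stable across runs: any field that varies between record and replay (like stream_options on Bedrock) breaks the lookup.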
Changes:
- Add `docs/source/providers/inference/bedrock_recording_guide.md` so contributors with AWS access can re-record tests when needed

Closes #4095
Test Plan
Run from tests/integration/inference:
Expected: 6 passed