feat(embeddings): add AWS Bedrock embedding provider#1929
Add `BedrockEmbeddingModel` (engine name `bedrock`) that supports Amazon Titan Text Embeddings v1/v2 and the Cohere Embed v3 family served through AWS Bedrock Runtime. Titan models dispatch one `invoke_model` request per document; Cohere models batch all documents into a single request. Authentication relies on the standard boto3 credential chain.
`boto3` is added as an optional dependency, exposed via the new `nemoguardrails[bedrock]` extra, with lazy import so the provider has zero overhead when the extra is not installed.
Mock test coverage in `TestBedrockEmbeddingModelMocked` exercises the dispatch logic, lazy `embedding_size` resolution, kwargs forwarding to `boto3.client`, error wrapping, and async/sync encode parity. Live tests under `LIVE_TEST=1` validate end-to-end against a real AWS account.
Fixes NVIDIA-NeMo#139
Signed-off-by: Dmitry Voropaev <workerv0ropaev@yandex.ru>
📝 Walkthrough
This PR adds AWS Bedrock embedding model support to NeMo Guardrails. It implements a new
Changes
Bedrock Embeddings Integration
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 5 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
Actionable comments posted: 3
🧹 Nitpick comments (3)
docs/configure-rails/other-configurations/embedding-search-providers.md (2)
110-111: 💤 Low value. Consider clarifying whether `region_name` is required.
The configuration example above shows `region_name: us-east-1`, but this paragraph doesn't specify whether the parameter is required or optional. Since `boto3` can infer the region from the standard credential chain (environment variables, AWS config files), it would be helpful to note that `region_name` can be omitted if the region is configured in the user's AWS environment.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/configure-rails/other-configurations/embedding-search-providers.md` around lines 110 - 111, Clarify that the region_name parameter in the AWS Bedrock configuration is optional: note that region_name (shown in the example) can be omitted because boto3 will infer the region via its standard credential/ config chain (environment variables, ~/.aws/credentials or config, or IAM role), but if you want to override the inferred region you can set region_name explicitly; reference the region_name parameter and boto3 credential chain and the supported models (e.g., amazon.titan-embed-text-v1, cohere.embed-english-v3) so readers know where to apply this guidance.
87-109: ⚡ Quick win. Document optional configuration parameters for advanced use cases.
The PR summary indicates that Titan v2 supports configurable `dimensions` (256/512/1024) and Cohere models support configurable `input_type` (default `search_document`). Consider adding a note or example showing how users can configure these optional parameters in the YAML, as advanced users may need to customize embedding dimensions or input types for their specific use cases.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/configure-rails/other-configurations/embedding-search-providers.md` around lines 87 - 109, Add documentation showing the optional parameters for advanced embedding configuration under the existing embedding_search_provider/parameters block: document that for embedding_engine: titan (amazon.titan-embed-text-v2) users can set dimensions (256/512/1024) via a dimensions parameter, and for embedding_engine: cohere users can set input_type (default search_document) via an input_type parameter; update the YAML example under both core.embedding_search_provider.parameters and knowledge_base.embedding_search_provider.parameters to include commented or example entries for dimensions and input_type and include short notes explaining valid values and defaults so readers can customize embedding dimensions or input types.
tests/test_embeddings_providers_mock.py (1)
851-1151: ⚡ Quick win. Add a regression test for unsupported vendor IDs.
Given current/future vendor dispatch, add one test asserting an unknown vendor (e.g., `foo.embed-x`) raises a clear error before any `invoke_model` call. This will lock in fail-fast behavior.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_embeddings_providers_mock.py` around lines 851 - 1151, Add a new regression test method in TestBedrockEmbeddingModelMocked (e.g., test_init_with_unknown_vendor_raises) that uses the existing _fresh_import and a patched boto3 mock client, calls BedrockEmbeddingModel("foo.embed-x") and asserts it raises a clear exception (e.g., ValueError) with a descriptive message about unsupported vendor, and also asserts that mock_client.invoke_model.assert_not_called() to ensure fail-fast behavior before any API calls; reference BedrockEmbeddingModel, _fresh_import, and mock_client.invoke_model in the test.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@nemoguardrails/embeddings/providers/bedrock.py`:
- Around line 143-147: The code currently treats any non-"cohere" vendor as
Titan by calling _encode_titan, which causes wrong payloads for unknown vendors;
update the vendor dispatch in the method that uses self._vendor (the try block
calling _encode_cohere/_encode_titan) to explicitly check for supported vendors
(e.g., if self._vendor == "cohere": return self._encode_cohere(documents) elif
self._vendor == "titan": return self._encode_titan(documents)) and otherwise
raise a clear exception (ValueError or custom) indicating an unsupported vendor
so callers fail fast instead of falling through to Titan.
- Around line 57-60: The BedrockEmbeddingModel.__init__ currently forwards
profile_name in **kwargs to boto3.client which fails; update __init__ to
pop("profile_name", None) from kwargs and, if present, create a
boto3.Session(profile_name=profile_name) and call
session.client("bedrock-runtime", **kwargs) otherwise call
boto3.client("bedrock-runtime", **kwargs) so other aws_* and endpoint_url kwargs
still pass through. Also add validation in encode: before routing to
_encode_titan, check the model family/vendor (used by encode/method calling
_encode_titan) and raise a clear ValueError for unsupported vendors (i.e., allow
"cohere" and "titan" only), so unknown Bedrock model families fail fast with a
helpful message.
In `@tests/test_embeddings_bedrock.py`:
- Around line 22-27: The live Bedrock tests are only gated by LIVE_TEST_MODE but
not by the availability of the provider import; update the skip/guard conditions
in tests/test_embeddings_bedrock.py to require both LIVE_TEST_MODE and that
BedrockEmbeddingModel is not None (i.e., use "if not (LIVE_TEST_MODE and
BedrockEmbeddingModel)" or equivalent) so tests are skipped when
boto3/BedrockEmbeddingModel isn't importable; apply this change to all live-test
guards that currently only check LIVE_TEST_MODE.
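The fail-fast vendor check requested in the inline comments might be sketched as follows. This is an illustrative sketch only, not the code in `bedrock.py`: the helper name `resolve_vendor` is hypothetical, and the vendor keys `amazon` and `cohere` are assumed from the model-id prefixes used elsewhere in this PR (the review text itself says `titan`).

```python
# Hypothetical sketch of a fail-fast vendor check; not the actual bedrock.py code.
# Assumption: the vendor is the part of the model id before the first ".".
_SUPPORTED_VENDORS = ("amazon", "cohere")


def resolve_vendor(model_id: str) -> str:
    """Return the vendor prefix of a Bedrock model id, or raise ValueError."""
    vendor = model_id.split(".", 1)[0]
    if vendor not in _SUPPORTED_VENDORS:
        raise ValueError(
            f"Unsupported Bedrock embedding vendor {vendor!r} for model "
            f"{model_id!r}; supported vendors: {', '.join(_SUPPORTED_VENDORS)}"
        )
    return vendor
```

Running a check like this in the constructor would surface an unknown family such as `foo.embed-x` before any `invoke_model` call is attempted.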
---
Nitpick comments:
In `@docs/configure-rails/other-configurations/embedding-search-providers.md`:
- Around line 110-111: Clarify that the region_name parameter in the AWS Bedrock
configuration is optional: note that region_name (shown in the example) can be
omitted because boto3 will infer the region via its standard credential/ config
chain (environment variables, ~/.aws/credentials or config, or IAM role), but if
you want to override the inferred region you can set region_name explicitly;
reference the region_name parameter and boto3 credential chain and the supported
models (e.g., amazon.titan-embed-text-v1, cohere.embed-english-v3) so readers
know where to apply this guidance.
- Around line 87-109: Add documentation showing the optional parameters for
advanced embedding configuration under the existing
embedding_search_provider/parameters block: document that for embedding_engine:
titan (amazon.titan-embed-text-v2) users can set dimensions (256/512/1024) via a
dimensions parameter, and for embedding_engine: cohere users can set input_type
(default search_document) via an input_type parameter; update the YAML example
under both core.embedding_search_provider.parameters and
knowledge_base.embedding_search_provider.parameters to include commented or
example entries for dimensions and input_type and include short notes explaining
valid values and defaults so readers can customize embedding dimensions or input
types.
In `@tests/test_embeddings_providers_mock.py`:
- Around line 851-1151: Add a new regression test method in
TestBedrockEmbeddingModelMocked (e.g., test_init_with_unknown_vendor_raises)
that uses the existing _fresh_import and a patched boto3 mock client, calls
BedrockEmbeddingModel("foo.embed-x") and asserts it raises a clear exception
(e.g., ValueError) with a descriptive message about unsupported vendor, and also
asserts that mock_client.invoke_model.assert_not_called() to ensure fail-fast
behavior before any API calls; reference BedrockEmbeddingModel, _fresh_import,
and mock_client.invoke_model in the test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: de29a79a-a278-4db0-be9c-2ac89ff9ea71
⛔ Files ignored due to path filters (1)
`poetry.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (9)
- docs/about/supported-llms.md
- docs/configure-rails/other-configurations/embedding-search-providers.md
- nemoguardrails/embeddings/providers/__init__.py
- nemoguardrails/embeddings/providers/bedrock.py
- pyproject.toml
- tests/test_configs/with_bedrock_embeddings/config.co
- tests/test_configs/with_bedrock_embeddings/config.yml
- tests/test_embeddings_bedrock.py
- tests/test_embeddings_providers_mock.py
Codecov Report
✅ All modified and coverable lines are covered by tests.
Greptile Summary
This PR adds a new
| Filename | Overview |
|---|---|
| nemoguardrails/embeddings/providers/bedrock.py | New BedrockEmbeddingModel provider with Amazon Titan and Cohere vendor dispatch; Cohere path sends all documents in one unbounded request (no 96-text batch limit guard) and dimensions values for Titan v2 are not validated against the allowed set {256, 512, 1024}. |
| nemoguardrails/embeddings/providers/`__init__.py` | Adds bedrock module import and registers BedrockEmbeddingModel in the provider registry; change is minimal and consistent with existing pattern. |
| tests/test_embeddings_providers_mock.py | Adds 18 mock-based unit tests for BedrockEmbeddingModel covering known/unknown models, parameter warnings, lazy embedding_size, kwargs forwarding, error wrapping, async parity, and empty-input fast path; all cases are well-structured and consistent with existing provider test classes. |
| tests/test_embeddings_bedrock.py | Live-mode integration tests gated behind LIVE_TEST env var; covers sync/async Titan v2 and Cohere batched encoding; consistent with the existing pattern in other live embedding test files. |
| pyproject.toml | Adds boto3 as an optional dependency under the new bedrock extra and also includes it in the all extra; version constraint >=1.34.0 is reasonable. |
| docs/configure-rails/other-configurations/embedding-search-providers.md | Adds clear installation instructions, full YAML config example for both core and knowledge_base sections, and supported model list; documentation is accurate and consistent with the implementation. |
Sequence Diagram
sequenceDiagram
participant App as NeMo Guardrails
participant Registry as EmbeddingProviderRegistry
participant Bedrock as BedrockEmbeddingModel
participant boto3 as boto3 Client
participant AWS as AWS Bedrock Runtime
App->>Registry: init_embedding_model("amazon.titan-embed-text-v2:0", "bedrock", params)
Registry->>Bedrock: __init__(embedding_model, region_name, ...)
Note over Bedrock: import boto3 (lazy)<br/>validate vendor prefix<br/>create boto3 client
Bedrock-->>App: model instance
App->>Bedrock: encode(documents)
alt "vendor == "amazon""
loop per document
Bedrock->>boto3: "invoke_model(modelId, body={inputText, [dimensions, normalize]})"
boto3->>AWS: POST bedrock-runtime/invoke
AWS-->>boto3: "{embedding: [...]}"
boto3-->>Bedrock: response
end
else "vendor == "cohere""
Bedrock->>boto3: "invoke_model(modelId, body={texts: all_docs, input_type})"
boto3->>AWS: POST bedrock-runtime/invoke
AWS-->>boto3: "{embeddings: [[...], [...]]}"
boto3-->>Bedrock: response
end
Bedrock-->>App: List[List[float]]
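The two request shapes in the diagram can be illustrated with a small sketch. The field names (`inputText`, `dimensions`, `normalize`, `texts`, `input_type`, `embedding`, `embeddings`) come from the diagram; the helper function names are hypothetical and are not taken from the PR's code.

```python
import json
from typing import List, Optional, Sequence


def titan_body(
    text: str,
    dimensions: Optional[int] = None,
    normalize: Optional[bool] = None,
) -> str:
    """Build a JSON body for one Titan request (one document per call)."""
    body = {"inputText": text}
    # Optional fields are only sent when explicitly set.
    if dimensions is not None:
        body["dimensions"] = dimensions
    if normalize is not None:
        body["normalize"] = normalize
    return json.dumps(body)


def cohere_body(texts: Sequence[str], input_type: str = "search_document") -> str:
    """Build a JSON body for one Cohere request carrying a batch of documents."""
    return json.dumps({"texts": list(texts), "input_type": input_type})


def parse_titan(payload: str) -> List[float]:
    """Titan responses carry a single vector under 'embedding'."""
    return json.loads(payload)["embedding"]


def parse_cohere(payload: str) -> List[List[float]]:
    """Cohere responses carry one vector per input text under 'embeddings'."""
    return json.loads(payload)["embeddings"]
```

The asymmetry is the reason for the per-document loop on the Titan branch: a Titan body holds one `inputText`, whereas a Cohere body holds a list of `texts`.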
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
nemoguardrails/embeddings/providers/bedrock.py:210-219
**Cohere batch size limit not enforced**
Bedrock's Cohere Embed API accepts at most 96 texts per call. `_encode_cohere` forwards the full `documents` list in a single `invoke_model` request with no chunking. A knowledge-base indexing run with more than 96 documents will hit the service limit and raise a `RuntimeError` wrapping a cryptic API error, with no indication that the batch was too large. Adding a loop that slices `documents` into chunks of 96 and concatenates the result would make this method robust for large corpora.
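One way to implement the suggested chunking, as a minimal sketch. It assumes the 96-text limit stated in the issue; `encode_in_chunks` and the `encode_batch` callable are hypothetical names, with `encode_batch` standing in for whatever performs a single `invoke_model` call.

```python
from typing import Callable, List, Sequence

# Per-call text limit as cited in the review comment above.
COHERE_MAX_TEXTS = 96


def encode_in_chunks(
    documents: Sequence[str],
    encode_batch: Callable[[List[str]], List[List[float]]],
    chunk_size: int = COHERE_MAX_TEXTS,
) -> List[List[float]]:
    """Encode documents in slices of at most chunk_size, preserving order."""
    embeddings: List[List[float]] = []
    for start in range(0, len(documents), chunk_size):
        chunk = list(documents[start:start + chunk_size])
        # encode_batch is a stand-in for one request to the service.
        embeddings.extend(encode_batch(chunk))
    return embeddings
```

With 200 documents this issues three calls (96, 96, and 8 texts) and returns the vectors in input order; an empty input issues no calls.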
### Issue 2 of 2
nemoguardrails/embeddings/providers/bedrock.py:112-130
**No validation of `dimensions` values for Titan v2**
Titan v2 accepts only three discrete values for `dimensions`: 256, 512, or 1024. Any other integer (e.g. `dimensions=100`) passes the current guard and is silently forwarded to the API, which returns an error that gets wrapped in `RuntimeError("Failed to retrieve embeddings: …")`. Adding an explicit check against `{256, 512, 1024}` before storing `self.dimensions` would surface misconfigurations at construction time with a clear `ValueError`.
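The explicit check described here could look something like the following sketch. The helper name `validate_titan_v2_dimensions` is hypothetical; the allowed set `{256, 512, 1024}` is the one stated in the issue.

```python
from typing import Optional

# Allowed Titan v2 output sizes, as listed in the review comment above.
_TITAN_V2_DIMENSIONS = frozenset({256, 512, 1024})


def validate_titan_v2_dimensions(dimensions: Optional[int]) -> Optional[int]:
    """Return dimensions unchanged if valid; raise ValueError otherwise.

    None means "not configured" and is passed through untouched.
    """
    if dimensions is None:
        return None
    if dimensions not in _TITAN_V2_DIMENSIONS:
        allowed = ", ".join(str(d) for d in sorted(_TITAN_V2_DIMENSIONS))
        raise ValueError(
            f"Invalid dimensions={dimensions!r} for Titan v2; "
            f"expected one of: {allowed}"
        )
    return dimensions
```

Calling this at construction time turns a value such as `dimensions=100` into an immediate `ValueError` rather than a wrapped API error on the first encode.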
Reviews (2): Last reviewed commit: "fix(embeddings): address review feedback..."
…NeMo#139)
CodeRabbit + Greptile findings on PR NVIDIA-NeMo#1929:
- Fail-fast for unsupported vendors: add `_SUPPORTED_VENDORS` and raise `ValueError` in `__init__` instead of silently routing non-cohere models to the Titan encoder (CodeRabbit Major, Greptile P1).
- `profile_name` now creates a `boto3.Session(profile_name=...).client(...)` pair instead of being silently dropped on the floor; `boto3.client()` does not accept `profile_name` as a kwarg (CodeRabbit Major).
- Warn (`UserWarning`) when `dimensions` or `normalize` are passed for models that do not support them (Titan v1, any Cohere) so misconfiguration is visible instead of silently ignored (Greptile P2).
- Guard live tests with `BEDROCK_AVAILABLE` so `LIVE_TEST=1` without boto3 skips cleanly instead of failing with `TypeError` (CodeRabbit Minor).
- Docs: clarify `region_name` is optional and document `profile_name`, `dimensions`, `normalize`, `input_type` as inline-commented examples.
- Tests: add coverage for the new fail-fast, profile/Session, and three warn paths.
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Dmitry Voropaev <workerv0ropaev@yandex.ru>
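The `profile_name` handling described in this commit message might be sketched as below. This is an assumption-laden illustration, not the PR's code: `make_bedrock_client` is a hypothetical name, and the `boto3` module is passed in as a parameter purely so the sketch can be exercised without AWS (a real provider would more likely `import boto3` lazily inside `__init__`).

```python
from typing import Any


def make_bedrock_client(boto3_module: Any, **kwargs: Any) -> Any:
    """Create a bedrock-runtime client, routing profile_name through a Session.

    boto3.client() does not accept profile_name, so it is popped from kwargs
    and handed to boto3.Session instead; all other kwargs pass through.
    """
    profile_name = kwargs.pop("profile_name", None)
    if profile_name is not None:
        session = boto3_module.Session(profile_name=profile_name)
        return session.client("bedrock-runtime", **kwargs)
    return boto3_module.client("bedrock-runtime", **kwargs)
```

In real use this would be called as `make_bedrock_client(boto3, region_name="us-east-1", profile_name="dev")`, where the profile and region values are examples only.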
Pushed cfbfc3cbf addressing the bot reviews. Per-thread replies are above; high-level summary for convenience:
Major / actionable
Nitpicks (CodeRabbit review body)
Net changes: 4 files, +155/−14. Mock-test count rose from 18 to 24 (full
Description
Adds a new embedding provider that integrates AWS Bedrock with NeMo Guardrails. The provider is registered under the `bedrock` engine name and supports two vendor families served through Amazon Bedrock Runtime:
- `amazon.titan-embed-text-v1` (1536-dim) and `amazon.titan-embed-text-v2:0` (1024-dim, configurable 256/512/1024 via `dimensions`). Titan's API accepts one input per request, so multi-document calls are dispatched as a sequence of `invoke_model` requests.
- `cohere.embed-english-v3`, `cohere.embed-multilingual-v3`, `cohere.embed-english-light-v3`, `cohere.embed-multilingual-light-v3`. Cohere on Bedrock supports batching, so all documents go in a single request with the configurable `input_type` (default `search_document`).
Vendor dispatch is decided in `__init__` by splitting the model id on `"."`, so adding a new family later is a small change (a new `_encode_<vendor>` method).
Example config
Dependency
`boto3` is added as an optional dependency, exposed via the new `nemoguardrails[bedrock]` extra.
Lazy `import boto3` inside `__init__` keeps overhead at zero when the extra is not installed, and a clear `ImportError` directs the user to the extra. Authentication uses the standard `boto3` credential chain (env vars, `~/.aws/credentials`, IAM role), with no eager validation, matching the style of the other providers.
Tests
- Mock tests (`TestBedrockEmbeddingModelMocked`, 18 cases) cover: known/unknown models for both vendors, `dimensions`/`normalize`/`input_type` parameters, lazy `embedding_size` initialization (no AWS call until `.embedding_size` is read), kwargs forwarding to `boto3.client`, error wrapping in `RuntimeError`, async/sync encode parity, and the empty-input fast path. All 18 pass locally. No AWS calls are made and `boto3` does not need to be installed for these tests.
- Live tests (`tests/test_embeddings_bedrock.py`, 4 cases) follow the existing `LIVE_TEST=1` gating used by `test_embeddings_cohere.py` / `test_embeddings_google.py`. They cover sync, async, and a Cohere-on-Bedrock batched call.
Docs
- Added a `bedrock` row to the embedding model providers table in `docs/about/supported-llms.md`.
- Updated `docs/configure-rails/other-configurations/embedding-search-providers.md`.
Related Issue(s)
Fixes #139
Checklist
cc @Pouyanpi
Summary by CodeRabbit
New Features
Documentation
Chores
`boto3` dependency for Bedrock support via the `bedrock` extra.
Tests