Simplify attachment validation and canonicalization by adambalogh · Pull Request #97 · OpenGradient/tee-gateway

adambalogh · 2026-06-08T17:17:19Z

Summary

Removes the global attachment size cap (MAX_ATTACHMENT_BYTES) and simplifies the request canonicalization logic for signed requests. Attachment validation now checks only per-model modality support, and canonicalization no longer hashes the base64 attachment bytes; it drops them entirely.

Key Changes

Removed global size cap: Deleted the MAX_ATTACHMENT_BYTES (30 MB) constant and all size-checking logic from validate_attachments(). The 16 MB cap that ohttp_controller already enforces on the encrypted request body always fires first, so the attachment cap was redundant.
Simplified attachment validation: validate_attachments() now rejects attachments only when the model explicitly declares a modality unsupported (image_inputs: False or pdf_inputs: False). Removed the status parameter from AttachmentValidationError; all validation errors now return HTTP 400.
Refactored canonicalization strategy: Moved _canonical_user_content() from chat_controller.py to llm_backend.py and made it public as canonical_user_content(). Changed the approach:
- Old: Hashed base64 attachment bytes with SHA-256 to commit to the exact content
- New: Drops attachment bytes entirely, keeping only type and filename
- Rationale: The attachment bytes still travel inside the encrypted request payload; the signed hash only needs to commit to which files were sent, not their exact content
Removed size helper: Deleted _decoded_base64_len() (no longer needed once there are no size calculations) and simplified the AttachmentValidationError docstring.
Updated imports: chat_controller.py now imports canonical_user_content from llm_backend instead of defining it locally; removed unused hashlib import.
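To make the old-versus-new contrast concrete, here is a minimal sketch of how the two canonicalization strategies affect a request hash. It assumes an OpenAI-style file part ({"type": "file", "file": {"filename": ..., "file_data": ...}}), a request hash computed as SHA-256 over sorted-key JSON, and that the old path hashed the base64 string as sent; request_hash, old_canonical_file_part and new_canonical_file_part are hypothetical names, not the gateway's code.

```python
import hashlib
import json
from typing import Any


def request_hash(obj: Any) -> str:
    # Assumed hashing scheme for illustration: SHA-256 over sorted-key JSON.
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()


def old_canonical_file_part(part: dict[str, Any]) -> dict[str, Any]:
    # Old strategy (sketch): replace the inline base64 with its SHA-256 digest,
    # so the hash commits to the exact attachment content.
    data = part["file"]["file_data"]
    return {
        "type": "file",
        "filename": part["file"]["filename"],
        "sha256": hashlib.sha256(data.encode("ascii")).hexdigest(),
    }


def new_canonical_file_part(part: dict[str, Any]) -> dict[str, Any]:
    # New strategy (sketch): keep only type and filename; the bytes are dropped.
    return {"type": "file", "filename": part["file"]["filename"]}
```

Swapping the bytes changes the old hash but not the new one, while renaming the file changes both. That is the trade-off in the rationale above: the bytes still travel inside the encrypted payload, but the signed hash now only identifies which files were sent.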

Implementation Details

The canonicalization function handles multimodal content (a list of parts) by:
- Preserving text parts verbatim
- For attachments: keeping only the type and the filename (read from either part["file"]["filename"] or part["filename"]) and omitting the base64 bytes entirely
- Returning plain strings unchanged
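A minimal sketch of the canonicalization described above. The part shapes are assumptions (OpenAI-style parts with a nested "file" dict, and flat blocks carrying "filename" at the top level), and sketch_canonical_user_content is a hypothetical stand-in for llm_backend's canonical_user_content, not its actual code; in particular, which filename location wins when both are present is a guess.

```python
from typing import Any, Union

Content = Union[str, list[dict[str, Any]]]


def sketch_canonical_user_content(content: Content) -> Content:
    """Hashable view of user content with attachment bytes dropped (sketch)."""
    # Plain strings are returned unchanged.
    if isinstance(content, str):
        return content
    canonical: list[dict[str, Any]] = []
    for part in content:
        part_type = part.get("type")
        if part_type == "text":
            # Text parts are kept verbatim (copied to avoid aliasing the input).
            canonical.append(dict(part))
            continue
        # Attachment parts: keep type + filename only, never the base64 bytes.
        # Assumed lookup order: nested part["file"]["filename"], then part["filename"].
        nested = part.get("file")
        filename = nested.get("filename") if isinstance(nested, dict) else None
        if filename is None:
            filename = part.get("filename")
        canonical.append({"type": part_type, "filename": filename})
    return canonical
```

In this sketch an image_url part has no filename, so it canonicalizes to its type plus None.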
Tests updated to reflect the new behavior: test_attachment_keeps_filename_drops_bytes replaces the old digest-based test, and the size-cap test is removed entirely.
The change keeps the "fail open" principle: models without capability profiles are never blocked by the gateway, and the provider decides how to handle unsupported combinations.
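A minimal sketch of the modality gating and fail-open behavior. It assumes the capability profile is a plain dict with optional image_inputs and pdf_inputs flags (the names used above), that image_url parts map to image_inputs and file parts to pdf_inputs, and that the controller turns AttachmentValidationError into an HTTP 400; validate_attachments_sketch and PART_TYPE_TO_FLAG are hypothetical names.

```python
from typing import Any, Optional


class AttachmentValidationError(ValueError):
    """Request carries a modality the target model cannot accept (mapped to HTTP 400)."""


# Assumed mapping from OpenAI-style content-part type to the profile flag gating it.
PART_TYPE_TO_FLAG = {"image_url": "image_inputs", "file": "pdf_inputs"}


def validate_attachments_sketch(
    messages: list[dict[str, Any]], profile: Optional[dict[str, Any]]
) -> None:
    # Fail open: with no profile, capabilities are unknown, so the provider decides.
    if not profile:
        return
    # Only an explicit False blocks a modality; a missing flag also fails open.
    blocked = {
        part_type
        for part_type, flag in PART_TYPE_TO_FLAG.items()
        if profile.get(flag) is False
    }
    # Early return: nothing is explicitly unsupported.
    if not blocked:
        return
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-string content carries no attachments
        for part in content:
            part_type = part.get("type")
            if part_type in blocked:
                raise AttachmentValidationError(f"model does not accept {part_type} parts")
```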

https://claude.ai/code/session_013cbCKjFXib5LbSv9Uu7WUq

Stop importing private helpers across modules

The attachment work reached into llm_backend's private _convert_content_part from the chat controller, and the tests pulled the controller's private _canonical_user_content. Move the request-hashing canonicalizer into llm_backend (next to the converter it depends on) and expose it as the public canonical_user_content. The controller and tests now import only public names, and _convert_content_part stays internal to llm_backend. No behavior change. https://claude.ai/code/session_013cbCKjFXib5LbSv9Uu7WUq

Sign attachment filenames, not their bytes

Drop the sha256 digesting of inline attachment content from the request canonicalization. The signed request now commits to text verbatim and each attachment's type + filename only; the bytes are left out entirely (they still travel inside the encrypted transport). This removes the base64 walking and hashing machinery and lets canonical_user_content stop depending on the content converter. https://claude.ai/code/session_013cbCKjFXib5LbSv9Uu7WUq

Drop redundant attachment size cap; keep modality gating

The 30 MB attachment cap overlapped the 16 MB cap already enforced on the encrypted request body in ohttp_controller, which always fires first. Remove MAX_ATTACHMENT_BYTES, _decoded_base64_len and the byte-summing loop so validate_attachments is just modality gating (reject images/docs a model can't handle, fail open when capabilities are unknown). With the 413 path gone, AttachmentValidationError no longer needs a custom status, and the validator returns early when the model accepts every modality. https://claude.ai/code/session_013cbCKjFXib5LbSv9Uu7WUq
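The redundancy argument can be checked with a little arithmetic. The sketch below assumes both caps are binary megabytes (MiB), that attachments arrive as inline base64 (4 characters per 3 bytes), and that the decrypted inner request cannot be larger than the encrypted body it arrives in; the constant and function names are hypothetical. Under those assumptions, even a body made entirely of base64 attachment data decodes to at most 12 MiB, well under the old 30 MB cap, so the removed check could never be the one that rejected a request.

```python
# Assumed sizes: both caps read as MiB for illustration.
ENCRYPTED_BODY_CAP = 16 * 1024 * 1024  # 16 MB cap on the encrypted request body
OLD_ATTACHMENT_CAP = 30 * 1024 * 1024  # removed MAX_ATTACHMENT_BYTES (30 MB)


def max_decoded_len(base64_len: int) -> int:
    """Upper bound on decoded bytes for a base64 string of the given length."""
    # Each full 4-character group decodes to at most 3 bytes.
    return (base64_len // 4) * 3


# Worst case: the whole body is base64 attachment data.
worst_case = max_decoded_len(ENCRYPTED_BODY_CAP)
print(worst_case)                       # 12582912 (12 MiB)
print(worst_case < OLD_ATTACHMENT_CAP)  # True
```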

* docs: design for native LLM attachments over the private OHTTP path
* docs: confirm pinned langchain versions support native attachments (no PCR change)
* feat: preserve multimodal attachments in convert_messages
  Stop flattening user content to text in the enclave. Convert OpenAI-format content parts (text / image_url / file) into LangChain v1 standard content blocks so images and PDFs reach the provider natively instead of being dropped. Text-only content still collapses to a plain string. No new dependencies: the pinned langchain-* versions already translate standard image/file blocks to each provider's native API.
* feat: gate attachments by model capability + enforce size cap, digest attachments in request hash
  - validate_attachments(): reject image/PDF parts when the target model's LangChain profile explicitly lacks support (fails open for unknown models), and enforce a 30 MB inline attachment cap. Wired into create_chat_completion so it covers both the direct and OHTTP-inner paths.
  - Request hashing now canonicalizes multimodal user content, replacing inline base64 with a sha256 digest so the signed request commits to the exact attachment bytes without bloating the hashed payload.
* Simplify multimodal content handling; pass parts through to providers (#94)
  * image gen format fixes (#91)
  * testing image format fix
  * review fixes
  * lint fix
  * Minimize attachment handling: keep provider-native image pass-through
    Revert the bespoke image-conversion path in convert_messages to main's raw pass-through (text/image parts already convert correctly to every provider's native API, so images keep working untouched). Only file/PDF parts are rewritten to LangChain standard file blocks, since Anthropic needs a 'document' block and rejects OpenAI's raw file shape. Capability gating, the per-request size cap, and request-hash canonicalization are retained. Drop the design doc.
  ---------
  Co-authored-by: Aniket Dixit <47004499+dixitaniket@users.noreply.github.com>
  Co-authored-by: Claude <noreply@anthropic.com>
* Simplify attachment validation and canonicalization (#97)
  * Stop importing private helpers across modules
  * Sign attachment filenames, not their bytes
  * Drop redundant attachment size cap; keep modality gating
  ---------
  Co-authored-by: Claude <noreply@anthropic.com>
* Potential fix for pull request finding
  Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Aniket Dixit <47004499+dixitaniket@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

claude added 3 commits June 8, 2026 17:06

adambalogh marked this pull request as ready for review June 8, 2026 17:17

adambalogh merged commit 30d3279 into claude/fervent-goodall-hrY9f Jun 8, 2026
5 checks passed


adambalogh merged 3 commits into claude/fervent-goodall-hrY9f from claude/intelligent-dirac-d7vf5w


2 participants
