
fix(api): enforce configurable upload size limit#231

Open
kpadilha wants to merge 1 commit into Stanzin7:master from kpadilha:fix/configurable-upload-limit

Conversation


@kpadilha kpadilha commented May 11, 2026

Summary

Fixes #146.

The upload endpoint already had a hardcoded 100 MiB check, but it read the whole upload into memory before applying that check. This change keeps the same default limit, enforces it while reading, and moves it into config so deployments can tune it.

Changes:

  • add UPLOAD_MAX_BYTES, defaulting to 104857600 bytes (100 MiB);
  • read uploads in bounded chunks and return 413 once the configured limit is exceeded;
  • document the setting in the env templates and Railway config;
  • add regression tests for the oversized path, the exact-limit reader boundary, and invalid env fallback.
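The bounded-chunk read described above can be sketched roughly as follows. This is a simplified stand-in, not the PR's exact code: a plain synchronous stream and a custom PayloadTooLarge exception take the place of FastAPI's UploadFile and HTTPException(413).

```python
# Hypothetical sketch of a bounded upload reader; names are illustrative.
import io

DEFAULT_UPLOAD_MAX_BYTES = 104_857_600  # 100 MiB


class PayloadTooLarge(Exception):
    """Stand-in for FastAPI's HTTPException(status_code=413)."""


def read_with_limit(stream, max_bytes, chunk_size=1024 * 1024):
    chunks, total = [], 0
    while True:
        # Never request more than one byte past the remaining allowance,
        # so at most max_bytes + 1 bytes are ever buffered.
        chunk = stream.read(min(chunk_size, max_bytes - total + 1))
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            raise PayloadTooLarge(
                f"File too large. Maximum size is {max_bytes} bytes."
            )
        chunks.append(chunk)
    return b"".join(chunks)
```

The key property is that rejection happens as soon as the running total crosses the limit, before the rest of the body is read or saved.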

Validation

uv run python -m pytest tests/api/test_upload_limits.py tests/test_extension_extract.py -q
# 5 passed, 2 warnings

uv run python -m pytest tests -q
# 418 passed, 20 warnings

uv run python -m py_compile src/extension_shield/api/main.py src/extension_shield/core/config.py
# success

git diff --check
# success

The warnings are dependency/deprecation warnings already present in the test environment.

Summary by CodeRabbit

  • New Features
    • Configurable upload size limits with clear error messaging for oversized files
    • ZIP extraction constraints for file count and maximum decompressed size



coderabbitai Bot commented May 11, 2026

Warning

Rate limit exceeded

@kpadilha has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 9 minutes and 53 seconds before requesting another review.

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 459e5cf9-59af-4c35-88f6-05bfe9c4563c

📥 Commits

Reviewing files that changed from the base of the PR and between 3d95862 and 8cd4b37.

📒 Files selected for processing (7)
  • .env.example
  • deploy/env.production.template
  • deploy/railway.toml
  • railway.toml
  • src/extension_shield/api/main.py
  • src/extension_shield/core/config.py
  • tests/api/test_upload_limits.py
📝 Walkthrough

This PR implements configurable file upload size limits to prevent denial-of-service attacks via large uploads. It adds a new UPLOAD_MAX_BYTES environment variable parsed into the application settings, enforces the limit using chunked reading with early rejection (HTTP 413) in the /api/scan/upload endpoint, documents the setting across development and production deployment configurations, and adds regression test coverage.
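The env-variable parsing with fallback could look roughly like this; parse_upload_max_bytes and the constant name are illustrative, not the actual code in config.py:

```python
# Hypothetical sketch of UPLOAD_MAX_BYTES parsing with safe fallbacks.
import os

DEFAULT_UPLOAD_MAX_BYTES = 104_857_600  # 100 MiB


def parse_upload_max_bytes(env=os.environ):
    raw = env.get("UPLOAD_MAX_BYTES", str(DEFAULT_UPLOAD_MAX_BYTES))
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_UPLOAD_MAX_BYTES  # non-numeric input falls back
    # Non-positive limits would reject every upload, so fall back too.
    return value if value > 0 else DEFAULT_UPLOAD_MAX_BYTES
```

Falling back rather than raising keeps a misconfigured deployment serving traffic with the safe default instead of failing at startup.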

Changes

Upload Size Limits for DoS Prevention

  • Configuration schema and parsing (src/extension_shield/core/config.py): the Settings dataclass gains an upload_max_bytes: int field; get_settings() parses the UPLOAD_MAX_BYTES environment variable, falling back to 104857600 bytes, and enforces a positive-value constraint.
  • Upload size enforcement in the API (src/extension_shield/api/main.py): introduces _format_byte_limit() for consistent error messages and _read_upload_with_limit() for chunked reading with early rejection; the /api/scan/upload endpoint uses the bounded reader with settings.upload_max_bytes to reject oversized uploads before full buffering.
  • Environment and deployment configuration (.env.example, deploy/env.production.template, railway.toml, deploy/railway.toml): defines UPLOAD_MAX_BYTES across development, production, and Railway deployments with a consistent 104857600-byte default.
  • Test coverage for upload limits (tests/api/test_upload_limits.py): an API integration test verifies 413 rejection of oversized uploads with the correct error message and no side effects; a unit test confirms the bounded reader accepts payloads at the exact limit; an environment-fallback test validates that an invalid UPLOAD_MAX_BYTES reverts to the default.
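The exact-limit boundary test could be sketched like this, using a minimal async fake in place of the project's real fixtures; FakeUpload and the reader shown are stand-ins, not tests/api/test_upload_limits.py itself:

```python
# Illustrative boundary test for a bounded async reader (names hypothetical).
import asyncio


class FakeUpload:
    """Minimal async stand-in for starlette's UploadFile.read()."""

    def __init__(self, payload: bytes):
        self._buf = payload

    async def read(self, size: int) -> bytes:
        chunk, self._buf = self._buf[:size], self._buf[size:]
        return chunk


async def read_upload_with_limit(file, max_bytes, chunk_size=1024 * 1024):
    chunks, total = [], 0
    while True:
        chunk = await file.read(min(chunk_size, max_bytes + 1))
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            raise ValueError("413: file too large")  # HTTPException in the app
        chunks.append(chunk)
    return b"".join(chunks)


def test_exact_limit_accepted():
    payload = b"a" * 64
    result = asyncio.run(read_upload_with_limit(FakeUpload(payload), max_bytes=64))
    assert result == payload
```

Testing the payload at exactly max_bytes guards against an off-by-one that would reject the largest legal upload.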

Sequence Diagram

sequenceDiagram
  participant Client
  participant ScanUploadEndpoint as /api/scan/upload
  participant BoundedReader as _read_upload_with_limit
  participant UploadFile
  participant Settings
  
  Client->>ScanUploadEndpoint: POST file
  ScanUploadEndpoint->>Settings: fetch upload_max_bytes
  ScanUploadEndpoint->>BoundedReader: call with file + max_bytes
  loop Read chunks
    BoundedReader->>UploadFile: read(chunk_size)
    UploadFile-->>BoundedReader: chunk bytes
    BoundedReader->>BoundedReader: accumulate size
    alt Size exceeded
      BoundedReader-->>ScanUploadEndpoint: HTTPException 413
      ScanUploadEndpoint-->>Client: 413 + error detail
    else Size OK
      BoundedReader->>BoundedReader: continue
    end
  end
  BoundedReader-->>ScanUploadEndpoint: full content
  ScanUploadEndpoint->>ScanUploadEndpoint: process upload normally

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A bouncer stands guard at the upload gate,
No giant files shall seal our fate!
With chunks read small and limits set,
DoS attacks we shall beset.
One-hundred megs, no more, no less—
Security's a long-eared success! 🐇

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description check: Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check: Passed. The title 'fix(api): enforce configurable upload size limit' clearly summarizes the main change, adding enforced, configurable upload size limiting to the API.
  • Linked Issues check: Passed. All coding requirements from issue #146 are met: a configurable upload size limit with a 100 MiB default, bounded chunked reading to avoid memory exhaustion, HTTP 413 rejection for oversized uploads, comprehensive regression tests, and documented environment configuration.
  • Out of Scope Changes check: Passed. All changes directly support the upload size limiting feature from issue #146; no unrelated modifications detected across config files, API endpoint, settings, and test coverage.
  • Docstring Coverage: Passed. No functions found in the changed files to evaluate; docstring coverage check skipped.



@github-actions github-actions Bot added labels on May 11, 2026: area: backend (Changes to the Python backend and scanning pipeline), area: infra (CI, deployment, database, or repository automation changes), area: tests (Changes to test coverage or test infrastructure), bug (Bug report or bug fix related work), feature (Feature request or feature implementation).
@kpadilha kpadilha force-pushed the fix/configurable-upload-limit branch from 21cbdee to 3d95862 Compare May 11, 2026 18:47
@kpadilha kpadilha marked this pull request as ready for review May 11, 2026 18:48

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
src/extension_shield/core/config.py (1)

213-222: ⚡ Quick win

Consider extracting the hardcoded default to a module constant.

The default value 104857600 is repeated three times (lines 215, 219, 221). While the defensive parsing logic is sound (handles ValueError and validates > 0), extracting the default to a named constant would improve maintainability.

♻️ Suggested refactor
+# Upload and extraction limits (DoS protection)
+_DEFAULT_UPLOAD_MAX_BYTES = 104857600  # 100 MiB
+
 @lru_cache(maxsize=1)
 def get_settings() -> Settings:
     """
     ...
     """
     ...
     # Upload and zip-bomb protection: reject large uploads before saving/analysis,
     # then bound extracted archive complexity separately.
-    _upload_max_bytes = os.environ.get("UPLOAD_MAX_BYTES", "104857600")  # 100 MiB
+    _upload_max_bytes = os.environ.get("UPLOAD_MAX_BYTES", str(_DEFAULT_UPLOAD_MAX_BYTES))
     try:
         upload_max_bytes = int(_upload_max_bytes)
     except ValueError:
-        upload_max_bytes = 104857600
+        upload_max_bytes = _DEFAULT_UPLOAD_MAX_BYTES
     if upload_max_bytes <= 0:
-        upload_max_bytes = 104857600
+        upload_max_bytes = _DEFAULT_UPLOAD_MAX_BYTES
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/extension_shield/core/config.py` around lines 213 - 222, Extract the
repeated literal 104857600 into a named module-level constant (e.g.,
DEFAULT_UPLOAD_MAX_BYTES) and use that constant in the upload parsing logic:
replace the three occurrences of the literal in the _upload_max_bytes default,
the except ValueError fallback for upload_max_bytes, and the non-positive check
(currently operating on upload_max_bytes) with DEFAULT_UPLOAD_MAX_BYTES; ensure
the new constant is defined near the top of the module and referenced by the
existing variables _upload_max_bytes and upload_max_bytes to preserve the
current parsing/validation behavior.
src/extension_shield/api/main.py (1)

180-199: 💤 Low value

Consider optimizing chunk size calculation to minimize excess reads.

The bounded reader correctly enforces upload limits and rejects oversized payloads before they're saved. However, line 187 reads min(chunk_size, max_bytes + 1) on every iteration, which means the function may read up to max_bytes + chunk_size bytes (e.g., 101 MiB for a 100 MiB limit) before detecting an oversized upload.

For the typical use case (100 MiB limit, 1 MiB chunks), this overhead is acceptable and the implementation is correct. However, reading min(chunk_size, max_bytes - total + 1) would ensure you never read more than max_bytes + 1 bytes total, reducing memory pressure for edge cases.

♻️ Potential optimization
 async def _read_upload_with_limit(file: UploadFile, max_bytes: int) -> bytes:
     """Read an UploadFile without accepting payloads larger than max_bytes."""
     chunk_size = 1024 * 1024
     chunks = []
     total = 0

     while True:
-        chunk = await file.read(min(chunk_size, max_bytes + 1))
+        remaining = max_bytes - total + 1
+        chunk = await file.read(min(chunk_size, remaining))
         if not chunk:
             break
         total += len(chunk)
         if total > max_bytes:
             raise HTTPException(
                 status_code=413,
                 detail=f"File too large. Maximum size is {_format_byte_limit(max_bytes)}.",
             )
         chunks.append(chunk)

     return b"".join(chunks)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/extension_shield/api/main.py` around lines 180 - 199, In
_read_upload_with_limit, avoid over-reading by changing the read length
calculation: instead of await file.read(min(chunk_size, max_bytes + 1)), compute
the remaining allowance using max_bytes - total and call await
file.read(min(chunk_size, max(0, max_bytes - total) + 1)) (or equivalent) so
each iteration never requests more than the remaining bytes plus one; keep the
existing total update, overflow check (if total > max_bytes -> HTTPException),
and chunk append logic intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 087626a3-4ef3-4f92-8294-c3fd274957b6

📥 Commits

Reviewing files that changed from the base of the PR and between 46f2355 and 3d95862.

📒 Files selected for processing (7)
  • .env.example
  • deploy/env.production.template
  • deploy/railway.toml
  • railway.toml
  • src/extension_shield/api/main.py
  • src/extension_shield/core/config.py
  • tests/api/test_upload_limits.py

Reject oversized CRX/ZIP uploads with a configurable UPLOAD_MAX_BYTES limit before saving or analysis. Document the setting in deployment templates and add regression coverage for rejection, boundary acceptance, and invalid env fallback.
@kpadilha kpadilha force-pushed the fix/configurable-upload-limit branch from 3d95862 to 8cd4b37 Compare May 11, 2026 19:38


Development

Successfully merging this pull request may close these issues.

Security: No documented file size limit on CRX/ZIP upload — DoS via large file upload

1 participant