
fix(api): enforce configurable upload size limit#231

Open
kpadilha wants to merge 1 commit into Stanzin7:master from kpadilha:fix/configurable-upload-limit

Conversation


@kpadilha kpadilha commented May 11, 2026

Summary

Fixes #146.

The upload endpoint already had a hardcoded 100 MiB check, but it read the whole upload into memory before applying that check. This change keeps the same default limit, enforces it while reading, and moves it into config so deployments can tune it.

Changes:

  • add UPLOAD_MAX_BYTES, defaulting to 104857600 bytes (100 MiB);
  • read uploads in bounded chunks and return 413 once the configured limit is exceeded;
  • document the setting in the env templates and Railway config;
  • add regression tests for the oversized path, the exact-limit reader boundary, and invalid env fallback.
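The bounded-chunk read described above can be sketched roughly as follows. This is a simplified stand-in, not the PR's exact code: a plain synchronous stream and a custom PayloadTooLarge exception take the place of FastAPI's UploadFile and HTTPException(413).

```python
# Hypothetical sketch of a bounded upload reader; names are illustrative.
import io

DEFAULT_UPLOAD_MAX_BYTES = 104_857_600  # 100 MiB


class PayloadTooLarge(Exception):
    """Stand-in for FastAPI's HTTPException(status_code=413)."""


def read_with_limit(stream, max_bytes, chunk_size=1024 * 1024):
    chunks, total = [], 0
    while True:
        # Never request more than one byte past the remaining allowance,
        # so at most max_bytes + 1 bytes are ever buffered.
        chunk = stream.read(min(chunk_size, max_bytes - total + 1))
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            raise PayloadTooLarge(
                f"File too large. Maximum size is {max_bytes} bytes."
            )
        chunks.append(chunk)
    return b"".join(chunks)
```

The key property is that rejection happens as soon as the running total crosses the limit, before the rest of the body is read or saved.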

Validation

uv run python -m pytest tests/api/test_upload_limits.py tests/test_extension_extract.py -q
# 5 passed, 2 warnings

uv run python -m pytest tests -q
# 418 passed, 20 warnings

uv run python -m py_compile src/extension_shield/api/main.py src/extension_shield/core/config.py
# success

git diff --check
# success

The warnings are dependency/deprecation warnings already present in the test environment.

Summary by CodeRabbit

  • New Features
    • Configurable upload size limits with clear error messaging for oversized files
    • ZIP extraction constraints for file count and maximum decompressed size



coderabbitai Bot commented May 11, 2026

Warning

Rate limit exceeded

@kpadilha has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 9 minutes and 53 seconds before requesting another review.

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 459e5cf9-59af-4c35-88f6-05bfe9c4563c

📥 Commits

Reviewing files that changed from the base of the PR and between 3d95862 and 8cd4b37.

📒 Files selected for processing (7)
  • .env.example
  • deploy/env.production.template
  • deploy/railway.toml
  • railway.toml
  • src/extension_shield/api/main.py
  • src/extension_shield/core/config.py
  • tests/api/test_upload_limits.py
📝 Walkthrough

This PR implements configurable file upload size limits to prevent denial-of-service attacks via large uploads. It adds a new UPLOAD_MAX_BYTES environment variable parsed into the application settings, enforces the limit using chunked reading with early rejection (HTTP 413) in the /api/scan/upload endpoint, documents the setting across development and production deployment configurations, and adds regression test coverage.
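The env-variable parsing with fallback could look roughly like this; parse_upload_max_bytes and the constant name are illustrative, not the actual code in config.py:

```python
# Hypothetical sketch of UPLOAD_MAX_BYTES parsing with safe fallbacks.
import os

DEFAULT_UPLOAD_MAX_BYTES = 104_857_600  # 100 MiB


def parse_upload_max_bytes(env=os.environ):
    raw = env.get("UPLOAD_MAX_BYTES", str(DEFAULT_UPLOAD_MAX_BYTES))
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_UPLOAD_MAX_BYTES  # non-numeric input falls back
    # Non-positive limits would reject every upload, so fall back too.
    return value if value > 0 else DEFAULT_UPLOAD_MAX_BYTES
```

Falling back rather than raising keeps a misconfigured deployment serving traffic with the safe default instead of failing at startup.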

Changes

Upload Size Limits for DoS Prevention

  • Configuration schema and parsing (src/extension_shield/core/config.py): the Settings dataclass gains an upload_max_bytes: int field; get_settings() parses the UPLOAD_MAX_BYTES environment variable, falling back to 104857600 bytes, and enforces a positive-value constraint.
  • Upload size enforcement in the API (src/extension_shield/api/main.py): introduces _format_byte_limit() for consistent error messages and _read_upload_with_limit() for chunked reading with early rejection; the /api/scan/upload endpoint uses the bounded reader with settings.upload_max_bytes to reject oversized uploads before full buffering.
  • Environment and deployment configuration (.env.example, deploy/env.production.template, railway.toml, deploy/railway.toml): defines UPLOAD_MAX_BYTES across development, production, and Railway deployments with a consistent 104857600-byte default.
  • Test coverage for upload limits (tests/api/test_upload_limits.py): an API integration test verifies 413 rejection of oversized uploads with the correct error message and no side effects; a unit test confirms the bounded reader accepts payloads at the exact limit; an environment-fallback test validates that an invalid UPLOAD_MAX_BYTES reverts to the default.
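The exact-limit boundary test could be sketched like this, using a minimal async fake in place of the project's real fixtures; FakeUpload and the reader shown are stand-ins, not tests/api/test_upload_limits.py itself:

```python
# Illustrative boundary test for a bounded async reader (names hypothetical).
import asyncio


class FakeUpload:
    """Minimal async stand-in for starlette's UploadFile.read()."""

    def __init__(self, payload: bytes):
        self._buf = payload

    async def read(self, size: int) -> bytes:
        chunk, self._buf = self._buf[:size], self._buf[size:]
        return chunk


async def read_upload_with_limit(file, max_bytes, chunk_size=1024 * 1024):
    chunks, total = [], 0
    while True:
        chunk = await file.read(min(chunk_size, max_bytes + 1))
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            raise ValueError("413: file too large")  # HTTPException in the app
        chunks.append(chunk)
    return b"".join(chunks)


def test_exact_limit_accepted():
    payload = b"a" * 64
    result = asyncio.run(read_upload_with_limit(FakeUpload(payload), max_bytes=64))
    assert result == payload
```

Testing the payload at exactly max_bytes guards against an off-by-one that would reject the largest legal upload.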

Sequence Diagram

sequenceDiagram
  participant Client
  participant ScanUploadEndpoint as /api/scan/upload
  participant BoundedReader as _read_upload_with_limit
  participant UploadFile
  participant Settings
  
  Client->>ScanUploadEndpoint: POST file
  ScanUploadEndpoint->>Settings: fetch upload_max_bytes
  ScanUploadEndpoint->>BoundedReader: call with file + max_bytes
  loop Read chunks
    BoundedReader->>UploadFile: read(chunk_size)
    UploadFile-->>BoundedReader: chunk bytes
    BoundedReader->>BoundedReader: accumulate size
    alt Size exceeded
      BoundedReader-->>ScanUploadEndpoint: HTTPException 413
      ScanUploadEndpoint-->>Client: 413 + error detail
    else Size OK
      BoundedReader->>BoundedReader: continue
    end
  end
  BoundedReader-->>ScanUploadEndpoint: full content
  ScanUploadEndpoint->>ScanUploadEndpoint: process upload normally

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A bouncer stands guard at the upload gate,
No giant files shall seal our fate!
With chunks read small and limits set,
DoS attacks we shall beset.
One-hundred megs, no more, no less—
Security's a long-eared success! 🐇

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description check: Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check: Passed. The title 'fix(api): enforce configurable upload size limit' clearly summarizes the main change, adding enforced, configurable upload size limiting to the API.
  • Linked Issues check: Passed. All coding requirements from issue #146 are met: a configurable upload size limit with a 100 MiB default, bounded chunked reading to avoid memory exhaustion, HTTP 413 rejection for oversized uploads, comprehensive regression tests, and documented environment configuration.
  • Out of Scope Changes check: Passed. All changes directly support the upload size limiting feature from issue #146; no unrelated modifications detected across config files, API endpoint, settings, and test coverage.
  • Docstring Coverage: Passed. No functions found in the changed files to evaluate; docstring coverage check skipped.



@github-actions github-actions Bot added labels on May 11, 2026: area: backend (Changes to the Python backend and scanning pipeline), area: infra (CI, deployment, database, or repository automation changes), area: tests (Changes to test coverage or test infrastructure), bug (Bug report or bug fix related work), feature (Feature request or feature implementation).
@kpadilha kpadilha force-pushed the fix/configurable-upload-limit branch from 21cbdee to 3d95862 Compare May 11, 2026 18:47
@kpadilha kpadilha marked this pull request as ready for review May 11, 2026 18:48

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
src/extension_shield/core/config.py (1)

213-222: ⚡ Quick win

Consider extracting the hardcoded default to a module constant.

The default value 104857600 is repeated three times (lines 215, 219, 221). While the defensive parsing logic is sound (handles ValueError and validates > 0), extracting the default to a named constant would improve maintainability.

♻️ Suggested refactor
+# Upload and extraction limits (DoS protection)
+_DEFAULT_UPLOAD_MAX_BYTES = 104857600  # 100 MiB
+
 @lru_cache(maxsize=1)
 def get_settings() -> Settings:
     """
     ...
     """
     ...
     # Upload and zip-bomb protection: reject large uploads before saving/analysis,
     # then bound extracted archive complexity separately.
-    _upload_max_bytes = os.environ.get("UPLOAD_MAX_BYTES", "104857600")  # 100 MiB
+    _upload_max_bytes = os.environ.get("UPLOAD_MAX_BYTES", str(_DEFAULT_UPLOAD_MAX_BYTES))
     try:
         upload_max_bytes = int(_upload_max_bytes)
     except ValueError:
-        upload_max_bytes = 104857600
+        upload_max_bytes = _DEFAULT_UPLOAD_MAX_BYTES
     if upload_max_bytes <= 0:
-        upload_max_bytes = 104857600
+        upload_max_bytes = _DEFAULT_UPLOAD_MAX_BYTES
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/extension_shield/core/config.py` around lines 213 - 222, Extract the
repeated literal 104857600 into a named module-level constant (e.g.,
DEFAULT_UPLOAD_MAX_BYTES) and use that constant in the upload parsing logic:
replace the three occurrences of the literal in the _upload_max_bytes default,
the except ValueError fallback for upload_max_bytes, and the non-positive check
(currently operating on upload_max_bytes) with DEFAULT_UPLOAD_MAX_BYTES; ensure
the new constant is defined near the top of the module and referenced by the
existing variables _upload_max_bytes and upload_max_bytes to preserve the
current parsing/validation behavior.
src/extension_shield/api/main.py (1)

180-199: 💤 Low value

Consider optimizing chunk size calculation to minimize excess reads.

The bounded reader correctly enforces upload limits and rejects oversized payloads before they're saved. However, line 187 reads min(chunk_size, max_bytes + 1) on every iteration, which means the function may read up to max_bytes + chunk_size bytes (e.g., 101 MiB for a 100 MiB limit) before detecting an oversized upload.

For the typical use case (100 MiB limit, 1 MiB chunks), this overhead is acceptable and the implementation is correct. However, reading min(chunk_size, max_bytes - total + 1) would ensure you never read more than max_bytes + 1 bytes total, reducing memory pressure for edge cases.

♻️ Potential optimization
 async def _read_upload_with_limit(file: UploadFile, max_bytes: int) -> bytes:
     """Read an UploadFile without accepting payloads larger than max_bytes."""
     chunk_size = 1024 * 1024
     chunks = []
     total = 0

     while True:
-        chunk = await file.read(min(chunk_size, max_bytes + 1))
+        remaining = max_bytes - total + 1
+        chunk = await file.read(min(chunk_size, remaining))
         if not chunk:
             break
         total += len(chunk)
         if total > max_bytes:
             raise HTTPException(
                 status_code=413,
                 detail=f"File too large. Maximum size is {_format_byte_limit(max_bytes)}.",
             )
         chunks.append(chunk)

     return b"".join(chunks)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/extension_shield/api/main.py` around lines 180 - 199, In
_read_upload_with_limit, avoid over-reading by changing the read length
calculation: instead of await file.read(min(chunk_size, max_bytes + 1)), compute
the remaining allowance using max_bytes - total and call await
file.read(min(chunk_size, max(0, max_bytes - total) + 1)) (or equivalent) so
each iteration never requests more than the remaining bytes plus one; keep the
existing total update, overflow check (if total > max_bytes -> HTTPException),
and chunk append logic intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 087626a3-4ef3-4f92-8294-c3fd274957b6

📥 Commits

Reviewing files that changed from the base of the PR and between 46f2355 and 3d95862.

📒 Files selected for processing (7)
  • .env.example
  • deploy/env.production.template
  • deploy/railway.toml
  • railway.toml
  • src/extension_shield/api/main.py
  • src/extension_shield/core/config.py
  • tests/api/test_upload_limits.py

Reject oversized CRX/ZIP uploads with a configurable UPLOAD_MAX_BYTES limit before saving or analysis. Document the setting in deployment templates and add regression coverage for rejection, boundary acceptance, and invalid env fallback.
@kpadilha kpadilha force-pushed the fix/configurable-upload-limit branch from 3d95862 to 8cd4b37 Compare May 11, 2026 19:38


Development

Successfully merging this pull request may close these issues.

Security: No documented file size limit on CRX/ZIP upload — DoS via large file upload

1 participant