Configure rate limits on VirtualMCPServer PR B#5276
Open
Sanskarzz wants to merge 2 commits into
Open
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5276 +/- ##
==========================================
+ Coverage 68.39% 68.45% +0.06%
==========================================
Files 619 620 +1
Lines 63318 63422 +104
==========================================
+ Hits 43305 43415 +110
+ Misses 16782 16770 -12
- Partials 3231 3237 +6 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
695a96c to
480152f
Compare
Contributor
There was a problem hiding this comment.
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformationAlternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR wires the
VirtualMCPServerrate-limiting config added in PR A into the vMCP runtime so configured limits are actually enforced before requests reach aggregated backends.spec.config.rateLimitingfrom the vMCP runtime config intovmcpserver.Config.call_toolrequests are rate-limited using the innerarguments.tool_name.call_toolbehavior.perUser.maxTokens=1, then verifies the second authenticatedtools/callis rejected.Fixes: #4552
Type of change
Test plan
Changes
pkg/vmcp/cli/serve.goRateLimitingintovmcpserver.Config.pkg/vmcp/server/server.goNew(), and refactors MCP middleware composition.pkg/vmcp/server/ratelimit.gopkg/ratelimit/middleware.gopkg/vmcp/server/ratelimit_test.gotest/e2e/thv-operator/virtualmcp/virtualmcp_rate_limiting_test.gomanual-testing-ratelimit-vmcp.mdDoes this introduce a user-facing change?
Yes. Cluster admins can now configure
VirtualMCPServer.spec.config.rateLimitingand vMCP will enforce the configured shared, per-user, and per-tool limits fortools/callrequests at runtime.Implementation plan
Approved implementation plan
sessionStorageandrateLimitingconfig.tools/callforcall_toolto the innerarguments.tool_namevalue.Special notes for reviewers
tools/callparameters.kubectl port-forwardfor vMCP/OIDC access instead of assuming NodePorts are reachable onlocalhost, which makes it work reliably with local kind clusters and CI environments.Manual testing
If the cluster is not already running with local images:
1. PR A: Validate CRD Admission Rules
1.1. Reject rate limiting without Redis session storage
Expected: the API server rejects the object with:
1.2. Reject per-user rate limiting without OIDC auth
Expected: the API server rejects the object with:
2. PR A: Verify Runtime ConfigMap Serialization
Create Redis first and wait for it to accept traffic. This avoids a transient vMCP restart while Redis is still starting.
Create a valid shared-limit vMCP. This uses anonymous auth so it can be inspected without setting up OIDC manually.
Wait for resources:
Expected: the
VirtualMCPServerreachesReady. It may briefly showDegradedwhile the backend health check warms up.Inspect the generated vMCP ConfigMap:
Expected:
config.yamlcontains:This confirms PR A’s field survives CRD -> converter -> ConfigMap serialization.
3. PR B: Verify Runtime Enforcement
The most reliable manual runtime check is the focused E2E test. It deploys Redis, a parameterized OIDC server, a backend MCPServer, and a VirtualMCPServer with
perUser.maxTokens=1. Then it sends two authenticatedtools/callrequests from the same user and expects the second request to be rejected.Expected:
This confirms PR B:
config.rateLimiting.sessionStorage.tools/callis rejected with rate-limit error behavior.4. Optional: Inspect Runtime Logs
Expected log line:
If Redis cannot be reached at startup, vMCP should fail to build the handler and logs should show:
5. Cleanup
The focused E2E test creates timestamped resources and cleans them up automatically.