
Configure rate limits on VirtualMCPServer (PR B) #5276

Open
Sanskarzz wants to merge 2 commits into stacklok:main from Sanskarzz:ratelimitingVMCP2

Conversation

@Sanskarzz (Contributor) commented May 14, 2026

Summary

This PR wires the VirtualMCPServer rate-limiting config added in PR A into the vMCP runtime so configured limits are actually enforced before requests reach aggregated backends.

  • Passes spec.config.rateLimiting from the vMCP runtime config into vmcpserver.Config.
  • Builds a Redis-backed rate-limit middleware during vMCP server construction and closes the Redis client during server shutdown.
  • Refactors the vMCP MCP middleware chain so requests are authenticated and parsed before rate limiting, allowing per-user and per-tool decisions.
  • Adds optimizer-aware tool-name resolution so optimizer call_tool requests are rate-limited using the inner arguments.tool_name (see the sketch after this list).
  • Adds unit coverage for per-user limits, per-tool limits, post-aggregation tool names, and optimizer call_tool behavior.
  • Adds a focused E2E test that deploys Redis, OIDC, a backend MCPServer, and a vMCP with perUser.maxTokens=1, then verifies the second authenticated tools/call is rejected.
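
For reviewers unfamiliar with optimizer mode: a plain tools/call is limited by its advertised name, while an optimizer call_tool wraps the real target inside its arguments. The resolution amounts to something like the following sketch (function and field handling here are illustrative, not the actual code in pkg/vmcp/server/ratelimit.go):

package server

// resolveRateLimitToolName is a sketch of optimizer-aware resolution.
// For a normal tools/call the advertised name is the right bucket key;
// for the optimizer's call_tool, the real target sits in
// arguments.tool_name, so per-tool limits must key on that instead.
func resolveRateLimitToolName(params map[string]any) string {
	name, _ := params["name"].(string)
	if name != "call_tool" {
		return name
	}
	args, _ := params["arguments"].(map[string]any)
	if inner, ok := args["tool_name"].(string); ok && inner != "" {
		return inner
	}
	return name
}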

Fixes: #4552

Type of change

  • Bug fix
  • New feature
  • Refactoring (no behavior change)
  • Dependency update
  • Documentation
  • Other (describe):

Test plan

  • Unit tests
  • E2E tests
  • Linting
  • Manual testing (described below)

Changes

  • pkg/vmcp/cli/serve.go: passes the vMCP namespace and RateLimiting into vmcpserver.Config.
  • pkg/vmcp/server/server.go: adds rate-limit config to the server config, initializes the middleware once in New(), and refactors MCP middleware composition.
  • pkg/vmcp/server/ratelimit.go: adds the vMCP-specific Redis client lifecycle, limiter construction, and optimizer-aware tool-name resolution.
  • pkg/ratelimit/middleware.go: exposes reusable middleware construction with an optional tool-name resolver (sketched after this list).
  • pkg/vmcp/server/ratelimit_test.go: adds unit tests for shared, per-user, per-tool, and optimizer behavior.
  • test/e2e/thv-operator/virtualmcp/virtualmcp_rate_limiting_test.go: adds the focused vMCP per-user rate-limit E2E test.
  • manual-testing-ratelimit-vmcp.md: documents the manual PR A + PR B verification steps.
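
Of these, pkg/ratelimit/middleware.go is the piece the vMCP server builds on. Its exported surface is conceptually shaped like this (a sketch only; the real signatures may differ):

package ratelimit

import "net/http"

// ToolNameResolver maps parsed tools/call params to the name that
// per-tool buckets should key on (e.g. unwrapping optimizer calls).
type ToolNameResolver func(params map[string]any) string

// Limiter reports whether the bucket behind key may consume a token.
type Limiter interface {
	Allow(key string) (bool, error)
}

// NewMiddleware builds standard net/http middleware around a limiter.
// resolver may be nil, in which case the advertised tool name is used.
func NewMiddleware(limiter Limiter, resolver ToolNameResolver) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			// Key construction and the Allow check are elided here;
			// see the fail-open sketch under "Special notes for reviewers".
			next.ServeHTTP(w, r)
		})
	}
}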

Does this introduce a user-facing change?

Yes. Cluster admins can now configure VirtualMCPServer.spec.config.rateLimiting and vMCP will enforce the configured shared, per-user, and per-tool limits for tools/call requests at runtime.

Implementation plan

Approved implementation plan
  1. Reuse the rate-limit config type from PR A and pass it through vMCP startup configuration.
  2. Build a Redis-backed limiter from vMCP sessionStorage and rateLimiting config.
  3. Insert rate limiting into the MCP middleware chain after auth and MCP request parsing (see the ordering sketch after this list).
  4. Preserve existing middleware behavior by keeping discovery, audit, authz, backend enrichment, telemetry, header validation, write-timeout handling, and recovery in the intended order.
  5. Resolve per-tool limits against post-aggregation tool names.
  6. In optimizer mode, map tools/call for call_tool to the inner arguments.tool_name value.
  7. Add unit tests for key rate-limit behavior.
  8. Add a focused operator E2E test for per-user enforcement.
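
For orientation on steps 3 and 4, the composition reads roughly like the sketch below, outermost wrapper first (middleware names and order are indicative only; the real chain lives in pkg/vmcp/server/server.go):

package server

import "net/http"

// chain applies middleware so that the first entry is the outermost
// wrapper. Placing rateLimit after auth and mcpParsing means it
// executes after both, so it can read the authenticated identity and
// the parsed tools/call parameters.
func chain(h http.Handler, mws ...func(http.Handler) http.Handler) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// Hypothetical usage, indicative order only:
//
//	handler := chain(mcpHandler,
//		recovery, writeTimeout, headerValidation,
//		auth, mcpParsing, rateLimit,
//		telemetry, authz, audit)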

Special notes for reviewers

  • The middleware chain refactor is intentional and worth focused review. Rate limiting must run after auth and MCP parsing so it can read authenticated identity and tools/call parameters.
  • The focused E2E uses kubectl port-forward for vMCP/OIDC access instead of assuming NodePorts are reachable on localhost, which makes it work reliably with local kind clusters and CI environments.
  • Request-time Redis errors fail open in the shared rate-limit middleware, matching existing middleware behavior. Startup still validates Redis connectivity when rate limiting is configured (see the sketch below).
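
Concretely, the fail-open policy distinguishes an explicit over-limit decision from an infrastructure error. A minimal sketch, assuming a sentinel error from the limiter (names here are hypothetical):

package ratelimit

import (
	"context"
	"errors"
	"log"
	"net/http"
)

// ErrRateLimited is the sentinel a limiter returns for an empty bucket.
var ErrRateLimited = errors.New("rate limit exceeded")

// allowOrFailOpen rejects only on an explicit rate-limit decision.
// Any other limiter error (Redis timeout, network blip) is logged and
// the request is allowed through: availability over strict limiting.
func allowOrFailOpen(ctx context.Context, allow func(context.Context, string) error, key string, w http.ResponseWriter) bool {
	err := allow(ctx, key)
	switch {
	case err == nil:
		return true
	case errors.Is(err, ErrRateLimited):
		http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
		return false
	default:
		log.Printf("rate limit check failed, failing open: %v", err)
		return true
	}
}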

Manual testing

If the cluster is not already running with local images:

task kind-with-toolhive-operator-local
export KUBECONFIG="$PWD/kconfig.yaml"

1. PR A: Validate CRD Admission Rules

1.1. Reject rate limiting without Redis session storage

kubectl apply -f - <<'EOF'
apiVersion: toolhive.stacklok.dev/v1beta1
kind: VirtualMCPServer
metadata:
  name: vmcp-rl-invalid-no-redis
  namespace: default
spec:
  groupRef:
    name: vmcp-rl-manual-group
  incomingAuth:
    type: anonymous
  config:
    rateLimiting:
      shared:
        maxTokens: 1
        refillPeriod: 1m
EOF

Expected: the API server rejects the object with:

config.rateLimiting requires sessionStorage with provider 'redis'

1.2. Reject per-user rate limiting without OIDC auth

kubectl apply -f - <<'EOF'
apiVersion: toolhive.stacklok.dev/v1beta1
kind: VirtualMCPServer
metadata:
  name: vmcp-rl-invalid-peruser-anon
  namespace: default
spec:
  groupRef:
    name: vmcp-rl-manual-group
  incomingAuth:
    type: anonymous
  sessionStorage:
    provider: redis
    address: redis-rl-manual.default.svc.cluster.local:6379
  config:
    rateLimiting:
      perUser:
        maxTokens: 1
        refillPeriod: 1m
EOF

Expected: the API server rejects the object with:

config.rateLimiting.perUser requires incomingAuth.type oidc
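
Both rules are the kind of constraint kubebuilder XValidation (CEL) markers on the spec type express. A sketch of plausible markers that would produce the two rejection messages above (the exact expressions belong to PR A and may differ):

package v1beta1

// Sketch only: CEL admission rules of roughly this shape would yield
// the two rejection messages shown in sections 1.1 and 1.2.
//
// +kubebuilder:validation:XValidation:rule="!has(self.config) || !has(self.config.rateLimiting) || (has(self.sessionStorage) && self.sessionStorage.provider == 'redis')",message="config.rateLimiting requires sessionStorage with provider 'redis'"
// +kubebuilder:validation:XValidation:rule="!has(self.config) || !has(self.config.rateLimiting) || !has(self.config.rateLimiting.perUser) || self.incomingAuth.type == 'oidc'",message="config.rateLimiting.perUser requires incomingAuth.type oidc"
type VirtualMCPServerSpec struct {
	// fields elided
}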

2. PR A: Verify Runtime ConfigMap Serialization

Create Redis first and wait for it to accept traffic. This avoids a transient vMCP restart while Redis is still starting.

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-rl-manual
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-rl-manual
  template:
    metadata:
      labels:
        app: redis-rl-manual
    spec:
      containers:
      - name: redis
        image: redis:7-alpine
        ports:
        - containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis-rl-manual
  namespace: default
spec:
  selector:
    app: redis-rl-manual
  ports:
  - name: redis
    port: 6379
    targetPort: 6379
EOF

kubectl wait --for=condition=Available deployment/redis-rl-manual -n default --timeout=120s

Create a valid shared-limit vMCP. This uses anonymous auth so it can be inspected without setting up OIDC manually.

kubectl apply -f - <<'EOF'
---
apiVersion: toolhive.stacklok.dev/v1beta1
kind: MCPGroup
metadata:
  name: vmcp-rl-manual-group
  namespace: default
spec:
  description: Manual vMCP rate-limit group
---
apiVersion: toolhive.stacklok.dev/v1beta1
kind: MCPServer
metadata:
  name: vmcp-rl-manual-yardstick
  namespace: default
spec:
  groupRef:
    name: vmcp-rl-manual-group
  image: ghcr.io/stackloklabs/yardstick/yardstick-server:1.1.1
  transport: streamable-http
  proxyPort: 8080
  mcpPort: 8080
---
apiVersion: toolhive.stacklok.dev/v1beta1
kind: VirtualMCPServer
metadata:
  name: vmcp-rl-manual
  namespace: default
spec:
  groupRef:
    name: vmcp-rl-manual-group
  incomingAuth:
    type: anonymous
  sessionStorage:
    provider: redis
    address: redis-rl-manual.default.svc.cluster.local:6379
  serviceType: NodePort
  config:
    rateLimiting:
      shared:
        maxTokens: 1
        refillPeriod: 1m
EOF

Wait for resources:

kubectl wait --for=jsonpath='{.status.phase}'=Ready mcpserver/vmcp-rl-manual-yardstick -n default --timeout=180s
kubectl wait --for=jsonpath='{.status.phase}'=Ready virtualmcpserver/vmcp-rl-manual -n default --timeout=300s
kubectl get virtualmcpserver vmcp-rl-manual -n default

Expected: the VirtualMCPServer reaches Ready. It may briefly show Degraded while the backend health check warms up.

Inspect the generated vMCP ConfigMap:

kubectl get configmap vmcp-rl-manual-vmcp-config -n default -o yaml

Expected: config.yaml contains:

rateLimiting:
  shared:
    maxTokens: 1
    refillPeriod:
      duration: 1m0s
sessionStorage:
  provider: redis
  address: redis-rl-manual.default.svc.cluster.local:6379

This confirms PR A’s field survives CRD -> converter -> ConfigMap serialization.
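
The nested refillPeriod: duration: 1m0s shape implies the runtime config wraps time.Duration for readable YAML. A sketch of struct shapes that would round-trip this way with gopkg.in/yaml.v3 (the actual type definitions may differ):

package config

import "time"

// Duration wraps time.Duration so it serializes as a nested
// {duration: "1m0s"} mapping rather than raw nanoseconds.
type Duration struct {
	time.Duration
}

// MarshalYAML renders the human-readable form seen in the ConfigMap.
func (d Duration) MarshalYAML() (any, error) {
	return map[string]string{"duration": d.Duration.String()}, nil
}

// Bucket is one token-bucket definition.
type Bucket struct {
	MaxTokens    int      `yaml:"maxTokens"`
	RefillPeriod Duration `yaml:"refillPeriod"`
}

// RateLimiting mirrors spec.config.rateLimiting. PerTool is omitted
// here: the PR confirms per-tool limits exist but not their exact shape.
type RateLimiting struct {
	Shared  *Bucket `yaml:"shared,omitempty"`
	PerUser *Bucket `yaml:"perUser,omitempty"`
}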

3. PR B: Verify Runtime Enforcement

The most reliable manual runtime check is the focused E2E test. It deploys Redis, a parameterized OIDC server, a backend MCPServer, and a VirtualMCPServer with perUser.maxTokens=1. Then it sends two authenticated tools/call requests from the same user and expects the second request to be rejected.
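
The arithmetic behind the expectation is plain token-bucket behavior: with maxTokens=1 and a 1m refill, the first call drains the bucket and a second call within the window finds it empty. A self-contained illustration (in-memory and simplified to full refills, unlike the Redis-backed limiter the PR actually uses):

package main

import (
	"fmt"
	"time"
)

// bucket is a minimal in-memory token bucket, for intuition only; the
// PR keeps this state in Redis so it is shared across vMCP replicas.
type bucket struct {
	tokens, max int
	refill      time.Duration
	last        time.Time
}

func (b *bucket) allow(now time.Time) bool {
	if now.Sub(b.last) >= b.refill {
		b.tokens = b.max
		b.last = now
	}
	if b.tokens == 0 {
		return false
	}
	b.tokens--
	return true
}

func main() {
	now := time.Now()
	b := &bucket{tokens: 1, max: 1, refill: time.Minute, last: now}
	fmt.Println(b.allow(now))                      // true: first tools/call
	fmt.Println(b.allow(now.Add(time.Second)))     // false: second call rejected
	fmt.Println(b.allow(now.Add(2 * time.Minute))) // true again after refill
}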

KUBECONFIG="$PWD/kconfig.yaml" go test ./test/e2e/thv-operator/virtualmcp \
  -run TestE2E \
  -count=1 \
  -timeout=15m \
  -args -ginkgo.v -ginkgo.focus="VirtualMCPServer Rate Limiting"

Expected:

VirtualMCPServer Rate Limiting
  rejects tools/call after the per-user limit is exceeded
SUCCESS!

This confirms the PR B wiring:

  • vMCP reads config.rateLimiting.
  • Redis client is created from sessionStorage.
  • Middleware runs after auth and MCP parsing.
  • Per-user bucket is keyed by the authenticated subject (see the key sketch after this list).
  • Over-limit tools/call is rejected with rate-limit error behavior.
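
Keying the per-user bucket on the authenticated subject is exactly what the E2E exercises: two calls by the same subject share one bucket. A sketch of how limiter keys could be namespaced (the exact key format in the PR may differ):

package ratelimit

import "fmt"

// bucketKey namespaces limiter state per scope. Hypothetical format.
func bucketKey(scope, subject, tool string) string {
	switch scope {
	case "perUser":
		return fmt.Sprintf("vmcp:ratelimit:user:%s", subject)
	case "perTool":
		return fmt.Sprintf("vmcp:ratelimit:tool:%s", tool)
	default:
		return "vmcp:ratelimit:shared"
	}
}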

4. Optional: Inspect Runtime Logs

kubectl get pods -n default -l app.kubernetes.io/name=vmcp-rl-manual
kubectl logs -n default deploy/vmcp-rl-manual -c vmcp --since=20m | grep "rate limit middleware enabled"

Expected log line:

rate limit middleware enabled for MCP endpoints

If Redis cannot be reached at startup, vMCP should fail to build the handler and logs should show:

rate limit middleware: failed to connect to Redis
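
That startup-time validation is the usual ping-on-construction pattern; with github.com/redis/go-redis/v9 it would look roughly like this (error wording taken from the log line above; the rest is a sketch):

package server

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

// newRedisClient sketches startup validation: fail handler construction
// early if Redis is unreachable, instead of discovering it per request.
func newRedisClient(ctx context.Context, addr string) (*redis.Client, error) {
	client := redis.NewClient(&redis.Options{Addr: addr})
	if err := client.Ping(ctx).Err(); err != nil {
		_ = client.Close()
		return nil, fmt.Errorf("rate limit middleware: failed to connect to Redis: %w", err)
	}
	return client, nil
}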

5. Cleanup

kubectl delete virtualmcpserver vmcp-rl-manual -n default --ignore-not-found
kubectl delete mcpserver vmcp-rl-manual-yardstick -n default --ignore-not-found
kubectl delete mcpgroup vmcp-rl-manual-group -n default --ignore-not-found
kubectl delete deployment redis-rl-manual -n default --ignore-not-found
kubectl delete service redis-rl-manual -n default --ignore-not-found

The focused E2E test creates timestamped resources and cleans them up automatically.

@github-actions github-actions Bot added the size/L Large PR: 600-999 lines changed label May 14, 2026
@codecov codecov Bot commented May 14, 2026

Codecov Report

❌ Patch coverage is 87.09677% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.45%. Comparing base (17451d1) to head (480152f).
⚠️ Report is 2 commits behind head on main.

Files with missing lines          Patch %   Lines
pkg/vmcp/server/server.go         82.08%    6 missing and 6 partials
pkg/vmcp/cli/serve.go             71.42%    2 missing
pkg/vmcp/server/ratelimit.go      94.87%    1 missing and 1 partial
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5276      +/-   ##
==========================================
+ Coverage   68.39%   68.45%   +0.06%     
==========================================
  Files         619      620       +1     
  Lines       63318    63422     +104     
==========================================
+ Hits        43305    43415     +110     
+ Misses      16782    16770      -12     
- Partials     3231     3237       +6     

☔ View full report in Codecov by Sentry.
@Sanskarzz Sanskarzz marked this pull request as ready for review May 14, 2026 18:17
@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels May 14, 2026
Sanskarzz added 2 commits May 15, 2026 19:39
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
@Sanskarzz Sanskarzz force-pushed the ratelimitingVMCP2 branch from 695a96c to 480152f on May 15, 2026 14:09
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/L Large PR: 600-999 lines changed labels May 15, 2026
@github-actions github-actions Bot left a comment

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.


Labels

size/XL Extra large PR: 1000+ lines changed

Development

Successfully merging this pull request may close these issues.

Configure rate limits on VirtualMCPServer

1 participant