Skip to content

Conversation

@jeremyeder
Copy link
Collaborator

this is a prototype UX for using openwebui to interact with the new amber codebase agent added in #337

@jeremyeder jeremyeder enabled auto-merge (squash) November 19, 2025 01:35
@github-actions

This comment has been minimized.

@bobbravo2 bobbravo2 added this to the v0.0.15 milestone Dec 16, 2025
jeremyeder and others added 2 commits December 20, 2025 14:31
Add a Kubernetes-native deployment of Open WebUI with LiteLLM proxy for
chatting with Claude models. This Phase 1 implementation provides a quick,
dev-friendly deployment to Kind cluster with minimal configuration.

Components:
- Base manifests (namespace, deployments, services, PVC, RBAC)
- LiteLLM proxy configured for Claude Sonnet 4.5, 3.7, and Haiku 3.5
- Open WebUI frontend with persistent storage
- Phase 1 overlay for Kind deployment with nginx-ingress
- Comprehensive documentation (README, Phase 1 guide, Phase 2 plan)
- Makefile for deployment automation

Architecture:
- Namespace: openwebui (isolated from ACP)
- Ingress: vteam.local/chat (reuses Kind cluster from e2e)
- Auth: Disabled in Phase 1 (dev/testing only)
- Storage: 500Mi PVC for chat history
- Images: ghcr.io/berriai/litellm, ghcr.io/open-webui/open-webui

Phase 2 (planned):
- OAuth authentication via oauth2-proxy
- Long-running Claude Code service for Amber integration
- Production hardening (secrets, RBAC, monitoring)
- OpenShift compatibility (Routes, SCC compliance)

Deployment:
```bash
cd components/open-webui-llm
# Edit overlays/phase1-kind/secrets.yaml with API key
make phase1-deploy
# Access: http://vteam.local:8080/chat (Podman) or /chat (Docker)
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Increase memory limit from 512Mi to 2Gi to prevent OOMKilled crashes
- Increase CPU limit from 500m to 1000m for better performance
- Update health probe paths to LiteLLM-specific endpoints:
  - /health/liveliness for liveness probe
  - /health/readiness for readiness probe
- Increase resource requests for stability

Fixes LiteLLM pod crash loop due to insufficient memory allocation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@jeremyeder jeremyeder force-pushed the feature/openwebui-litellm-deployment branch from ad4140f to b43bf39 Compare December 20, 2025 19:32
@jeremyeder jeremyeder merged commit f97fa0a into ambient-code:main Dec 20, 2025
2 checks passed
@github-actions
Copy link
Contributor

github-actions bot commented Dec 20, 2025

Claude Code Review

Summary

This PR adds a Phase 1 deployment of Open WebUI + LiteLLM for chatting with Claude models via a Kind cluster. The implementation is well-structured with clear documentation and follows Kubernetes best practices for a development/testing environment. However, there are several critical security issues that must be addressed before this can be considered for any production use (even internal), and some architectural decisions that need discussion.

Key Finding: This is explicitly a dev/testing prototype (Phase 1), with Phase 2 planned for production hardening. The security issues identified are acceptable for local dev but must not be deployed to shared/production environments.


Issues by Severity

🚫 Blocker Issues

None - This is acceptable for Phase 1 dev/testing as documented.

🔴 Critical Issues

1. Missing SecurityContext in Deployments ⚠️

Location: components/open-webui-llm/base/litellm/deployment.yaml, components/open-webui-llm/base/open-webui/deployment.yaml

Issue: Both deployments lack SecurityContext, violating CLAUDE.md container security standards.

From CLAUDE.md (lines 659-670):

SecurityContext: &corev1.SecurityContext{
    AllowPrivilegeEscalation: boolPtr(false),
    ReadOnlyRootFilesystem:   boolPtr(false),  // Only if temp files needed
    Capabilities: &corev1.Capabilities{
        Drop: []corev1.Capability{"ALL"},
    },
},

Impact: Pods run with default security context, potentially allowing privilege escalation.

Recommendation: Add SecurityContext to both deployments:

spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: litellm
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          runAsNonRoot: true

2. Authentication Disabled in Phase 1 (WEBUI_AUTH=false) ⚠️

Location: components/open-webui-llm/base/open-webui/deployment.yaml:37

Issue: WEBUI_AUTH: "false" disables all authentication - anyone with network access can use the UI and consume Anthropic API credits.

Impact:

  • Unauthenticated access to chat interface
  • No user tracking or audit trail
  • Potential API abuse/cost explosion
  • Documented as Phase 1 limitation (acceptable for local dev only)

Recommendation:

  • OK for Phase 1 local Kind clusters
  • NEVER deploy to shared clusters without Phase 2 OAuth
  • Add clear warning in deployment logs

3. Hardcoded Development Master Key Exposed ⚠️

Location: components/open-webui-llm/overlays/phase1-kind/secrets.yaml:11,22

Issue: LITELLM_MASTER_KEY: "sk-litellm-dev-master-key" is hardcoded in git-tracked file.

Impact: Anyone reading this file knows the master key for LiteLLM API access.

Recommendation:

  • Document that this is a dev-only key
  • Add comment: # WARNING: Dev-only key. Generate unique key for production.
  • In Phase 2, generate random keys: openssl rand -base64 32

4. Probe Paths Don't Match LiteLLM Documentation 🔴

Location: components/open-webui-llm/base/litellm/deployment.yaml:56-67

Issue: LiteLLM deployment uses /health/liveliness and /health/readiness, but README.md:65 and Makefile:61 use /health endpoint.

Current:

livenessProbe:
  httpGet:
    path: /health/liveliness  # Typo: liveliness vs liveness
    port: http
readinessProbe:
  httpGet:
    path: /health/readiness

README uses:

curl -s http://litellm-service:4000/health

Impact:

  • Potential probe failures if LiteLLM doesn't expose these specific paths
  • Inconsistency between docs and deployment

Recommendation: Verify LiteLLM's actual health endpoints and align code/docs. Most likely should be:

  • Liveness: /health or /health/liveness (not "liveliness")
  • Readiness: /health or /health/readiness

🟡 Major Issues

5. No Resource Limits Match Original Commit vs Current State

Location: components/open-webui-llm/base/litellm/deployment.yaml:49-55

Issue: Commit message says "100m CPU / 256Mi memory requests" but deployment has:

resources:
  requests:
    cpu: 200m      # Commit says 100m
    memory: 512Mi  # Commit says 256Mi
  limits:
    cpu: 1000m
    memory: 2Gi

Impact: Deployment differs from documented commit message - discrepancy between design and implementation.

Recommendation: Either update deployment to match commit message OR update commit message to reflect actual values. Current values (200m/512Mi) are reasonable for LiteLLM.

6. No OwnerReferences Set on Secrets

Location: components/open-webui-llm/overlays/phase1-kind/secrets.yaml

Issue: Secrets lack OwnerReferences - won't be cleaned up when deployment is deleted.

From CLAUDE.md (lines 447-451):

5. OwnerReferences for Resource Lifecycle
   - REQUIRED: Set OwnerReferences on all child resources (Jobs, Secrets, PVCs, Services)
   - REQUIRED: Use Controller: boolPtr(true) for primary owner

Impact: Secrets persist after make phase1-clean, requiring manual cleanup.

Recommendation: For Phase 2, use Kustomize secret generators OR add OwnerReferences to track deployment lifecycle.

7. Namespace Isolation Not Enforced

Location: components/open-webui-llm/base/namespace.yaml

Issue: No NetworkPolicies to restrict traffic between openwebui namespace and other namespaces (e.g., ambient-code, vteam-dev).

Impact: Pods in openwebui namespace can communicate with ACP components, violating principle of least privilege.

Recommendation: Add NetworkPolicy in Phase 2:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: openwebui-isolation
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          name: openwebui  # Only allow intra-namespace
  - to:
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns  # Allow DNS
  - to:  # Allow Anthropic API
    - podSelector: {}
    ports:
    - port: 443

8. imagePullPolicy: Always on :main-latest Tags

Location: Both deployments use imagePullPolicy: Always with :main-latest and :main tags.

Issue: Unpredictable behavior - deployments can break if upstream images change.

Impact:

  • Non-reproducible deployments
  • Unexpected breaking changes from upstream
  • Increased pull latency

Recommendation: Pin to specific image digests or semantic version tags for Phase 2:

image: ghcr.io/berriai/litellm@sha256:abcd1234...
# OR
image: ghcr.io/berriai/litellm:v1.23.4

9. Memory Limit 4x Higher Than Request (LiteLLM)

Location: components/open-webui-llm/base/litellm/deployment.yaml:52-55

resources:
  requests:
    memory: 512Mi
  limits:
    memory: 2Gi  # 4x request

Issue: Large gap between request and limit can cause OOM kills or QoS issues.

Impact: Pod may be scheduled on node with insufficient memory, leading to node pressure.

Recommendation: Reduce gap to 2x max:

limits:
  memory: 1Gi  # 2x request

🔵 Minor Issues

10. Documentation Says "(coming soon)" for PHASE2.md but File Exists

Location: components/open-webui-llm/README.md:188

See `docs/PHASE2.md` for migration plan (coming soon).

Issue: PHASE2.md exists (428 lines) but README says "coming soon".

Recommendation: Update to: See docs/PHASE2.md for migration plan.

11. Typo in Makefile Comment

Location: components/open-webui-llm/Makefile:19

#  1. Kind cluster running (run: make -C ../../e2e setup-kind)

Issue: Target setup-kind doesn't exist in e2e/Makefile. Correct command is in README: ./scripts/setup-kind.sh.

Recommendation: Update to:

#  1. Kind cluster running (run: cd ../../e2e && ./scripts/setup-kind.sh)

12. .gitignore Includes .vscode but CLAUDE.md Doesn't Recommend IDE Files

Location: components/open-webui-llm/.gitignore:12

Issue: Includes .vscode/ and .idea/, but CLAUDE.md doesn't specify IDE file handling standards.

Recommendation: Acceptable as-is, but consider adding to root .gitignore if repo-wide standard.

13. No Tests for Deployment

Issue: No automated tests to verify deployment works (no Cypress tests, no integration tests).

Recommendation: Add simple smoke test in e2e/ directory for Phase 2:

# Test LiteLLM health
kubectl exec -n openwebui deploy/litellm -- curl -sf http://localhost:4000/health

# Test OpenWebUI health  
kubectl exec -n openwebui deploy/openwebui -- curl -sf http://localhost:8080/health

Positive Highlights

✅ Excellent Documentation

  • Comprehensive README with quick start, troubleshooting, and cleanup
  • Detailed PHASE1.md guide (359 lines) with step-by-step instructions
  • Well-planned PHASE2.md (428 lines) addressing production concerns
  • Clear distinction between dev (Phase 1) and production (Phase 2)

✅ Good Kustomize Structure

  • Clean separation of base and overlays
  • Phase-based approach allows gradual migration
  • Follows Kubernetes/Kustomize best practices

✅ Developer-Friendly Makefile

  • Clear targets with help text
  • Logical grouping (deploy, status, logs, test, clean)
  • Works with both Docker and Podman

✅ Proper Use of ConfigMaps

  • LiteLLM config externalized in ConfigMap
  • Easy to add new models without rebuilding images

✅ Resource Requests Set

  • Both deployments have CPU/memory requests
  • Enables proper scheduling

✅ Health Probes Configured

  • Liveness and readiness probes on both deployments
  • Appropriate initial delays and intervals

✅ Service Accounts Created

  • Dedicated ServiceAccounts for litellm and openwebui
  • Follows RBAC best practices (even though minimal permissions for Phase 1)

✅ Clear Phase 1 vs Phase 2 Separation

  • Explicitly documents dev-only nature of Phase 1
  • Acknowledges security limitations upfront
  • Provides migration path to production

Recommendations

Before Merging

  1. Add SecurityContext to deployments (Critical Outcome: Reduce Refinement Time with agent System #1)
  2. Fix LiteLLM health probe paths (Critical Epic: AI Agent Development #4) - verify with LiteLLM docs
  3. Add prominent warning in README about auth being disabled:
    ## ⚠️ SECURITY WARNING
    
    **Phase 1 is for LOCAL DEVELOPMENT ONLY.**
    
    - No authentication (WEBUI_AUTH=false)
    - Hardcoded development keys
    - No network policies
    - Do NOT deploy to shared/production clusters
    
    For production deployment, use Phase 2 (see docs/PHASE2.md).

For Phase 2

  1. Implement all OAuth/security features outlined in PHASE2.md
  2. Add NetworkPolicies for namespace isolation
  3. Pin image versions to specific tags/digests
  4. Set OwnerReferences on secrets
  5. Add automated integration tests

Documentation Improvements

  1. Fix README "coming soon" reference to PHASE2.md
  2. Update Makefile help text with correct Kind setup command
  3. Align resource limits in code with commit message (or vice versa)

Architecture Review

Alignment with ACP

Concern: This component is isolated from the main Ambient Code Platform:

  • Separate namespace (openwebui vs ambient-code)
  • No integration with ACP's AgenticSession CRs
  • Different authentication model (none vs user tokens)

Question for Discussion:

  1. Is this intended to eventually integrate with ACP's backend, or remain standalone?
  2. Should Open WebUI create AgenticSessions via ACP API instead of direct Anthropic calls?
  3. Should this use ACP's existing OAuth infrastructure instead of separate OAuth2-proxy?

From PR description: "prototype UX for using openwebui to interact with the new amber codebase agent added in #337"

  • If Amber integration is a goal, consider Phase 2 Claude Service should use ACP's backend API, not direct Anthropic calls
  • This would enable RBAC, audit logging, and session tracking

Colocating Documentation

From CLAUDE.md (lines 1070-1076):

- **Prefer inline updates**: Improve existing markdown files or code comments
- **Colocate new docs**: When feasible, documentation should live in the subdirectory 
  that has the relevant code (e.g., components/backend/README.md) not at the top level
- **Avoid top-level proliferation**: Only create top-level docs for cross-cutting concerns

Current structure: Excellent - README + docs/ within components/open-webui-llm/ follows this guidance.


Testing Coverage

What's Missing:

  • No automated deployment tests
  • No validation that ingress routes work
  • No tests for LiteLLM → Anthropic API connectivity

Recommendation: Add to e2e/ directory in Phase 2 (similar to existing vTeam e2e tests).


Final Assessment

Code Quality: 7/10

  • Well-structured, clear documentation
  • Missing SecurityContext and some K8s best practices
  • Good separation of concerns

Security: 3/10 (Phase 1) → Expected 8/10 (Phase 2)

  • Phase 1 explicitly dev-only, acceptable for local testing
  • Phase 2 plan addresses all major concerns
  • Must not deploy Phase 1 to shared environments

Architecture: 8/10

  • Clean design, good use of Kustomize
  • Unclear integration story with main ACP
  • Phase-based approach is pragmatic

Documentation: 9/10

  • Comprehensive and well-organized
  • Minor inconsistencies between docs and code

Recommendation: ✅ Approve with required changes

Merge after addressing:

  1. Add SecurityContext to deployments
  2. Fix LiteLLM health probe paths
  3. Add security warning to README

This is good work for a Phase 1 prototype. The clear documentation of limitations and Phase 2 migration plan demonstrates good engineering judgment.


🔍 View AI decision process (logs available for 90 days)

📋 View memory system files loaded (click to expand)

What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

  1. CLAUDE.md - Master project instructions, development standards
  2. backend-development.md - Go backend, K8s integration patterns
  3. frontend-development.md - NextJS, Shadcn UI, React Query patterns
  4. security-standards.md - Auth, RBAC, token handling
  5. k8s-client-usage.md - User token vs service account patterns
  6. error-handling.md - Consistent error patterns
  7. react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants