server: OpenAI API and llama.cpp Web UI compatibility by kirav5 · Pull Request #121 · tilesprivacy/tiles

kirav5 · 2026-04-05T20:20:12Z

Pull request: OpenAI-compatible HTTP API + llama.cpp Web UI hooks

Summary

Extends the Tiles Python inference server so OpenAI-style clients and the ggml-org llama.cpp SvelteKit Web UI can talk to Tiles (MLX backend) over HTTP, while keeping the existing Tiles CLI flow (POST /start, memory-mode chat) intact.

Motivation

Web UI: Developers want to use the stock llama.cpp Web UI against Tiles without running llama-server; the UI expects GET /props, enriched /v1/models rows, optional POST /models/load, OpenAI chat SSE, and timings on the final stream chunk.
CLI parity: Session survives server restarts where appropriate; streaming responses still expose metrics for the Rust CLI’s memory-mode benchmark line.
Operations: Health endpoints and predictable error bodies for /v1/* improve scripting and UI error dialogs.

What changed

API (`server/api.py`, `server/main.py`)

Lifespan: On startup, restore last model from ~/.tiles/server_session.json if present; optional TILES_BOOTSTRAP_MODEL + TILES_BOOTSTRAP_MODEL_CACHE_PATH (and optional memory/system prompt env) to load without a prior /start.
POST /start: After successful load, persist session; shared _apply_loaded_session used by /start, restore, bootstrap, and /models/load.
POST /models/load: For the Web UI’s {"model": "<id>"} body; MLX directory from TILES_MODEL_CACHE_PATH or TILES_BOOTSTRAP_MODEL_CACHE_PATH.
GET /props, GET /v1/models, GET /health / GET /v1/health, router stubs POST /models/load (implemented) / POST /models/unload (stub).
POST /v1/chat/completions: Resolved model, optional input memory path, OpenAI-shaped 422/400/500 for /v1/*, CORS for browser dev.
Middleware: Request body is not read in logging middleware (avoids consuming stream and logging prompts).

Compatibility helpers

server/llamacpp_webui_compat.py: /props-shaped JSON and /v1/models entries with path, status, in_cache expected by the Web UI.

Session persistence

server/session_persist.py: JSON file under ~/.tiles/server_session.json (override with TILES_SESSION_FILE); disabled in tests via TILES_SKIP_SESSION_PERSIST=1. Atomic write (temp + replace).

Streaming / metrics (`server/backend/mlx.py`)

Final SSE chunk and non-streaming completion include:
- timings (predicted_n, predicted_ms) for llama Web UI token/s UI.
- metrics (ttft_ms, total_tokens, tokens_per_second, total_latency_s) for Tiles CLI memory mode.

Schemas (`server/schemas.py`)

ChatCompletionRequest / ChatMessage adjustments for Web UI (optional model, multimodal content list, extra="ignore", max_tokens: -1 → unlimited).
RouterModelsLoadBody for /models/load.

Tests

server/tests/test_openai_compat.py, server/tests/conftest.py: Cover health, props, models, chat stream, errors, /models/load with env path.
just test-server: uv run --project server --with pytest pytest server/tests/ so pytest is not added to uv.lock (keeps lockfile identical to main).

Dev script

scripts/phase2_llamacpp_webui.sh + just webui-llamacpp: Clone/patch llama.cpp Web UI Vite proxy to TILES_BACKEND (default http://127.0.0.1:6969), npm run dev.

How to test

cd /path/to/tiles
just test-server
just serve   # terminal 1
just webui-llamacpp   # terminal 2 — load model per Tiles docs (POST /start, session, bootstrap, or TILES_MODEL_CACHE_PATH + UI load)

Configuration reference

Variable	Purpose
`TILES_SESSION_FILE`	Override path for session JSON
`TILES_SKIP_SESSION_PERSIST=1`	Disable read/write (tests)
`TILES_BOOTSTRAP_MODEL` / `TILES_BOOTSTRAP_MODEL_CACHE_PATH`	Auto-load on startup if no session
`TILES_BOOTSTRAP_MEMORY_PATH` / `TILES_BOOTSTRAP_SYSTEM_PROMPT`	Optional bootstrap extras
`TILES_MODEL_CACHE_PATH`	MLX directory for Web UI `POST /models/load`

- OpenAI-style /v1/models, /v1/chat/completions (SSE), error bodies, CORS - GET /props and enriched model rows for ggml Web UI - Session file persistence and optional TILES_BOOTSTRAP_* startup load - POST /models/load when TILES_MODEL_CACHE_PATH is set (Web UI load) - Chat SSE: timings (Web UI) and metrics (Tiles CLI memory mode) - Pytest coverage; TILES_SKIP_SESSION_PERSIST in tests - just webui-llamacpp via scripts/phase2_llamacpp_webui.sh Made-with: Cursor

coderabbitai · 2026-04-05T20:20:25Z

Warning

Rate limit exceeded

@kirav5 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 16 minutes and 53 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 16 minutes and 53 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 39e45110-734d-49ac-b7a0-b7f1310e7f0b

📥 Commits

Reviewing files that changed from the base of the PR and between 48e11be and 7920c33.

📒 Files selected for processing (10)

.gitmodules
HACKING.md
scripts/phase2_llamacpp_webui.sh
server/api.py
server/llamacpp_webui_compat.py
server/schemas.py
server/tests/conftest.py
server/tests/test_llamacpp_webui_compat.py
server/tests/test_openai_compat.py
third_party/llama.cpp

📝 Walkthrough

Walkthrough

The PR introduces OpenAI-compatible API endpoints and llama.cpp Web UI integration to the Tiles server. Key additions include session persistence (saving/loading model state across restarts), new endpoints for health checks (/health, /v1/health), model management (/models/load, /models/unload), and model listing (/v1/models), alongside a web UI launch script. The server now supports both memory-agent mode (with legacy input field handling) and OpenAI-compatible chat completions with proper Server-Sent Events streaming. Backend enhancements add metrics formatting and non-streaming completion support. Schema changes relax model requirements and support multimodal content arrays. Tests validate the new OpenAI-compatible surface.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API as Server API
    participant SessionStore as Session Persist
    participant Backend
    participant Cache as Model Cache

    Client->>API: POST /v1/chat/completions<br/>(with/without model, messages, input)
    
    alt No model in request
        API->>API: Use _loaded_model_id
        alt No loaded model
            API-->>Client: HTTP 400<br/>(require /start or model param)
        end
    end
    
    alt input field present (memory-agent mode)
        API->>Backend: generate_chat_stream<br/>(messages with tool result appended)
    else input field absent (OpenAI-compat mode)
        API->>API: _openai_compat_effective_messages<br/>(flatten content arrays,<br/>optionally prepend persisted<br/>system prompt)
        alt stream=true
            API->>Backend: generate_chat_stream<br/>(effective messages)
            Backend-->>API: SSE stream chunks
        else stream=false
            API->>Backend: complete_openai_chat_completion<br/>(effective messages)
            Backend-->>API: chat.completion object
        end
    end
    
    Backend-->>API: completion response<br/>(with metrics/timings)
    API->>SessionStore: save(model, cache_path,<br/>memory_path, system_prompt)
    API-->>Client: SSE stream or JSON response

sequenceDiagram
    participant Startup as Server Startup
    participant SessionStore as Session Persist
    participant API as API Lifespan
    participant EnvVars as Environment
    participant Backend

    Startup->>API: FastAPI(lifespan=_lifespan)
    activate API
    API->>SessionStore: load()
    
    alt Session file exists & valid
        SessionStore-->>API: SessionData<br/>(model, cache_path,<br/>memory_path, system_prompt)
        API->>API: _apply_loaded_session<br/>(restore from persisted data)
    else Session file missing/invalid/<br/>persistence disabled
        API->>EnvVars: read TILES_BOOTSTRAP_*<br/>environment variables
        API->>Backend: get_or_load_model<br/>(from TILES_MODEL_CACHE_PATH)
        Backend-->>API: loaded model
        API->>API: _apply_loaded_session<br/>(bootstrap mode)
    end
    
    API->>API: Record global state<br/>(_loaded_model_id,<br/>_loaded_model_cache_path,<br/>_model_loaded_at)
    deactivate API

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Added token metrics with the model response #80: Directly related as both PRs modify server/backend/mlx.py to emit and format GenerationMetrics in streaming responses and final chunks.
fix: updated mlx related pkgs + end condition for stopping model request loop #111: Related through overlapping changes to server/backend/mlx.py and server/api.py model-load/serve logic and runtime backend integration.
Bundle models for the offline installer #101: Related through code-level changes to model cache/path handling and the backend's get_or_load_model signature and usage in model loading flows.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 40.38% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'server: OpenAI API and llama.cpp Web UI compatibility' clearly and concisely summarizes the main objective of the PR—adding OpenAI API compatibility and llama.cpp Web UI support to the server.
Description check	✅ Passed	The description thoroughly explains the motivations, changes, and testing approach, directly relating to the changeset across multiple files including API endpoints, session persistence, compatibility helpers, schemas, tests, and dev tooling.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 11

🧹 Nitpick comments (1)

server/session_persist.py (1)

38-40: Use a unique temp file name to avoid cross-writer collisions.

Using a fixed *.tmp path can race if two saves happen concurrently (or multiple processes share the same session file). A unique temp file per write makes this robust.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@server/session_persist.py` around lines 38 - 40, The current atomic write
uses a fixed tmp path (tmp = p.parent / f"{p.name}.tmp") which can collide
across concurrent writers; change it to create a unique temp filename in the
same directory (e.g. use tempfile.NamedTemporaryFile(delete=False, dir=p.parent,
prefix=p.name + ".", suffix=".tmp") or generate a UUID via uuid.uuid4().hex and
build tmp = p.parent / f"{p.name}.{uuid}.tmp"), write the text to that unique
tmp, close it, then call os.replace(tmp, p) as before; ensure you add the
appropriate import (tempfile or uuid) and clean up the temp file on error.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/phase2_llamacpp_webui.sh`:
- Around line 28-32: After running the in-place perl replacement (perl -pi -e)
to swap http://localhost:8080 for ${BACKEND} in the file referenced by VITE_CFG,
immediately verify the substitution succeeded by searching the updated file for
the expected BACKEND string (e.g., grep -qF "$BACKEND" "$VITE_CFG"); if the
check fails, restore the backup ("${VITE_CFG}.tiles.bak"), print a clear error
mentioning BACKEND and VITE_CFG, and exit non‑zero so the script fails fast
instead of silently continuing.

In `@server/api.py`:
- Around line 162-168: The CORS middleware is currently wide open via
app.add_middleware(CORSMiddleware, allow_origins=["*"], ...) — change it to be
opt-in by reading an environment variable or dev flag (e.g., CORS_ALLOWLIST or
ENABLE_DEV_CORS) and use that to populate allow_origins instead of "*" (parse a
comma-separated allowlist or treat a dev boolean as enabling "*"); keep a safe
default of an empty list or no CORS when the env/flag is unset, and ensure
app.add_middleware(...) uses the computed allow_origins value (and document the
env var/flag).
- Around line 195-212: In the models_load async route, avoid calling the
blocking _apply_loaded_session(mid, cache, MEMORY_PATH, sysp) directly; instead
call it via asyncio.to_thread and await the result (e.g., await
asyncio.to_thread(_apply_loaded_session, mid, cache, MEMORY_PATH, sysp)) so the
synchronous runtime.backend.get_or_load_model call runs off the event loop and
doesn't block other requests; update the models_load handler to import/ use
asyncio accordingly and preserve existing error handling/return values after
awaiting the threaded call.
- Around line 78-80: The early return in _restore_session_on_startup caused by
session_persist.skip_persist() prevents the TILES_BOOTSTRAP_* env handling from
running; change the function so bootstrap env processing runs regardless of
skip_persist (either move the TILES_BOOTSTRAP_* branch before the return or
check for those env vars and execute bootstrap logic even when
session_persist.skip_persist() is true), referencing _restore_session_on_startup
and session_persist.skip_persist and ensuring any bootstrap code that reads
TILES_BOOTSTRAP_* is executed unconditionally.
- Around line 260-273: The session persistence write (session_persist.save) is
currently causing a 500 when it fails even though _apply_loaded_session has
already made the model live; wrap the call to session_persist.save in a
try/except/try-catch so persistence errors are handled as best-effort (log the
exception and continue) rather than re-raising, and apply the same pattern for
the analogous save in the /models/load path; ensure you use the module logger
(or session_persist logger) to record the error and do not change the successful
response path when save fails.

In `@server/backend/mlx.py`:
- Around line 233-244: The non-streaming completion can return stop text and
misses sequences spanning token boundaries; after joining parts into text
(variable parts -> text) and before constructing out (out: dict[...] = {...}),
normalize request.stop into stop_sequences (list), search for the earliest
occurrence of any stop sequence in text, truncate text up to that index (text =
text[:idx]) if found, and use this trimmed text when building the assistant
content in out; also remove the per-token containment check that attempted to
stop on individual tokens and ensure the same stop trimming logic is applied to
any returned fields derived from text.

In `@server/llamacpp_webui_compat.py`:
- Around line 76-87: The code in _read_n_ctx assumes cfg is a mapping and calls
cfg.get(...) which raises AttributeError for valid JSON that is not an object;
guard against that by verifying cfg is a mapping (e.g., isinstance(cfg, dict) or
collections.abc.Mapping) before iterating the keys and calling .get on it, and
if it's not a mapping skip to the existing fallback/error handling so malformed
but syntactically-valid JSON cannot crash the function.

In `@server/schemas.py`:
- Around line 84-90: normalize_max_tokens currently only maps -1 to None and
lets other negative values pass through; update the validator (the
`@field_validator`("max_tokens", mode="before") method normalize_max_tokens) to
explicitly reject any negative value other than -1 by raising a ValueError
(e.g., "max_tokens must be -1 or a non-negative integer") so Pydantic will
return a 422 instead of forwarding malformed negatives to the backend.

In `@server/session_persist.py`:
- Around line 39-40: The temporary file created for session persistence is
written with default permissions (tmp.write_text) and then atomically replaced
(os.replace(p)), risking group/world read access; after writing the content to
the temp Path (the variable tmp used with tmp.write_text) but before calling
os.replace(tmp, p), set restrictive owner-only permissions (e.g.,
tmp.chmod(0o600)) so the file is readable/writable only by the owner and then
perform the os.replace; this preserves atomic replacement while ensuring the
persisted session (including system_prompt) is not group/world-readable.

In `@server/tests/conftest.py`:
- Around line 17-27: The reset_api_state fixture currently leaves
api._current_model_path and api._default_max_tokens unchanged, risking
cross-test leaks; update the fixture (the setup/teardown surrounding yield in
reset_api_state) to also set api._current_model_path = None (or empty string per
API convention) and api._default_max_tokens = <default_value> (use the default
from server/api.py) both before yielding and after yield so the global state is
fully cleared between tests.

In `@server/tests/test_openai_compat.py`:
- Around line 8-10: test_v1_models_empty (and other tests) construct
TestClient(app) directly which bypasses FastAPI lifespan hooks; change
instantiation to use a context manager (with TestClient(app) as client:) so the
app's startup/shutdown handlers run and session restore/bootstrap code exercise;
update tests referencing TestClient(app) (e.g., test_v1_models_empty) to the
context-managed form or refactor into a shared fixture that yields a
context-managed client.

---

Nitpick comments:
In `@server/session_persist.py`:
- Around line 38-40: The current atomic write uses a fixed tmp path (tmp =
p.parent / f"{p.name}.tmp") which can collide across concurrent writers; change
it to create a unique temp filename in the same directory (e.g. use
tempfile.NamedTemporaryFile(delete=False, dir=p.parent, prefix=p.name + ".",
suffix=".tmp") or generate a UUID via uuid.uuid4().hex and build tmp = p.parent
/ f"{p.name}.{uuid}.tmp"), write the text to that unique tmp, close it, then
call os.replace(tmp, p) as before; ensure you add the appropriate import
(tempfile or uuid) and clean up the temp file on error.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 3569692a-63c8-4a06-9841-a8ae4ce55464

📥 Commits

Reviewing files that changed from the base of the PR and between 1b8be89 and 48e11be.

⛔ Files ignored due to path filters (1)

server/uv.lock is excluded by !**/*.lock

📒 Files selected for processing (11)

justfile
scripts/phase2_llamacpp_webui.sh
server/api.py
server/backend/mlx.py
server/llamacpp_webui_compat.py
server/main.py
server/pyproject.toml
server/schemas.py
server/session_persist.py
server/tests/conftest.py
server/tests/test_openai_compat.py

coderabbitai · 2026-04-05T20:30:29Z

+if ! grep -qF "$BACKEND" "$VITE_CFG" 2>/dev/null; then
+  echo "Patching $VITE_CFG to proxy API routes to $BACKEND"
+  cp "$VITE_CFG" "${VITE_CFG}.tiles.bak"
+  perl -pi -e "s|http://localhost:8080|${BACKEND}|g" "$VITE_CFG"
+fi


⚠️ Potential issue | 🟡 Minor

Backend reconfiguration can silently fail after the first patch.

If vite.config.ts already contains a non-default backend URL, Line 31 won’t match http://localhost:8080, so switching TILES_BACKEND later may no-op while still printing “Patching…”. Add a post-patch verification (and fail fast) so misconfiguration is visible.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@scripts/phase2_llamacpp_webui.sh` around lines 28 - 32, After running the in-place perl replacement (perl -pi -e) to swap http://localhost:8080 for ${BACKEND} in the file referenced by VITE_CFG, immediately verify the substitution succeeded by searching the updated file for the expected BACKEND string (e.g., grep -qF "$BACKEND" "$VITE_CFG"); if the check fails, restore the backup ("${VITE_CFG}.tiles.bak"), print a clear error mentioning BACKEND and VITE_CFG, and exit non‑zero so the script fails fast instead of silently continuing.

coderabbitai · 2026-04-05T20:30:29Z

+async def _restore_session_on_startup() -> None:
+    if session_persist.skip_persist():
+        return


⚠️ Potential issue | 🟠 Major

Don't let TILES_SKIP_SESSION_PERSIST disable env bootstrap.

This early return skips the TILES_BOOTSTRAP_* branch entirely. In a stateless/test setup that intentionally disables session file I/O, the server will no longer auto-load the bootstrap model even when the bootstrap env vars are present.

🐛 Proposed fix

async def _restore_session_on_startup() -> None: - if session_persist.skip_persist(): - return - - data = session_persist.load() + data = None if session_persist.skip_persist() else session_persist.load()

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@server/api.py` around lines 78 - 80, The early return in _restore_session_on_startup caused by session_persist.skip_persist() prevents the TILES_BOOTSTRAP_* env handling from running; change the function so bootstrap env processing runs regardless of skip_persist (either move the TILES_BOOTSTRAP_* branch before the return or check for those env vars and execute bootstrap logic even when session_persist.skip_persist() is true), referencing _restore_session_on_startup and session_persist.skip_persist and ensuring any bootstrap code that reads TILES_BOOTSTRAP_* is executed unconditionally.

coderabbitai · 2026-04-05T20:30:30Z

+    tmp.write_text(text, encoding="utf-8")
+    os.replace(tmp, p)


⚠️ Potential issue | 🟠 Major

Persisted session file can be created with overly broad read permissions.

Line 39 writes sensitive session content (including system_prompt) using default umask-derived permissions. On common systems this can become group/world-readable. Restrict to owner-only before the replace.

🔒 Proposed fix

tmp = p.parent / f"{p.name}.tmp" tmp.write_text(text, encoding="utf-8") + os.chmod(tmp, 0o600) os.replace(tmp, p)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

tmp.write_text(text, encoding="utf-8")

os.replace(tmp, p)

tmp = p.parent / f"{p.name}.tmp"

tmp.write_text(text, encoding="utf-8")

os.chmod(tmp, 0o600)

os.replace(tmp, p)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@server/session_persist.py` around lines 39 - 40, The temporary file created for session persistence is written with default permissions (tmp.write_text) and then atomically replaced (os.replace(p)), risking group/world read access; after writing the content to the temp Path (the variable tmp used with tmp.write_text) but before calling os.replace(tmp, p), set restrictive owner-only permissions (e.g., tmp.chmod(0o600)) so the file is readable/writable only by the owner and then perform the os.replace; this preserves atomic replacement while ensuring the persisted session (including system_prompt) is not group/world-readable.

coderabbitai · 2026-04-05T20:30:30Z

+def test_v1_models_empty():
+    client = TestClient(app)
+    r = client.get("/v1/models")


🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In FastAPI/Starlette, does TestClient(app)run lifespan startup/shutdown handlers automatically, or iswith TestClient(app) as client: required?

💡 Result:

In FastAPI/Starlette, TestClient(app) alone does not automatically run lifespan startup/shutdown handlers. The with TestClient(app) as client: context manager is required to trigger the lifespan handlers. Instantiating TestClient(app) without the with block skips the lifespan execution, while entering the with block runs startup code and exiting runs shutdown code.

Citations:

1: https://fastapi.tiangolo.com/advanced/testing-events/

2: https://starlette.dev/testclient/

3: https://starlette.dev/lifespan/

4: Mention the difference between with TestClient(app) as client and client = TestClient(app) Kludex/starlette#1733

Use context manager to exercise the lifespan startup/shutdown.

These tests instantiate TestClient(app) directly, which skips the FastAPI(lifespan=...) startup and shutdown handlers that the PR added. Tests should use with TestClient(app) as client: to properly exercise session restore and bootstrap behavior. Consider refactoring to a shared fixture.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@server/tests/test_openai_compat.py` around lines 8 - 10, test_v1_models_empty (and other tests) construct TestClient(app) directly which bypasses FastAPI lifespan hooks; change instantiation to use a context manager (with TestClient(app) as client:) so the app's startup/shutdown handlers run and session restore/bootstrap code exercise; update tests referencing TestClient(app) (e.g., test_v1_models_empty) to the context-managed form or refactor into a shared fixture that yields a context-managed client.

codecov · 2026-04-05T20:34:01Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

- Gate CORS behind TILES_CORS_ORIGINS or TILES_CORS_ALLOW_ALL=1 - Run model load via asyncio.to_thread on /start and /models/load - Best-effort session_persist.save after successful load (OSError logged) - Harden _read_n_ctx for non-dict config.json; reject max_tokens < 0 except -1 - Reset _current_model_path and _default_max_tokens in test fixture - phase2: resolve llama.cpp via LLAMA_CPP_ROOT, third_party, or sibling clone - HACKING: optional submodule + Web UI section; add webui compat tests Made-with: Cursor

…:8080 Made-with: Cursor

coderabbitai Bot reviewed Apr 5, 2026

View reviewed changes

feynon added 3 commits April 6, 2026 02:25

Add llama.cpp as submodule for optional Web UI

f360aec

scripts: clarify llama.cpp path, guard submodule dir, patch 127.0.0.1…

7920c33

…:8080 Made-with: Cursor

Conversation

kirav5 commented Apr 5, 2026

Pull request: OpenAI-compatible HTTP API + llama.cpp Web UI hooks

Summary

Motivation

What changed

API (server/api.py, server/main.py)

Compatibility helpers

Session persistence

Streaming / metrics (server/backend/mlx.py)

Schemas (server/schemas.py)

Tests

Dev script

How to test

Configuration reference

Uh oh!

coderabbitai Bot commented Apr 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented Apr 5, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

API (`server/api.py`, `server/main.py`)

Streaming / metrics (`server/backend/mlx.py`)

Schemas (`server/schemas.py`)

coderabbitai Bot commented Apr 5, 2026 •

edited

Loading