Conductor is a platform for managing models, model runners, and model configurations, and for virtualizing combinations of them into virtual model runners exposed to the network through OpenAI, vLLM, Gemini, and Ollama APIs.
- Multi-tenant Architecture: Full tenant isolation with tenant-scoped data access
- Model Runner Endpoints: Define and manage first-class endpoint types for OpenAI, vLLM, Gemini, and Ollama model runners
- Model Definitions: Catalog your models with metadata like family, parameter size, and quantization
- Model Configurations: Create reusable configurations with pinned properties for embeddings and completions
- Virtual Model Runners: Combine endpoints and configurations into virtual endpoints with load balancing
- Configuration Pinning: Automatically inject model parameters into requests (like OllamaFlow)
- Session Affinity: Pin clients to specific backend endpoints based on IP address, API key, or custom headers to minimize context drops and model swapping
- Load Balancing: Round-robin, random, or first-available endpoint selection with weighted distribution and optional session affinity
- Health Checking: Automatic background health monitoring of endpoints with configurable thresholds
- Rate Limiting: Per-endpoint maximum parallel request limits with automatic capacity management
- Request History: Optional per-VMR request/response capture for debugging and auditing with configurable retention
- React Dashboard: Full-featured UI for managing all entities including real-time health status
```bash
cd docker
docker compose up -d
```

The server will be available at http://localhost:9000 and the dashboard at http://localhost:9100.
The Compose file builds the server and dashboard from the local repository Dockerfiles.
- .NET 10 SDK
- Node.js 20+
```bash
cd src/Conductor.Server
dotnet run
```

```bash
cd dashboard
npm install
npm run dev
```

Conductor’s automated tests use Touchstone so the same shared test cases can run through multiple hosts.
- `src/Test.Shared/` contains the authoritative test definitions.
- `src/Test.Xunit/` exposes the shared suite through xUnit.
- `src/Test.Nunit/` exposes the same suite through NUnit.
- `src/Test.Automated/` runs the suite through the Touchstone console runner.
Common commands:
```bash
# Run framework-hosted tests
dotnet test src/Conductor.sln

# Run the console host
dotnet run --project src/Test.Automated/Test.Automated.csproj
```

See TESTING.md for the full testing guide.
Conductor currently supports four model runner provider types in both the backend proxy and the dashboard:
| Provider Type | Runner Type in UI | Proxied API Shape | Notes |
|---|---|---|---|
| OpenAI | OpenAI | OpenAI REST API | Supports OpenAI-style chat, embeddings, and model listing |
| vLLM | vLLM | OpenAI-compatible REST API | First-class runner type in the UI; uses the OpenAI-compatible API surface |
| Gemini | Gemini | Gemini REST API | Supports Gemini-style `models/{model}:generateContent`, streaming, embeddings, and model listing |
| Ollama | Ollama | Ollama REST API | Supports Ollama-style `/api/generate`, `/api/chat`, and embeddings flows |
Conductor supports two authentication methods:
- Header-based: Include `x-tenant-id`, `x-email`, and `x-password` headers
- Bearer Token: Include an `Authorization: Bearer {token}` header
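A minimal Python sketch of attaching either credential style to a request, using only the standard library. It assumes the default `http://localhost:9000` address from the Docker Compose setup; the tenant ID, email, password, and token values are placeholders, and the request is built but not sent.

```python
from urllib.request import Request

# Assumption: default server address from the Docker Compose setup.
BASE_URL = "http://localhost:9000"


def header_auth(tenant_id: str, email: str, password: str) -> dict:
    """Header-based authentication: tenant ID, email, and password headers."""
    return {"x-tenant-id": tenant_id, "x-email": email, "x-password": password}


def bearer_auth(token: str) -> dict:
    """Bearer token authentication via the Authorization header."""
    return {"Authorization": f"Bearer {token}"}


def build_request(path: str, auth_headers: dict) -> Request:
    """Build (but do not send) a GET request against the Conductor REST API."""
    return Request(BASE_URL + path, headers=auth_headers, method="GET")


# Placeholder token; pass the Request to urllib.request.urlopen() to send it.
req = build_request("/v1.0/modelconfigurations", bearer_auth("my-token"))
```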
Users have three permission levels:
| Permission | Description |
|---|---|
| Global Admin (`IsAdmin=true`) | Full cross-tenant access to all resources |
| Tenant Admin (`IsTenantAdmin=true`) | Can manage users and credentials within their own tenant |
| Standard User | Can only access model configurations, endpoints, runners, and virtual runners in their tenant |
- Global Admins can operate on any tenant by specifying `TenantId` in their requests
- Tenant Admins have elevated privileges within their assigned tenant
- Standard Users have read/write access to non-administrative resources
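The permission rules above can be expressed as a small decision function. This is an illustrative Python sketch, not Conductor's authorization code: the resource names are borrowed from the REST API paths, and the assumption that tenants and administrators are managed only by Global Admins is inferred from the table rather than stated there.

```python
from dataclasses import dataclass

# Hypothetical resource names, taken from the REST API paths.
GLOBAL_ADMIN_ONLY = {"tenants", "administrators"}
TENANT_ADMIN_RESOURCES = {"users", "credentials"}


@dataclass
class User:
    tenant_id: str
    is_admin: bool = False         # Global Admin (IsAdmin=true)
    is_tenant_admin: bool = False  # Tenant Admin (IsTenantAdmin=true)


def can_access(user: User, resource: str, tenant_id: str) -> bool:
    """Return True if the user may operate on `resource` within `tenant_id`."""
    if user.is_admin:
        return True  # Global Admins have cross-tenant access to everything
    if resource in GLOBAL_ADMIN_ONLY:
        return False
    if tenant_id != user.tenant_id:
        return False  # everyone else is confined to their own tenant
    if resource in TENANT_ADMIN_RESOURCES:
        return user.is_tenant_admin
    return True  # configurations, endpoints, and runners in the user's own tenant
```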
| Entity | Prefix | API Endpoint |
|---|---|---|
| Administrator | `admin_` | `/v1.0/administrators` |
| Tenant | `ten_` | `/v1.0/tenants` |
| User | `usr_` | `/v1.0/users` |
| Credential | `cred_` | `/v1.0/credentials` |
| Model Runner Endpoint | `mre_` | `/v1.0/modelrunnerendpoints` |
| Model Definition | `md_` | `/v1.0/modeldefinitions` |
| Model Configuration | `mc_` | `/v1.0/modelconfigurations` |
| Virtual Model Runner | `vmr_` | `/v1.0/virtualmodelrunners` |
| Request History | `req_` | `/v1.0/requesthistory` |
| Request History Summary | - | `/v1.0/requesthistory/summary` |
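Because each prefix is unique, an ID alone identifies its entity type. The Python sketch below maps an ID to its entity and API endpoint using the table above; the `{endpoint}/{id}` item URL form is an assumption (it matches the `/v1.0/virtualmodelrunners/{id}/health` route shown later), and the example IDs are made up.

```python
# Prefix -> (entity, collection endpoint), taken from the table above.
ENTITY_PREFIXES = {
    "admin_": ("Administrator", "/v1.0/administrators"),
    "ten_": ("Tenant", "/v1.0/tenants"),
    "usr_": ("User", "/v1.0/users"),
    "cred_": ("Credential", "/v1.0/credentials"),
    "mre_": ("Model Runner Endpoint", "/v1.0/modelrunnerendpoints"),
    "md_": ("Model Definition", "/v1.0/modeldefinitions"),
    "mc_": ("Model Configuration", "/v1.0/modelconfigurations"),
    "vmr_": ("Virtual Model Runner", "/v1.0/virtualmodelrunners"),
    "req_": ("Request History", "/v1.0/requesthistory"),
}


def resolve_id(entity_id: str) -> tuple:
    """Map an ID to (entity name, assumed item URL of the form {endpoint}/{id})."""
    for prefix, (entity, endpoint) in ENTITY_PREFIXES.items():
        if entity_id.startswith(prefix):
            return entity, f"{endpoint}/{entity_id}"
    raise ValueError(f"unrecognized ID prefix: {entity_id!r}")
```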
Virtual model runners expose an API at their configured base path. For example, a VMR with base path /v1.0/api/my-vmr/ would expose:
- OpenAI API: `/v1.0/api/my-vmr/v1/chat/completions`, `/v1.0/api/my-vmr/v1/embeddings`
- vLLM API: `/v1.0/api/my-vmr/v1/chat/completions`, `/v1.0/api/my-vmr/v1/embeddings`
- Gemini API: `/v1.0/api/my-vmr/v1beta/models/gemini-2.5-flash:generateContent`, `/v1.0/api/my-vmr/v1beta/models/text-embedding-004:embedContent`
- Ollama API: `/v1.0/api/my-vmr/api/generate`, `/v1.0/api/my-vmr/api/chat`
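These paths follow a simple pattern: the VMR base path plus the provider's native route. A Python sketch of composing them is shown here; the operation labels (`chat`, `generate`, `embed`, `embeddings`) are hypothetical names chosen for illustration, and only the routes listed above are covered.

```python
# Route templates relative to a VMR base path, per the list above.
VMR_ROUTES = {
    "OpenAI": {"chat": "/v1/chat/completions", "embeddings": "/v1/embeddings"},
    "vLLM": {"chat": "/v1/chat/completions", "embeddings": "/v1/embeddings"},
    "Gemini": {
        "generate": "/v1beta/models/{model}:generateContent",
        "embed": "/v1beta/models/{model}:embedContent",
    },
    "Ollama": {"generate": "/api/generate", "chat": "/api/chat"},
}


def vmr_route(base_path: str, api_type: str, operation: str, model: str = "") -> str:
    """Join a VMR base path with the route for an API type and operation."""
    template = VMR_ROUTES[api_type][operation]
    # Templates without {model} ignore the keyword argument.
    return base_path.rstrip("/") + template.format(model=model)
```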
```json
{
  "Webserver": {
    "Hostname": "localhost",
    "Port": 9000,
    "Ssl": false,
    "Cors": {
      "Enabled": false,
      "AllowedOrigins": [],
      "AllowedMethods": ["GET", "POST", "PUT", "DELETE", "OPTIONS"],
      "AllowedHeaders": ["Content-Type", "Authorization"],
      "ExposedHeaders": [],
      "AllowCredentials": false,
      "MaxAgeSeconds": 86400
    }
  },
  "Database": {
    "Type": "Sqlite",
    "Filename": "./conductor.db"
  },
  "Logging": {
    "Servers": [],
    "LogDirectory": "./logs/",
    "LogFilename": "conductor.log",
    "ConsoleLogging": true,
    "MinimumSeverity": 0
  },
  "RequestHistory": {
    "Enabled": true,
    "Directory": "./request-history/",
    "RetentionDays": 7,
    "CleanupIntervalMinutes": 60,
    "MaxRequestBodyBytes": 65536,
    "MaxResponseBodyBytes": 65536
  }
}
```

Supported database types:
- SQLite (default): `"Type": "Sqlite", "Filename": "./conductor.db"`
- PostgreSQL: `"Type": "PostgreSql", "ConnectionString": "Host=..."`
- SQL Server: `"Type": "SqlServer", "ConnectionString": "Server=..."`
- MySQL: `"Type": "MySql", "ConnectionString": "Server=..."`
Cross-Origin Resource Sharing (CORS) can be enabled to allow browser-based applications to access the Conductor API.
| Property | Type | Default | Description |
|---|---|---|---|
| `Enabled` | bool | `false` | Enable or disable CORS support |
| `AllowedOrigins` | string[] | `[]` | List of allowed origins. Use `["*"]` for all origins |
| `AllowedMethods` | string[] | `["GET", "POST", "PUT", "DELETE", "OPTIONS"]` | Allowed HTTP methods |
| `AllowedHeaders` | string[] | `["Content-Type", "Authorization", ...]` | Allowed request headers |
| `ExposedHeaders` | string[] | `[]` | Headers exposed to the browser |
| `AllowCredentials` | bool | `false` | Allow credentials (cookies, auth headers). Cannot be used with `AllowedOrigins: ["*"]` |
| `MaxAgeSeconds` | int | `86400` | Preflight cache duration (0-86400 seconds) |
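To make the table's constraints concrete, here is a Python sketch that validates a `Cors` section and derives the standard CORS response headers a preflight might carry for a given origin. The header names are the standard CORS headers, but how Conductor actually emits them is an assumption, and the `AllowedHeaders` default uses only the two values shown in the sample configuration.

```python
# Defaults follow the table; AllowedHeaders uses the two values in the sample config.
DEFAULT_METHODS = ["GET", "POST", "PUT", "DELETE", "OPTIONS"]
DEFAULT_HEADERS = ["Content-Type", "Authorization"]


def validate_cors(cors: dict) -> list:
    """Check the constraints stated in the CORS table; returns error messages."""
    errors = []
    if cors.get("AllowCredentials", False) and "*" in cors.get("AllowedOrigins", []):
        errors.append('AllowCredentials cannot be combined with AllowedOrigins ["*"]')
    if not 0 <= cors.get("MaxAgeSeconds", 86400) <= 86400:
        errors.append("MaxAgeSeconds must be between 0 and 86400")
    return errors


def preflight_headers(cors: dict, origin: str) -> dict:
    """Headers a preflight response might carry for `origin` (empty if not allowed)."""
    if not cors.get("Enabled", False):
        return {}
    origins = cors.get("AllowedOrigins", [])
    if "*" in origins:
        allow_origin = "*"
    elif origin in origins:
        allow_origin = origin  # echo the specific origin back
    else:
        return {}
    headers = {
        "Access-Control-Allow-Origin": allow_origin,
        "Access-Control-Allow-Methods": ", ".join(cors.get("AllowedMethods", DEFAULT_METHODS)),
        "Access-Control-Allow-Headers": ", ".join(cors.get("AllowedHeaders", DEFAULT_HEADERS)),
        "Access-Control-Max-Age": str(cors.get("MaxAgeSeconds", 86400)),
    }
    # Browsers reject credentials paired with a wildcard origin.
    if cors.get("AllowCredentials", False) and allow_origin != "*":
        headers["Access-Control-Allow-Credentials"] = "true"
    return headers
```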
Example: Allow all origins (development)
```json
{
  "Webserver": {
    "Cors": {
      "Enabled": true,
      "AllowedOrigins": ["*"]
    }
  }
}
```

Example: Restrict to specific origins (production)

```json
{
  "Webserver": {
    "Cors": {
      "Enabled": true,
      "AllowedOrigins": ["https://app.example.com", "https://admin.example.com"],
      "AllowCredentials": true
    }
  }
}
```

Request history captures request/response data for Virtual Model Runners with RequestHistoryEnabled set to true. This is useful for debugging, auditing, troubleshooting, and latency analysis. Each completed entry records total response time and time to first token/byte (FirstTokenTimeMs). For non-streaming responses, FirstTokenTimeMs is set to the same value as ResponseTimeMs.
| Property | Type | Default | Description |
|---|---|---|---|
| `Enabled` | bool | `true` | Enable or disable request history globally |
| `Directory` | string | `"./request-history/"` | Directory for storing request detail JSON files |
| `RetentionDays` | int | `30` | Number of days to retain entries before cleanup (1-365) |
| `CleanupIntervalMinutes` | int | `60` | Interval between cleanup runs in minutes (1-1440) |
| `MaxRequestBodyBytes` | int | `65536` | Maximum request body bytes to capture (1-10485760) |
| `MaxResponseBodyBytes` | int | `65536` | Maximum response body bytes to capture (1-10485760) |
Note: Request history must be enabled both globally (in conductor.json) and per-VMR (via the RequestHistoryEnabled property).
Captured request history entries include the VMR, routed model runner endpoint, matched model definition, matched model configuration, HTTP status, body lengths, transfer type, total response time (ResponseTimeMs), and time to first token/byte (FirstTokenTimeMs).
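The capture rules can be sketched in Python as follows. This is an illustration under stated assumptions, not Conductor's implementation: the range checks come from the settings table, capture requires both the global and per-VMR switches, over-limit bodies are assumed to be truncated to their first `MaxRequestBodyBytes`/`MaxResponseBodyBytes` bytes, and all helper names are hypothetical.

```python
# Documented ranges for the RequestHistory settings.
RANGES = {
    "RetentionDays": (1, 365),
    "CleanupIntervalMinutes": (1, 1440),
    "MaxRequestBodyBytes": (1, 10485760),
    "MaxResponseBodyBytes": (1, 10485760),
}


def validate_history_settings(settings: dict) -> list:
    """Return an error message for every setting outside its documented range."""
    errors = []
    for key, (low, high) in RANGES.items():
        if key in settings and not low <= settings[key] <= high:
            errors.append(f"{key} must be between {low} and {high}")
    return errors


def should_capture(global_enabled: bool, vmr_enabled: bool) -> bool:
    """History is captured only when enabled globally AND on the VMR."""
    return global_enabled and vmr_enabled


def capture_body(body: bytes, max_bytes: int) -> bytes:
    """Assumed truncation: keep only the first max_bytes bytes."""
    return body[:max_bytes]


def timings_ms(start: float, first_byte: float, end: float, streaming: bool) -> tuple:
    """(ResponseTimeMs, FirstTokenTimeMs) from timestamps in seconds."""
    response_ms = round((end - start) * 1000)
    # For non-streaming responses, FirstTokenTimeMs equals ResponseTimeMs.
    first_token_ms = round((first_byte - start) * 1000) if streaming else response_ms
    return response_ms, first_token_ms
```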
The summary endpoint returns aggregated request counts grouped by time buckets, useful for charting request volume and success/failure rates over time.
```
GET /v1.0/requesthistory/summary?startUtc={ISO8601}&endUtc={ISO8601}&interval={minute|15minute|hour|6hour|day}&vmrGuid={guid}
```
| Parameter | Type | Required | Description |
|---|---|---|---|
| `startUtc` | string | No | Start of time range (UTC, ISO 8601). Default: 1 hour ago |
| `endUtc` | string | No | End of time range (UTC, ISO 8601). Default: now |
| `interval` | string | No | Bucket interval: `minute`, `15minute`, `hour`, `6hour`, or `day`. Default: `hour` |
| `vmrGuid` | string | No | Filter by Virtual Model Runner GUID |
Response:
```json
{
  "Data": [
    {
      "TimestampUtc": "2026-03-20T10:00:00Z",
      "SuccessCount": 42,
      "FailureCount": 3,
      "TotalCount": 45
    }
  ],
  "StartUtc": "2026-03-20T10:00:00Z",
  "EndUtc": "2026-03-20T11:00:00Z",
  "Interval": "hour",
  "TotalSuccess": 42,
  "TotalFailure": 3,
  "TotalRequests": 45
}
```

Success is defined as HTTP status 100-399; failure is HTTP status 400-599 or null (incomplete requests).
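A Python sketch of building a summary query and of the bucketing it describes, useful for reasoning about the counts or for reproducing them client-side from raw entries. Two assumptions: buckets are aligned to the Unix epoch in UTC (so `day` buckets start at UTC midnight), and the helper names are hypothetical. The success/failure split follows the definition above.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

INTERVALS = {
    "minute": timedelta(minutes=1),
    "15minute": timedelta(minutes=15),
    "hour": timedelta(hours=1),
    "6hour": timedelta(hours=6),
    "day": timedelta(days=1),
}
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def summary_path(start, end, interval="hour", vmr_guid=""):
    """Build the summary query path; times must be timezone-aware UTC datetimes."""
    params = {"startUtc": start.isoformat(), "endUtc": end.isoformat(), "interval": interval}
    if vmr_guid:
        params["vmrGuid"] = vmr_guid
    return "/v1.0/requesthistory/summary?" + urlencode(params)


def is_success(status):
    """Success is HTTP 100-399; anything else, including None (incomplete), fails."""
    return status is not None and 100 <= status <= 399


def bucket_start(ts, interval):
    """Floor a UTC timestamp to its bucket (assumed aligned to the Unix epoch)."""
    size = INTERVALS[interval]
    return EPOCH + ((ts - EPOCH) // size) * size


def summarize(entries, interval="hour"):
    """Group (timestamp, status) pairs into rows shaped like the Data array."""
    buckets = {}
    for ts, status in entries:
        counts = buckets.setdefault(bucket_start(ts, interval), Counter())
        counts["SuccessCount" if is_success(status) else "FailureCount"] += 1
    return [
        {
            "TimestampUtc": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "SuccessCount": c["SuccessCount"],
            "FailureCount": c["FailureCount"],
            "TotalCount": c["SuccessCount"] + c["FailureCount"],
        }
        for start, c in sorted(buckets.items())
    ]
```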
Model configurations can define pinned properties that are automatically merged into incoming requests:
```json
{
  "Name": "Low Temperature Config",
  "PinnedCompletionsProperties": {
    "temperature": 0.3,
    "top_p": 0.9,
    "max_tokens": 2048
  },
  "PinnedEmbeddingsProperties": {
    "model": "text-embedding-ada-002"
  }
}
```

When a request comes through a virtual model runner, the pinned properties are merged with the request body, allowing you to enforce specific model parameters.
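A Python sketch of the merge, using the example configuration above. It assumes a shallow, top-level merge in which pinned values override whatever the client sent (the document says pinned properties enforce parameters, but does not specify nested handling), and the `kind` argument is a hypothetical way of choosing between the completions and embeddings sets.

```python
import copy

# The example configuration from above, as a Python dict.
CONFIG = {
    "Name": "Low Temperature Config",
    "PinnedCompletionsProperties": {"temperature": 0.3, "top_p": 0.9, "max_tokens": 2048},
    "PinnedEmbeddingsProperties": {"model": "text-embedding-ada-002"},
}


def apply_pinned(body: dict, config: dict, kind: str) -> dict:
    """Return a copy of `body` with the pinned properties for `kind` merged in.

    Assumption: shallow merge where pinned values override client values.
    """
    key = "PinnedCompletionsProperties" if kind == "completions" else "PinnedEmbeddingsProperties"
    merged = copy.deepcopy(body)  # leave the caller's request untouched
    merged.update(config.get(key, {}))
    return merged
```

A deep merge would matter for APIs that nest sampling parameters (Ollama, for example, places them under `options`); the shallow version keeps the sketch short.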
Model Runner Endpoints support comprehensive health checking with the following properties:
| Property | Type | Default | Description |
|---|---|---|---|
| `HealthCheckUrl` | string | `/` | URL path appended to the endpoint base URL for health checks |
| `HealthCheckMethod` | enum | `GET` | HTTP method (`GET` or `HEAD`) |
| `HealthCheckIntervalMs` | int | `5000` | Milliseconds between health checks |
| `HealthCheckTimeoutMs` | int | `5000` | Timeout for health check requests, in milliseconds |
| `HealthCheckExpectedStatusCode` | int | `200` | HTTP status code expected from a healthy endpoint |
| `UnhealthyThreshold` | int | `2` | Consecutive failures before marking unhealthy |
| `HealthyThreshold` | int | `2` | Consecutive successes before marking healthy |
| `HealthCheckUseAuth` | bool | `false` | Include API key (Bearer token) in health check requests |
| `MaxParallelRequests` | int | `4` | Maximum concurrent requests (0 = unlimited) |
| `Weight` | int | `1` | Relative weight for load balancing (1-1000) |
Note for OpenAI and vLLM APIs: When using api.openai.com or another OpenAI-compatible backend that requires authentication for model listing, set HealthCheckUseAuth to true and HealthCheckUrl to /v1/models.
Note for Gemini API: When using generativelanguage.googleapis.com, set HealthCheckUseAuth to true and HealthCheckUrl to /v1beta/models. Gemini uses the x-goog-api-key header rather than bearer token authentication.
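The two notes above can be captured as per-provider defaults. This Python sketch returns the probe URL and authentication headers a health check would use; it assumes the OpenAI/vLLM backend requires authentication for model listing, falls back to the documented defaults (`/`, no auth) for Ollama, and the function names are hypothetical.

```python
def health_check_settings(provider: str) -> dict:
    """Suggested HealthCheckUrl / HealthCheckUseAuth values per provider."""
    if provider in ("OpenAI", "vLLM"):
        return {"HealthCheckUrl": "/v1/models", "HealthCheckUseAuth": True}
    if provider == "Gemini":
        return {"HealthCheckUrl": "/v1beta/models", "HealthCheckUseAuth": True}
    return {"HealthCheckUrl": "/", "HealthCheckUseAuth": False}  # documented defaults


def health_check_request(provider: str, base_url: str, api_key: str) -> tuple:
    """Return (probe URL, auth headers) for a health check."""
    settings = health_check_settings(provider)
    url = base_url.rstrip("/") + settings["HealthCheckUrl"]
    if not settings["HealthCheckUseAuth"]:
        return url, {}
    if provider == "Gemini":
        return url, {"x-goog-api-key": api_key}  # Gemini uses an API key header
    return url, {"Authorization": f"Bearer {api_key}"}
```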
- Endpoints start in an unhealthy state and transition to healthy after meeting the `HealthyThreshold`
- Background tasks continuously monitor each active endpoint at the configured interval
- The proxy automatically excludes unhealthy endpoints from request routing
- When all endpoints are unhealthy, requests return `502 Bad Gateway`
- When all endpoints are at capacity, requests return `429 Too Many Requests`
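The threshold behavior reads naturally as a small state machine. A Python sketch, assuming that a check passes only when the response status equals `HealthCheckExpectedStatusCode` and that a timeout or connection error (modeled as `None`) counts as a failure; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EndpointHealth:
    healthy_threshold: int = 2    # HealthyThreshold
    unhealthy_threshold: int = 2  # UnhealthyThreshold
    healthy: bool = False         # endpoints start unhealthy
    successes: int = 0            # consecutive successful checks
    failures: int = 0             # consecutive failed checks

    def record(self, status_code, expected: int = 200) -> None:
        """Record one health check; None means the check errored or timed out."""
        if status_code == expected:
            self.successes += 1
            self.failures = 0
            if self.successes >= self.healthy_threshold:
                self.healthy = True
        else:
            self.failures += 1
            self.successes = 0
            if self.failures >= self.unhealthy_threshold:
                self.healthy = False
```

With both thresholds at their default of 2, a single failed probe does not remove an endpoint from routing, and a single good probe does not return it.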
- Each endpoint tracks in-flight requests in real time
- The `MaxParallelRequests` property enforces a per-endpoint concurrency limit
- Set it to `0` for unlimited concurrent requests
- Requests are counted from start until the response completes (including streaming)

- The `Weight` property influences endpoint selection in round-robin and random modes
- A higher weight directs more traffic to that endpoint
- Example: Endpoint A (weight=3) receives 3x more traffic than Endpoint B (weight=1)
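A Python sketch of capacity-aware weighted selection. It uses a swapped-in technique, smooth weighted round-robin, which is not necessarily the algorithm Conductor uses; the 502/429 outcomes follow the rules listed earlier, and the `acquire`/`release` names are hypothetical. A slot is reserved on selection and must be released when the response, including any stream, completes.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    weight: int = 1        # Weight (1-1000)
    max_parallel: int = 4  # MaxParallelRequests; 0 = unlimited
    healthy: bool = True
    in_flight: int = 0
    current: int = 0       # running score used by smooth weighted round-robin

    def has_capacity(self) -> bool:
        return self.max_parallel == 0 or self.in_flight < self.max_parallel


def acquire(endpoints):
    """Pick an endpoint and reserve a slot; returns (endpoint or None, HTTP status)."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None, 502  # no healthy endpoints
    available = [e for e in healthy if e.has_capacity()]
    if not available:
        return None, 429  # every healthy endpoint is at capacity
    total = sum(e.weight for e in available)
    for e in available:
        e.current += e.weight
    chosen = max(available, key=lambda e: e.current)
    chosen.current -= total
    chosen.in_flight += 1
    return chosen, 200


def release(endpoint: Endpoint) -> None:
    """Call when the response (including any stream) has completed."""
    endpoint.in_flight -= 1
```

Smooth weighted round-robin interleaves picks instead of sending an endpoint's whole share in a burst, while still honoring the 3:1 ratio from the example above over each cycle.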
Monitor endpoint health via the REST API:
```
# Health of all endpoints in tenant
GET /v1.0/modelrunnerendpoints/health

# Health of endpoints for a specific VMR
GET /v1.0/virtualmodelrunners/{id}/health
```

Response includes:
- Current health state (healthy/unhealthy)
- In-flight request count
- Total uptime/downtime
- Uptime percentage
- Last check timestamp
- Last error message (if any)
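One plausible way the uptime and downtime totals and the uptime percentage relate, sketched in Python. The assumption is that percentage = uptime / (uptime + downtime) × 100 over all observed time; the tracker class is hypothetical and does not reflect the actual response field names.

```python
from dataclasses import dataclass


@dataclass
class UptimeTracker:
    healthy: bool
    since: float           # timestamp (seconds) of the last state change or snapshot
    uptime: float = 0.0
    downtime: float = 0.0

    def _accumulate(self, now: float) -> None:
        """Credit the time since the last update to the current state."""
        elapsed = now - self.since
        if self.healthy:
            self.uptime += elapsed
        else:
            self.downtime += elapsed
        self.since = now

    def set_state(self, healthy: bool, now: float) -> None:
        """Record a health transition at time `now`."""
        self._accumulate(now)
        self.healthy = healthy

    def uptime_percentage(self, now: float) -> float:
        """Uptime as a percentage of all observed time up to `now`."""
        self._accumulate(now)
        total = self.uptime + self.downtime
        return 0.0 if total == 0 else 100.0 * self.uptime / total
```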
The included Docker Compose setup uses local build contexts:
- Server: `src/Conductor.Server/Dockerfile`
- Dashboard: `dashboard/Dockerfile`

```bash
# Build server
./build-server.sh  # or build-server.bat on Windows

# Build dashboard
./build-dashboard.sh  # or build-dashboard.bat on Windows
```

MIT License - see LICENSE.md for details.