Conductor

Conductor is a platform for managing models, model runners, and model configurations, and for combining them into virtual model runners that are exposed to the network through OpenAI, vLLM, Gemini, and Ollama APIs.

Features

Multi-tenant Architecture: Full tenant isolation with tenant-scoped data access
Model Runner Endpoints: Define and manage first-class endpoint types for OpenAI, vLLM, Gemini, and Ollama model runners
Model Definitions: Catalog your models with metadata like family, parameter size, and quantization
Model Configurations: Create reusable configurations with pinned properties for embeddings and completions
Virtual Model Runners: Combine endpoints and configurations into virtual endpoints with load balancing
Configuration Pinning: Automatically inject model parameters into requests (like OllamaFlow)
Session Affinity: Pin clients to specific backend endpoints based on IP address, API key, or custom headers to minimize context drops and model swapping
Load Balancing: Round-robin, random, or first-available endpoint selection with weighted distribution and optional session affinity
Health Checking: Automatic background health monitoring of endpoints with configurable thresholds
Rate Limiting: Per-endpoint maximum parallel request limits with automatic capacity management
Request History: Optional per-VMR request/response capture for debugging and auditing with configurable retention
React Dashboard: Full-featured UI for managing all entities including real-time health status

Quick Start

Using Docker Compose

cd docker
docker compose up -d

The server will be available at http://localhost:9000 and the dashboard at http://localhost:9100. The Compose file builds the server and dashboard from the local repository Dockerfiles.

Building from Source

Prerequisites

.NET 10 SDK
Node.js 20+

Build and Run Server

cd src/Conductor.Server
dotnet run

Build and Run Dashboard

cd dashboard
npm install
npm run dev

Testing

Conductor’s automated tests use Touchstone so the same shared test cases can run through multiple hosts.

src/Test.Shared/ contains the authoritative test definitions.
src/Test.Xunit/ exposes the shared suite through xUnit.
src/Test.Nunit/ exposes the same suite through NUnit.
src/Test.Automated/ runs the suite through the Touchstone console runner.

Common commands:

# Run framework-hosted tests
dotnet test src/Conductor.sln

# Run the console host
dotnet run --project src/Test.Automated/Test.Automated.csproj

See TESTING.md for the full testing guide.

API Overview

Supported Provider Types

Conductor currently supports four model runner provider types in both the backend proxy and the dashboard:

Provider Type	Runner Type in UI	Proxied API Shape	Notes
OpenAI	`OpenAI`	OpenAI REST API	Supports OpenAI-style chat, embeddings, and model listing
vLLM	`vLLM`	OpenAI-compatible REST API	First-class runner type in the UI; uses the OpenAI-compatible API surface
Gemini	`Gemini`	Gemini REST API	Supports Gemini-style `models/{model}:generateContent`, streaming, embeddings, and model listing
Ollama	`Ollama`	Ollama REST API	Supports Ollama-style `/api/generate`, `/api/chat`, and embeddings flows

Authentication

Conductor supports two authentication methods:

Header-based: Include x-tenant-id, x-email, and x-password headers
Bearer Token: Include Authorization: Bearer {token} header
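
As an illustration of the two methods, the following Python sketch builds the headers for each one and attaches them to a request object without sending it. The helper names (header_auth, bearer_auth, build_request), the example token, and the choice of /v1.0/tenants as the target path are assumptions made for this example and are not part of Conductor.

```python
import urllib.request


def header_auth(tenant_id: str, email: str, password: str) -> dict:
    """Headers for header-based authentication."""
    return {"x-tenant-id": tenant_id, "x-email": email, "x-password": password}


def bearer_auth(token: str) -> dict:
    """Header for bearer-token authentication."""
    return {"Authorization": f"Bearer {token}"}


def build_request(server: str, path: str, headers: dict) -> urllib.request.Request:
    """Build (but do not send) a GET request against the Conductor API."""
    return urllib.request.Request(server.rstrip("/") + path, headers=headers, method="GET")


# Hypothetical values, for illustration only.
req = build_request("http://localhost:9000", "/v1.0/tenants", bearer_auth("example-token"))
```

Passing req to urllib.request.urlopen would send it; that step is left out here so the sketch does not depend on a running server.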

User Permission Model

Users have three permission levels:

Permission	Description
Global Admin (`IsAdmin=true`)	Full cross-tenant access to all resources
Tenant Admin (`IsTenantAdmin=true`)	Can manage users and credentials within their own tenant
Standard User	Can only access model configurations, endpoints, runners, and virtual runners in their tenant

Global Admins can operate on any tenant by specifying TenantId in their requests
Tenant Admins have elevated privileges within their assigned tenant
Standard Users have read/write access to non-administrative resources

Endpoints

Entity	Prefix	API Endpoint
Administrator	`admin_`	`/v1.0/administrators`
Tenant	`ten_`	`/v1.0/tenants`
User	`usr_`	`/v1.0/users`
Credential	`cred_`	`/v1.0/credentials`
Model Runner Endpoint	`mre_`	`/v1.0/modelrunnerendpoints`
Model Definition	`md_`	`/v1.0/modeldefinitions`
Model Configuration	`mc_`	`/v1.0/modelconfigurations`
Virtual Model Runner	`vmr_`	`/v1.0/virtualmodelrunners`
Request History	`req_`	`/v1.0/requesthistory`
Request History Summary	-	`/v1.0/requesthistory/summary`

Virtual Model Runner Proxy

Virtual model runners expose an API at their configured base path. For example, a VMR with base path /v1.0/api/my-vmr/ would expose:

OpenAI API: /v1.0/api/my-vmr/v1/chat/completions, /v1.0/api/my-vmr/v1/embeddings
vLLM API: /v1.0/api/my-vmr/v1/chat/completions, /v1.0/api/my-vmr/v1/embeddings
Gemini API: /v1.0/api/my-vmr/v1beta/models/gemini-2.5-flash:generateContent, /v1.0/api/my-vmr/v1beta/models/text-embedding-004:embedContent
Ollama API: /v1.0/api/my-vmr/api/generate, /v1.0/api/my-vmr/api/chat
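
The paths above are the VMR base path followed by the provider-specific API path. A small Python sketch of that composition follows; vmr_url is a hypothetical helper, and the server address is taken from the Quick Start section.

```python
def vmr_url(server: str, base_path: str, api_path: str) -> str:
    """Join the server address, the VMR base path, and a provider API path."""
    return server.rstrip("/") + "/" + base_path.strip("/") + "/" + api_path.lstrip("/")


SERVER = "http://localhost:9000"
BASE_PATH = "/v1.0/api/my-vmr/"

openai_chat = vmr_url(SERVER, BASE_PATH, "/v1/chat/completions")
gemini_generate = vmr_url(SERVER, BASE_PATH, "/v1beta/models/gemini-2.5-flash:generateContent")
ollama_chat = vmr_url(SERVER, BASE_PATH, "/api/chat")
```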

Configuration

conductor.json

{
  "Webserver": {
    "Hostname": "localhost",
    "Port": 9000,
    "Ssl": false,
    "Cors": {
      "Enabled": false,
      "AllowedOrigins": [],
      "AllowedMethods": ["GET", "POST", "PUT", "DELETE", "OPTIONS"],
      "AllowedHeaders": ["Content-Type", "Authorization"],
      "ExposedHeaders": [],
      "AllowCredentials": false,
      "MaxAgeSeconds": 86400
    }
  },
  "Database": {
    "Type": "Sqlite",
    "Filename": "./conductor.db"
  },
  "Logging": {
    "Servers": [],
    "LogDirectory": "./logs/",
    "LogFilename": "conductor.log",
    "ConsoleLogging": true,
    "MinimumSeverity": 0
  },
  "RequestHistory": {
    "Enabled": true,
    "Directory": "./request-history/",
    "RetentionDays": 7,
    "CleanupIntervalMinutes": 60,
    "MaxRequestBodyBytes": 65536,
    "MaxResponseBodyBytes": 65536
  }
}

Supported Databases

SQLite (default): "Type": "Sqlite", "Filename": "./conductor.db"
PostgreSQL: "Type": "PostgreSql", "ConnectionString": "Host=..."
SQL Server: "Type": "SqlServer", "ConnectionString": "Server=..."
MySQL: "Type": "MySql", "ConnectionString": "Server=..."

CORS Configuration

Cross-Origin Resource Sharing (CORS) can be enabled to allow browser-based applications to access the Conductor API.

Property	Type	Default	Description
`Enabled`	bool	`false`	Enable or disable CORS support
`AllowedOrigins`	string[]	`[]`	List of allowed origins. Use `["*"]` for all origins
`AllowedMethods`	string[]	`["GET", "POST", "PUT", "DELETE", "OPTIONS"]`	Allowed HTTP methods
`AllowedHeaders`	string[]	`["Content-Type", "Authorization", ...]`	Allowed request headers
`ExposedHeaders`	string[]	`[]`	Headers exposed to the browser
`AllowCredentials`	bool	`false`	Allow credentials (cookies, auth headers). Cannot be used with `AllowedOrigins: ["*"]`
`MaxAgeSeconds`	int	`86400`	Preflight cache duration (0-86400 seconds)
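
The constraints in this table can be checked before a configuration is deployed. The Python sketch below is a client-side pre-check only; validate_cors is a hypothetical helper, and whether Conductor performs the same validation at startup is not stated here.

```python
def validate_cors(cors: dict) -> list:
    """Return a list of problems found in a Webserver.Cors section."""
    problems = []
    if not cors.get("Enabled", False):
        return problems  # nothing to check when CORS is disabled

    origins = cors.get("AllowedOrigins", [])
    if cors.get("AllowCredentials", False) and "*" in origins:
        problems.append('AllowCredentials cannot be combined with AllowedOrigins ["*"]')

    max_age = cors.get("MaxAgeSeconds", 86400)
    if not 0 <= max_age <= 86400:
        problems.append("MaxAgeSeconds must be between 0 and 86400")

    return problems


# A wildcard origin with credentials is rejected by the check.
issues = validate_cors({"Enabled": True, "AllowedOrigins": ["*"], "AllowCredentials": True})
```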

Example: Allow all origins (development)

{
  "Webserver": {
    "Cors": {
      "Enabled": true,
      "AllowedOrigins": ["*"]
    }
  }
}

Example: Restrict to specific origins (production)

{
  "Webserver": {
    "Cors": {
      "Enabled": true,
      "AllowedOrigins": ["https://app.example.com", "https://admin.example.com"],
      "AllowCredentials": true
    }
  }
}

Request History Configuration

Request history captures request/response data for Virtual Model Runners with RequestHistoryEnabled set to true. This is useful for debugging, auditing, troubleshooting, and latency analysis. Each completed entry records total response time and time to first token/byte (FirstTokenTimeMs). For non-streaming responses, FirstTokenTimeMs is set to the same value as ResponseTimeMs.

Property	Type	Default	Description
`Enabled`	bool	`true`	Enable or disable request history globally
`Directory`	string	`"./request-history/"`	Directory for storing request detail JSON files
`RetentionDays`	int	`30`	Number of days to retain entries before cleanup (1-365)
`CleanupIntervalMinutes`	int	`60`	Interval between cleanup runs in minutes (1-1440)
`MaxRequestBodyBytes`	int	`65536`	Maximum request body bytes to capture (1-10485760)
`MaxResponseBodyBytes`	int	`65536`	Maximum response body bytes to capture (1-10485760)

Note: Request history must be enabled both globally (in conductor.json) and per-VMR (via the RequestHistoryEnabled property).

Captured request history entries include the VMR, routed model runner endpoint, matched model definition, matched model configuration, HTTP status, body lengths, transfer type, total response time (ResponseTimeMs), and time to first token/byte (FirstTokenTimeMs).

Request History Summary API

The summary endpoint returns aggregated request counts grouped by time buckets, useful for charting request volume and success/failure rates over time.

GET /v1.0/requesthistory/summary?startUtc={ISO8601}&endUtc={ISO8601}&interval={minute|15minute|hour|6hour|day}&vmrGuid={guid}

Parameter	Type	Required	Description
`startUtc`	string	No	Start of time range (UTC, ISO 8601). Default: 1 hour ago
`endUtc`	string	No	End of time range (UTC, ISO 8601). Default: now
`interval`	string	No	Bucket interval: `minute`, `15minute`, `hour`, `6hour`, or `day`. Default: `hour`
`vmrGuid`	string	No	Filter by Virtual Model Runner GUID
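
A minimal Python sketch of building this query string follows. summary_url is a hypothetical helper; it omits any parameter that is not supplied, so the server-side defaults listed above apply, and it rejects interval values that are not in the table.

```python
from typing import Optional
from urllib.parse import urlencode

VALID_INTERVALS = {"minute", "15minute", "hour", "6hour", "day"}


def summary_url(server: str,
                start_utc: Optional[str] = None,
                end_utc: Optional[str] = None,
                interval: Optional[str] = None,
                vmr_guid: Optional[str] = None) -> str:
    """Build the request history summary URL from optional parameters."""
    if interval is not None and interval not in VALID_INTERVALS:
        raise ValueError(f"unsupported interval: {interval}")
    params = {"startUtc": start_utc, "endUtc": end_utc,
              "interval": interval, "vmrGuid": vmr_guid}
    # Drop unset parameters so the server applies its defaults.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = server.rstrip("/") + "/v1.0/requesthistory/summary"
    return url + "?" + query if query else url


daily = summary_url("http://localhost:9000", interval="day")
```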

Response:

{
  "Data": [
    {
      "TimestampUtc": "2026-03-20T10:00:00Z",
      "SuccessCount": 42,
      "FailureCount": 3,
      "TotalCount": 45
    }
  ],
  "StartUtc": "2026-03-20T10:00:00Z",
  "EndUtc": "2026-03-20T11:00:00Z",
  "Interval": "hour",
  "TotalSuccess": 42,
  "TotalFailure": 3,
  "TotalRequests": 45
}

Success is defined as HTTP status 100-399; failure is HTTP status 400-599 or null (incomplete requests).
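
The success rule can be expressed in a few lines of Python. This is a sketch of the classification as described, not Conductor's implementation; is_success and summarize are hypothetical names, and status values outside 100-599 are counted as failures here, which is an assumption.

```python
from typing import Iterable, Optional


def is_success(status: Optional[int]) -> bool:
    """HTTP 100-399 is a success; 400-599 and None (incomplete) are failures."""
    return status is not None and 100 <= status <= 399


def summarize(statuses: Iterable[Optional[int]]) -> dict:
    """Aggregate a sequence of HTTP statuses into one summary bucket."""
    statuses = list(statuses)
    success = sum(1 for s in statuses if is_success(s))
    return {
        "SuccessCount": success,
        "FailureCount": len(statuses) - success,
        "TotalCount": len(statuses),
    }


bucket = summarize([200, 204, 301, 404, 500, None])
```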

Configuration Pinning

Model configurations can define pinned properties that are automatically merged into incoming requests:

{
  "Name": "Low Temperature Config",
  "PinnedCompletionsProperties": {
    "temperature": 0.3,
    "top_p": 0.9,
    "max_tokens": 2048
  },
  "PinnedEmbeddingsProperties": {
    "model": "text-embedding-ada-002"
  }
}

When a request comes through a virtual model runner, the pinned properties are merged with the request body, allowing you to enforce specific model parameters.
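
The merge can be pictured with the Python sketch below. It assumes a shallow merge in which pinned values override any value the client sent for the same key; the text above says only that the properties are merged, so the precedence and the helper name apply_pinned are assumptions.

```python
def apply_pinned(request_body: dict, pinned: dict) -> dict:
    """Return a copy of the request body with pinned properties applied.

    Assumption: pinned values win on conflict (shallow merge).
    """
    merged = dict(request_body)  # leave the caller's body untouched
    merged.update(pinned)
    return merged


pinned_completions = {"temperature": 0.3, "top_p": 0.9, "max_tokens": 2048}
incoming = {"model": "example-model", "temperature": 1.0, "messages": []}
outgoing = apply_pinned(incoming, pinned_completions)
```

In this sketch the client's temperature of 1.0 is replaced by the pinned 0.3, while keys that are not pinned, such as model and messages, pass through unchanged.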

Health Checking & Rate Limiting

Endpoint Health Configuration

Model Runner Endpoints support comprehensive health checking with the following properties:

Property	Type	Default	Description
`HealthCheckUrl`	string	`/`	URL path appended to endpoint base URL for health checks
`HealthCheckMethod`	enum	`GET`	HTTP method (`GET` or `HEAD`)
`HealthCheckIntervalMs`	int	`5000`	Milliseconds between health checks
`HealthCheckTimeoutMs`	int	`5000`	Timeout for health check requests
`HealthCheckExpectedStatusCode`	int	`200`	Expected HTTP status code for healthy
`UnhealthyThreshold`	int	`2`	Consecutive failures before marking unhealthy
`HealthyThreshold`	int	`2`	Consecutive successes before marking healthy
`HealthCheckUseAuth`	bool	`false`	Include API key (Bearer token) in health check requests
`MaxParallelRequests`	int	`4`	Maximum concurrent requests (0 = unlimited)
`Weight`	int	`1`	Relative weight for load balancing (1-1000)

Note for OpenAI and vLLM APIs: When using api.openai.com or another OpenAI-compatible backend that requires authentication for model listing, set HealthCheckUseAuth to true and HealthCheckUrl to /v1/models.

Note for Gemini API: When using generativelanguage.googleapis.com, set HealthCheckUseAuth to true and HealthCheckUrl to /v1beta/models. Gemini uses the x-goog-api-key header rather than bearer token authentication.

Health Check Behavior

Endpoints start in an unhealthy state and transition to healthy after HealthyThreshold consecutive successful checks
Background tasks continuously monitor each active endpoint at the configured interval
The proxy automatically excludes unhealthy endpoints from request routing
When all endpoints are unhealthy, requests return 502 Bad Gateway
When all endpoints are at capacity, requests return 429 Too Many Requests
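
The threshold behavior can be sketched as a small state machine in Python. HealthTracker and record are hypothetical names; the defaults mirror the table above, and resetting the opposite counter on each result follows from the word "consecutive" in the threshold descriptions.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    healthy_threshold: int = 2
    unhealthy_threshold: int = 2
    healthy: bool = False  # endpoints start in an unhealthy state
    consecutive_successes: int = 0
    consecutive_failures: int = 0

    def record(self, success: bool) -> bool:
        """Record one health check result and return the current state."""
        if success:
            self.consecutive_successes += 1
            self.consecutive_failures = 0
            if self.consecutive_successes >= self.healthy_threshold:
                self.healthy = True
        else:
            self.consecutive_failures += 1
            self.consecutive_successes = 0
            if self.consecutive_failures >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy


tracker = HealthTracker()
states = [tracker.record(result) for result in (True, True, False, False)]
```

With the default thresholds, a single failed check does not remove a healthy endpoint from rotation; the second consecutive failure does.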

Rate Limiting

Each endpoint tracks in-flight requests in real-time
The MaxParallelRequests property enforces a per-endpoint concurrency limit
Set MaxParallelRequests to 0 for unlimited concurrent requests
Requests are counted from start until the response completes (including streaming)
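
The following Python sketch combines the concurrency limit with the 502 and 429 responses described under Health Check Behavior. The Endpoint class, acquire, and release are hypothetical names, and picking the first endpoint with spare capacity stands in for whichever load-balancing mode is configured.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Endpoint:
    name: str
    healthy: bool = True
    max_parallel_requests: int = 4  # 0 = unlimited
    in_flight: int = 0

    def has_capacity(self) -> bool:
        return self.max_parallel_requests == 0 or self.in_flight < self.max_parallel_requests


def acquire(endpoints: List[Endpoint]) -> Tuple[Optional[Endpoint], Optional[int]]:
    """Return (endpoint, None) on success or (None, http_status) on rejection."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None, 502  # all endpoints unhealthy
    available = [e for e in healthy if e.has_capacity()]
    if not available:
        return None, 429  # all healthy endpoints at capacity
    chosen = available[0]
    chosen.in_flight += 1  # counted from request start
    return chosen, None


def release(endpoint: Endpoint) -> None:
    """Call when the response completes, including the end of a stream."""
    endpoint.in_flight = max(0, endpoint.in_flight - 1)


single = Endpoint("a", max_parallel_requests=1)
first, _ = acquire([single])
second, status = acquire([single])  # rejected: endpoint is at capacity
```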

Weighted Load Balancing

The Weight property influences endpoint selection in round-robin and random modes
Higher weight = more traffic directed to that endpoint
Example: Endpoint A (weight=3) receives three times as much traffic as Endpoint B (weight=1)
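
One way to picture weighted selection is the Python sketch below. It is not Conductor's algorithm: the round-robin variant simply repeats each endpoint name according to its weight, and the random variant uses the standard library's random.choices. Both function names are hypothetical.

```python
import itertools
import random
from collections import Counter


def weighted_round_robin(weights: dict):
    """Cycle through endpoint names, repeating each one 'weight' times per pass."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def weighted_random(weights: dict) -> str:
    """Pick one endpoint name with probability proportional to its weight."""
    names = list(weights)
    return random.choices(names, weights=[weights[n] for n in names], k=1)[0]


weights = {"A": 3, "B": 1}
picks = list(itertools.islice(weighted_round_robin(weights), 8))
counts = Counter(picks)  # over two full passes, A is chosen 6 times and B twice
```

A production scheduler would usually interleave the picks rather than send three requests in a row to A, but the long-run ratio is the same.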

Health Status API

Monitor endpoint health via the REST API:

# Health of all endpoints in tenant
GET /v1.0/modelrunnerendpoints/health

# Health of endpoints for a specific VMR
GET /v1.0/virtualmodelrunners/{id}/health

Response includes:

Current health state (healthy/unhealthy)
In-flight request count
Total uptime/downtime
Uptime percentage
Last check timestamp
Last error message (if any)

Docker

The included Docker Compose setup uses local build contexts:

Server: src/Conductor.Server/Dockerfile
Dashboard: dashboard/Dockerfile

Building Docker Images

# Build server
./build-server.sh  # or build-server.bat on Windows

# Build dashboard
./build-dashboard.sh  # or build-dashboard.bat on Windows

License

MIT License - see LICENSE.md for details.

Attributions

Music icons created by Freepik - Flaticon
