If you find this useful, please star the repo! Also check out Vesta AI Explorer, my full-featured native macOS AI app.
Note
31 Mar 2026. AFM was pinned to an older version of https://github.com/huggingface/swift-huggingface. I have now pinned it to the latest version, which uses the hub model cache. The older version downloaded models to the ~/Documents/Huggingface folder, which caused problems with iCloud sync. Models are now stored under ~/.cache, which is outside iCloud's scope. The TL;DR is that models will be downloaded again. You can manually delete the older models in ~/Documents/Huggingface to reclaim disk space (spring cleaning!). Please report any issues.
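Before deleting the old folder, it can help to see how much space it holds. The following is a minimal sketch using only the Python standard library; the helper name `folder_size_bytes` is hypothetical and not part of afm, and the path is the one named in the note above.

```python
from pathlib import Path


def folder_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files under root (0 if root is missing)."""
    if not root.exists():
        return 0
    return sum(
        p.stat().st_size
        for p in root.rglob("*")
        if p.is_file() and not p.is_symlink()  # skip symlinks to avoid double counting
    )


if __name__ == "__main__":
    # Old model location mentioned in the note above.
    old_cache = Path.home() / "Documents" / "Huggingface"
    size_gb = folder_size_bytes(old_cache) / 1e9
    print(f"{old_cache}: {size_gb:.2f} GB")
```

The script only reports a size; removing the folder is left as a manual step.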
Attention M-series Mac AI enthusiasts! You don't need to be a Swift developer to explore. Vibe coding really allows anyone to participate in this project. A lot of the hype is real! It does work.
Fork this repo first, then clone your fork to submit PRs:
git clone https://github.com/<your-username>/maclocal-api.git
cd maclocal-api
claude
/build-afm
To just experiment locally:
git clone https://github.com/scouzi1966/maclocal-api.git
cd maclocal-api
claude
/build-afm
/build-afm is an AI skill that runs the first build for you so that you can start coding.
Start vibe coding! I will add support for skills with more coding agents in the future.
Extensive testing of Qwen3.5-35B-A3B with afm. Uses an experimental technique with Claude and Codex as judges for evaluation scoring. Click the link below to view test results.
Run open-source MLX models or Apple's on-device Foundation Model through an OpenAI-compatible API. Built entirely in Swift for maximum Metal GPU performance. No Python runtime, no cloud, no API keys.
Important
The nightly build is the future stable release. It includes everything in v0.9.11 plus:
- No new features yet; nightly is currently in sync with the stable release
Tip
Huge thanks to @jesserobbins, a first-time contributor who landed two substantial features in this cycle (Vision OCR + Speech transcription). Both PRs brought afm's Apple-native capabilities from the CLI into first-class HTTP APIs. Contributions of this size and quality from a new contributor are rare and appreciated.
| | Stable (v0.9.11) | Nightly (afm-next) |
|---|---|---|
| Homebrew | brew install scouzi1966/afm/afm | brew install scouzi1966/afm/afm-next |
| pip | pip install macafm | pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next |
| Release notes | v0.9.11 | v0.9.11-next |
Tip
Switching between stable and nightly:
brew unlink afm && brew install scouzi1966/afm/afm-next # switch to nightly
brew unlink afm-next && brew link afm # switch back to stable
# Assumes you previously ran: brew install scouzi1966/afm/afm
Older stable releases are kept as pinned formulae in the Homebrew tap and as version-pinned wheels on PyPI. They are useful for reproducing an issue against a specific build or rolling back without waiting for a new release.
Homebrew (pinned stable formulae): afm@<version>, available for 0.9.0, 0.9.1, and 0.9.3 through 0.9.10.
brew install scouzi1966/afm/afm@0.9.10 # install v0.9.10
brew uninstall afm # if current afm is already installed
brew link afm@0.9.10 # expose `afm` on PATH
afm --version # → v0.9.10
Homebrew (pinned nightly formulae): afm-next@<full-version>, e.g. afm-next@0.9.11-next.9c3225e.20260418. The list of available pinned nightlies is at github.com/scouzi1966/homebrew-afm.
brew install scouzi1966/afm/afm-next@0.9.11-next.9c3225e.20260418
pip (version-pinned wheels): any published release.
pip install macafm==0.9.10 # previous stable
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ \
    macafm-next==0.9.11.dev20260418 # pinned nightly
# Run any MLX model with WebUI
afm mlx -m mlx-community/Qwen3.5-35B-A3B-4bit -w
# Or any smaller model
afm mlx -m mlx-community/gemma-3-4b-it-8bit -w
# Chat from the terminal (auto-downloads from Hugging Face)
afm mlx -m Qwen3-0.6B-4bit -s "Explain quantum computing"
# Interactive model picker (lists your downloaded models)
MACAFM_MLX_MODEL_CACHE=/path/to/models afm mlx -w
# Apple's on-device Foundation Model with WebUI
afm -w
OpenCode is a terminal-based AI coding assistant. Connect it to afm for a fully local coding experience: no cloud, no API keys, and no Internet connection required (other than for the initial model download, of course!).
1. Configure OpenCode (~/.config/opencode/opencode.json):
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "macafm (local)",
      "options": {
        "baseURL": "http://localhost:9999/v1"
      },
      "models": {
        "mlx-community/Qwen3-Coder-Next-4bit": {
          "name": "mlx-community/Qwen3-Coder-Next-4bit"
        }
      }
    }
  }
}
2. Start afm with a coding model:
afm mlx -m mlx-community/Qwen3-Coder-Next-4bit -t 1.0 --top-p 0.95 --max-tokens 8192
3. Launch OpenCode and type /connect. Scroll down to the very bottom of the provider list; macafm (local) will likely be the last entry. Select it, and when prompted for an API key, enter any value (e.g. x). Token-based access is not yet implemented in afm, so the key is ignored. All inference runs locally on your Mac's GPU.
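If you switch models often, the OpenCode configuration from step 1 can be generated instead of edited by hand. The sketch below rebuilds the same JSON structure shown above for any model ID; the function name `opencode_config` is hypothetical, and the script only prints the result so that an existing config file is never overwritten.

```python
import json


def opencode_config(model_id: str, base_url: str = "http://localhost:9999/v1") -> dict:
    """Return an OpenCode provider config mirroring the example in step 1."""
    return {
        "$schema": "https://opencode.ai/config.json",
        "provider": {
            # The provider key and display name are copied from the README example.
            "ollama": {
                "npm": "@ai-sdk/openai-compatible",
                "name": "macafm (local)",
                "options": {"baseURL": base_url},
                "models": {model_id: {"name": model_id}},
            }
        },
    }


if __name__ == "__main__":
    config = opencode_config("mlx-community/Qwen3-Coder-Next-4bit")
    # Paste the output into ~/.config/opencode/opencode.json after reviewing it.
    print(json.dumps(config, indent=2))
```

The model ID passed here should match the one given to afm mlx -m in step 2.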
28 models tested and verified including Qwen3, Gemma 3/3n, GLM-4/5, DeepSeek V3, LFM2, SmolLM3, Llama 3.2, MiniMax M2.5, Nemotron, and more. See test reports.
- Vesta AI Explorer: full-featured native macOS AI chat app
- AFMTrainer: LoRA fine-tuning wrapper for Apple's toolkit (Mac M-series & Linux CUDA)
- Apple Foundation Model Adapters: Apple's adapter training toolkit
- OpenAI API Compatible - Works with existing OpenAI client libraries and applications
- MLX Local Models - Run any Hugging Face MLX model locally (Qwen, Gemma, Llama, DeepSeek, GLM, and 28+ tested models)
- API Gateway - Auto-discovers and proxies Ollama, LM Studio, Jan, and other local backends into a single API
- LoRA adapter support - Supports fine-tuning with LoRA adapters using Apple's tuning Toolkit
- Apple Foundation Models - Uses Apple's on-device 3B parameter language model
- Vision OCR - Extract text from images and PDFs using Apple Vision via CLI and HTTP (afm vision, /v1/vision/ocr)
- Built-in WebUI - Chat interface with model selection (afm -w)
- Privacy-First - All processing happens locally on your device
- Fast & Lightweight - No network calls, no API keys required
- Easy Integration - Drop-in replacement for OpenAI API endpoints
- Token Usage Tracking - Reports token consumption metrics (word-based estimates for the Foundation model)
- macOS 26 (Tahoe) or later
- Apple Silicon Mac (M1/M2/M3/M4 series)
- Apple Intelligence enabled in System Settings
- Xcode 26 (for building from source)
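A quick way to check the first two requirements from a script is sketched below. It uses only the standard `platform` module; the helper name `meets_requirements` is hypothetical. It cannot tell whether Apple Intelligence is enabled, and a Python build linked against an older SDK may report a different macOS version number than System Settings shows, so treat a negative result as a hint to check manually.

```python
import platform


def meets_requirements(mac_version: str, machine: str, min_major: int = 26) -> bool:
    """Return True for macOS >= min_major on Apple Silicon (arm64)."""
    if not mac_version:
        return False  # platform.mac_ver() returns an empty string off macOS
    major = int(mac_version.split(".")[0])
    return major >= min_major and machine == "arm64"


if __name__ == "__main__":
    version = platform.mac_ver()[0]
    machine = platform.machine()
    print(f"macOS {version or 'n/a'} on {machine}: "
          f"{'OK' if meets_requirements(version, machine) else 'requirements not met'}")
```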
# Add the tap
brew tap scouzi1966/afm
# Install AFM
brew install afm
# Verify installation
afm --version
# Install from PyPI
pip install macafm
# Verify installation
afm --version
# Clone the repository with submodules
git clone --recurse-submodules https://github.com/scouzi1966/maclocal-api.git
cd maclocal-api
# Build everything from scratch (patches + webui + release build)
./Scripts/build-from-scratch.sh
# Or skip webui if you don't have Node.js
./Scripts/build-from-scratch.sh --skip-webui
# Or use make (patches + release build, no webui)
make
# Run
./.build/release/afm --version
# API server only (Apple Foundation Model on port 9999)
afm
# API server with WebUI chat interface
afm -w
# WebUI + API gateway (auto-discovers Ollama, LM Studio, Jan, etc.)
afm -w -g
# Custom port with verbose logging
afm -p 8080 -v
# Show help
afm -h
Run open-source models locally on Apple Silicon using MLX:
# Run a model with single prompt
afm mlx -m mlx-community/Qwen2.5-0.5B-Instruct-4bit -s "Explain gravity"
# Start MLX model with WebUI
afm mlx -m mlx-community/gemma-3-4b-it-8bit -w
# Interactive model picker (lists downloaded models)
afm mlx -w
# MLX model as API server
afm mlx -m mlx-community/Llama-3.2-1B-Instruct-4bit -p 8080
# Pipe mode
cat essay.txt | afm mlx -m mlx-community/Qwen3-0.6B-4bit -i "Summarize this"
# MLX help
afm mlx --help
Models are downloaded from Hugging Face on first use and cached locally. Any model from the mlx-community collection is supported.
POST /v1/chat/completions
Compatible with OpenAI's chat completions API.
curl -X POST http://localhost:9999/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "foundation",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
GET /v1/models
Returns available Foundation Models.
curl http://localhost:9999/v1/models
POST /v1/vision/ocr
Runs Apple Vision OCR against local files, uploads, base64 payloads, data: URLs, and OpenAI-style image inputs.
curl -X POST http://localhost:9999/v1/vision/ocr \
  -H "Content-Type: application/json" \
  -d '{
    "file": "/tmp/invoice.pdf",
    "recognition_level": "accurate",
    "languages": ["en-US"],
    "max_pages": 10
  }'
The endpoint returns structured JSON with per-document text, per-page text, text blocks, detected tables, document hints, and a top-level combined_text field. See docs/vision-ocr-api.md for request formats, options, and response details.
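The same OCR request can be sent from Python with only the standard library. This is a minimal sketch assuming the request fields shown in the curl example and the combined_text response field described above; the helper names `build_ocr_request` and `run_ocr` are hypothetical, and docs/vision-ocr-api.md remains the authority on the full schema.

```python
import json
import urllib.request


def build_ocr_request(file_path: str,
                      base_url: str = "http://localhost:9999",
                      recognition_level: str = "accurate",
                      languages: tuple = ("en-US",),
                      max_pages: int = 10) -> urllib.request.Request:
    """Build a POST request for /v1/vision/ocr using the documented fields."""
    payload = {
        "file": file_path,
        "recognition_level": recognition_level,
        "languages": list(languages),
        "max_pages": max_pages,
    }
    return urllib.request.Request(
        f"{base_url}/v1/vision/ocr",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_ocr(file_path: str) -> str:
    """Send the request to a running afm server and return combined_text."""
    with urllib.request.urlopen(build_ocr_request(file_path)) as resp:
        body = json.load(resp)
    return body.get("combined_text", "")


if __name__ == "__main__":
    # Requires afm to be running locally on the default port.
    print(run_ocr("/tmp/invoice.pdf"))
```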
GET /health
Server health status endpoint.
curl http://localhost:9999/health

from openai import OpenAI
# Point to your local MacLocalAPI server
client = OpenAI(
    api_key="not-needed-for-local",
    base_url="http://localhost:9999/v1"
)
response = client.chat.completions.create(
    model="foundation",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms"}
    ]
)
print(response.choices[0].message.content)
The OCR endpoint also accepts OpenAI-style multimodal payloads. This is useful when your client already sends messages[].content[] parts with image_url.
curl -X POST http://localhost:9999/v1/vision/ocr \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Extract the invoice text"},
        {
          "type": "image_url",
          "image_url": {
            "url": "data:application/pdf;base64,..."
          }
        }
      ]
    }],
    "recognition_level": "accurate",
    "languages": ["en-US"]
  }'
Foundation chat requests can also auto-run Apple Vision OCR before prompting the model when:
- the request includes image content
- the request includes the built-in apple_vision_ocr tool
- tool_choice is auto, required, omitted, or explicitly selects that tool
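The data: URL used in the multimodal OCR payload earlier in this section can be built from a local file's bytes. The sketch below shows one way to do that with the standard library and then wraps the URL in the same messages structure as the curl example; the helper names `to_data_url` and `ocr_messages_payload` are hypothetical.

```python
import base64
import mimetypes


def to_data_url(data: bytes, filename: str) -> str:
    """Encode raw bytes as a data: URL, guessing the MIME type from the filename."""
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"  # fallback for unknown extensions
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def ocr_messages_payload(data_url: str, prompt: str = "Extract the invoice text") -> dict:
    """Build the OpenAI-style multimodal body shown in the curl example."""
    return {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
        "recognition_level": "accurate",
        "languages": ["en-US"],
    }


if __name__ == "__main__":
    with open("/tmp/invoice.pdf", "rb") as f:
        payload = ocr_messages_payload(to_data_url(f.read(), "invoice.pdf"))
    print(payload["messages"][0]["content"][1]["image_url"]["url"][:40], "...")
```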
import OpenAI from 'openai';
const openai = new OpenAI({
  apiKey: 'not-needed-for-local',
  baseURL: 'http://localhost:9999/v1',
});
const completion = await openai.chat.completions.create({
  messages: [{ role: 'user', content: 'Write a haiku about programming' }],
  model: 'foundation',
});
console.log(completion.choices[0].message.content);

# Basic chat completion
curl -X POST http://localhost:9999/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "foundation",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
# With temperature control
curl -X POST http://localhost:9999/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "foundation",
    "messages": [{"role": "user", "content": "Be creative!"}],
    "temperature": 0.8
  }'

# Single prompt mode
afm -s "Explain quantum computing"
# Piped input from other commands
echo "What is the meaning of life?" | afm
cat file.txt | afm
git log --oneline | head -5 | afm
# Custom instructions with pipe
echo "Review this code" | afm -i "You are a senior software engineer"

MacLocalAPI/
├── Package.swift                          # Swift Package Manager config
├── Sources/MacLocalAPI/
│   ├── main.swift                         # CLI entry point & ArgumentParser
│   ├── Server.swift                       # Vapor web server configuration
│   ├── Controllers/
│   │   └── ChatCompletionsController.swift  # OpenAI API endpoints
│   └── Models/
│       ├── FoundationModelService.swift   # Apple Foundation Models wrapper
│       ├── OpenAIRequest.swift            # Request data models
│       └── OpenAIResponse.swift           # Response data models
└── README.md
OVERVIEW: macOS server that exposes Apple's Foundation Models through
OpenAI-compatible API
Use -w to enable the WebUI, -g to enable API gateway mode (auto-discovers and
proxies to Ollama, LM Studio, Jan, and other local LLM backends).
USAGE: afm <options>
afm mlx [<options>] Run local MLX models from Hugging Face
afm vision <image> OCR text extraction from images/PDFs
OPTIONS:
-s, --single-prompt <single-prompt>
Run a single prompt without starting the server
-i, --instructions <instructions>
Custom instructions for the AI assistant (default:
You are a helpful assistant)
-v, --verbose Enable verbose logging
--no-streaming Disable streaming responses (streaming is enabled by
default)
-a, --adapter <adapter> Path to a .fmadapter file for LoRA adapter fine-tuning
-p, --port <port> Port to run the server on (default: 9999)
-H, --hostname <hostname>
Hostname to bind server to (default: 127.0.0.1)
-t, --temperature <temperature>
Temperature for response generation (0.0-1.0)
-r, --randomness <randomness>
Sampling mode: 'greedy', 'random',
'random:top-p=<0.0-1.0>', 'random:top-k=<int>', with
optional ':seed=<int>'
-P, --permissive-guardrails
Permissive guardrails for unsafe or inappropriate
responses
-w, --webui Enable webui and open in default browser
-g, --gateway Enable API gateway mode: discover and proxy to local
LLM backends (Ollama, LM Studio, Jan, etc.)
--prewarm <prewarm> Pre-warm the model on server startup for faster first
response (y/n, default: y)
--version Show the version.
-h, --help Show help information.
Note: afm also accepts piped input from other commands, equivalent to using -s
with the piped content as the prompt.
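The value for -r/--randomness is a small colon-separated string. The sketch below assembles it from the forms listed in the help text; the helper name `randomness_arg` is hypothetical, and two choices are assumptions rather than documented behavior: that top-p and top-k are mutually exclusive, and that a seed only makes sense with the random modes.

```python
from typing import Optional


def randomness_arg(mode: str = "random",
                   top_p: Optional[float] = None,
                   top_k: Optional[int] = None,
                   seed: Optional[int] = None) -> str:
    """Build a --randomness value such as 'random:top-p=0.9:seed=42'."""
    if mode == "greedy":
        if top_p is not None or top_k is not None or seed is not None:
            raise ValueError("greedy takes no extra options (assumption)")
        return "greedy"
    if mode != "random":
        raise ValueError("mode must be 'greedy' or 'random'")
    if top_p is not None and top_k is not None:
        raise ValueError("choose either top_p or top_k (assumption)")
    parts = ["random"]
    if top_p is not None:
        if not 0.0 <= top_p <= 1.0:
            raise ValueError("top_p must be within 0.0-1.0")
        parts.append(f"top-p={top_p}")
    if top_k is not None:
        if top_k < 1:
            raise ValueError("top_k must be a positive integer")
        parts.append(f"top-k={top_k}")
    if seed is not None:
        parts.append(f"seed={seed}")
    return ":".join(parts)


if __name__ == "__main__":
    # Print a full command line that can be pasted into a terminal.
    print(f"afm -r '{randomness_arg(top_p=0.9, seed=42)}' -s 'Explain gravity'")
```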
The server respects standard logging environment variables:
- LOG_LEVEL - Set logging level (trace, debug, info, notice, warning, error, critical)
- Model Scope: Apple Foundation Model is a 3B parameter model (optimized for on-device performance)
- macOS 26+ Only: Requires the latest macOS with Foundation Models framework
- Apple Intelligence Required: Must be enabled in System Settings
- Token Estimation: Uses word-based approximation for token counting (Foundation model only; proxied backends report real counts)
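To give a feel for what a word-based token estimate looks like, here is an illustrative sketch. It is not afm's actual formula: the function name `estimate_tokens` and the 1.3 tokens-per-word factor are assumptions chosen only to show why such counts are approximate.

```python
import math


def estimate_tokens(text: str, tokens_per_word: float = 1.3) -> int:
    """Approximate a token count from whitespace-separated words.

    The multiplier is a rough rule of thumb, not a value taken from afm.
    """
    words = text.split()
    return math.ceil(len(words) * tokens_per_word)


if __name__ == "__main__":
    prompt = "What is the capital of France?"
    print(f"{len(prompt.split())} words, about {estimate_tokens(prompt)} tokens")
```

Because the estimate depends only on word count, text with long compound words, code, or non-English content can differ noticeably from a real tokenizer's count.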
- Ensure you're running macOS 26 or later
- Enable Apple Intelligence in System Settings → Apple Intelligence & Siri
- Verify you're on an Apple Silicon Mac
- Restart the application after enabling Apple Intelligence
- Check if the port is already in use: lsof -i :9999
- Try a different port: afm -p 8080
- Enable verbose logging: afm -v
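The port check can also be scripted. The following sketch tests whether something is already listening on the default host and port from the help text (127.0.0.1:9999); the helper name `port_in_use` is hypothetical.

```python
import socket


def port_in_use(port: int = 9999, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        # connect_ex returns 0 on success and an error code otherwise
        return s.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use():
        print("Port 9999 is busy; try: afm -p 8080")
    else:
        print("Port 9999 is free")
```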
- Ensure you have Xcode 26 installed
- Update Swift toolchain: xcode-select --install
- Clean and rebuild: swift package clean && swift build -c release
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
# Clone the repo with submodules
git clone --recurse-submodules https://github.com/scouzi1966/maclocal-api.git
cd maclocal-api
# Full build from scratch (submodules + patches + webui + release)
./Scripts/build-from-scratch.sh
# Or for debug builds during development
./Scripts/build-from-scratch.sh --debug --skip-webui
# Run with verbose logging
./.build/debug/afm -w -g -v
This project is licensed under the MIT License - see the LICENSE file for details.
- Apple for the Foundation Models framework
- The Vapor Swift web framework team
- OpenAI for the API specification standard
- The Swift community for excellent tooling
If you encounter any issues or have questions:
- Check the Troubleshooting section
- Search existing GitHub Issues
- Create a new issue with detailed information about your problem
- Streaming response support
- MLX local model support (28+ models tested)
- Multiple model support (API gateway mode)
- Web UI for testing (llama.cpp WebUI integration)
- Vision OCR subcommand
- Function/tool calling (OpenAI-compatible, multiple formats)
- Performance optimizations
- BFCL integration for automated tool calling validation
- Docker containerization (when supported)
Made with ❤️ for the Apple Silicon community
Bringing the power of local AI to your fingertips.
