afm — Run Any MLX LLM on Your Mac, 100% Local

If you find this useful, please ⭐ the repo! Also check out Vesta AI Explorer! — my full-featured native macOS AI app.

Note

31 Mar, 2026. AFM was pinned to an older version of https://github.com/huggingface/swift-huggingface. It is now pinned to the latest version, which uses hub for the model cache. The older version downloaded models to the ~/Documents/Huggingface folder, which caused some pain with iCloud sync. Models are now stored under ~/.cache, which is outside iCloud's scope. The TL;DR is that models will be downloaded again. You can manually delete the older models in ~/Documents/Huggingface to reclaim disk space (spring cleaning!). Please report any issues.
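Before deleting the old folder, it can help to see how much space it actually holds. The following is a minimal sketch, not part of afm: `dir_size_bytes` is a hypothetical helper name, and the only path taken from the note above is ~/Documents/Huggingface.

```python
import os
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files under root (0 if root is missing)."""
    total = 0
    # os.walk yields nothing for a missing directory, so the result is 0.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Skip symlinks so a file reachable through a link is not counted twice.
            if not path.is_symlink():
                total += path.stat().st_size
    return total


# Old download location mentioned in the note above.
old_cache = Path.home() / "Documents" / "Huggingface"
print(f"{old_cache}: {dir_size_bytes(old_cache) / 1e9:.2f} GB")
```

The script only reads file sizes; deleting the folder remains a manual step.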

Attention M-series Mac AI enthusiasts! You don't need to be a Swift developer to explore. Vibe coding really allows anyone to participate in this project. A lot of the hype is real! It does work.

Fork this repo first, then clone your fork to submit PRs:

git clone https://github.com/<your-username>/maclocal-api.git   
cd maclocal-api
claude
/build-afm

To just experiment locally, clone the upstream repo directly:

git clone https://github.com/scouzi1966/maclocal-api.git   
cd maclocal-api
claude
/build-afm

/build-afm is an AI skill that performs the first build for you so that you can start coding.

Start vibe coding! I will add support for skills with more coding agents in the future.

afm has been tested extensively with Qwen3.5-35B-A3B, using an experimental technique in which Claude and Codex act as judges for evaluation scoring. Click the link below to view the test results.

afm-next Nightly Test Report — Qwen3.5-35B-A3B Focus

Run open-source MLX models or Apple's on-device Foundation Model through an OpenAI-compatible API. Built entirely in Swift for maximum Metal GPU performance. No Python runtime, no cloud, no API keys.

What's new in afm-next

Important

The nightly build is the future stable release. It includes everything in v0.9.11 plus:

No new features yet — nightly is currently in sync with the stable release

Tip

🙏 Huge thanks to @jesserobbins — first-time contributor, landed two substantial features in this cycle (Vision OCR + Speech transcription). Both PRs brought afm's Apple-native capabilities from the CLI into first-class HTTP APIs. Contributions of this size and quality from a new contributor are rare and appreciated.

Install

| | Stable (v0.9.11) | Nightly (afm-next) |
| --- | --- | --- |
| Homebrew | `brew install scouzi1966/afm/afm` | `brew install scouzi1966/afm/afm-next` |
| pip | `pip install macafm` | `pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next` |
| Release notes | v0.9.11 | v0.9.11-next |

Tip

Switching between stable and nightly:

brew unlink afm && brew install scouzi1966/afm/afm-next   # switch to nightly
brew unlink afm-next && brew link afm                      # switch back to stable
This assumes you previously ran brew install scouzi1966/afm/afm.

Install a previous version

Older stable releases are kept as pinned formulae in the Homebrew tap and as version-pinned wheels on PyPI. Useful for reproducing an issue against a specific build or rolling back without waiting for a new release.

Homebrew (pinned stable formulae): afm@<version> — available for 0.9.0, 0.9.1, 0.9.3–0.9.10.

brew install scouzi1966/afm/afm@0.9.10      # install v0.9.10
brew uninstall afm                          # if current afm is already installed
brew link afm@0.9.10                        # expose `afm` on PATH
afm --version                               # → v0.9.10

Homebrew (pinned nightly formulae): afm-next@<full-version> — e.g. afm-next@0.9.11-next.9c3225e.20260418. Lists of available pinned nightlies are at github.com/scouzi1966/homebrew-afm.

brew install scouzi1966/afm/afm-next@0.9.11-next.9c3225e.20260418

pip (version-pinned wheels): any published release.

pip install macafm==0.9.10                  # previous stable
pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ \
  macafm-next==0.9.11.dev20260418           # pinned nightly

Quick Start

# Run any MLX model with WebUI
afm mlx -m mlx-community/Qwen3.5-35B-A3B-4bit -w

# Or any smaller model
afm mlx -m mlx-community/gemma-3-4b-it-8bit -w

# Chat from the terminal (auto-downloads from Hugging Face)
afm mlx -m Qwen3-0.6B-4bit -s "Explain quantum computing"

# Interactive model picker (lists your downloaded models)
MACAFM_MLX_MODEL_CACHE=/path/to/models afm mlx -w

# Apple's on-device Foundation Model with WebUI
afm -w

Use with OpenCode

OpenCode is a terminal-based AI coding assistant. Connect it to afm for a fully local coding experience: no cloud, no API keys, and no Internet connection required (other than for the initial model download, of course!).

1. Configure OpenCode (~/.config/opencode/opencode.json):

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "macafm (local)",
      "options": {
        "baseURL": "http://localhost:9999/v1"
      },
      "models": {
        "mlx-community/Qwen3-Coder-Next-4bit": {
          "name": "mlx-community/Qwen3-Coder-Next-4bit"
        }
      }
    }
  }
}

2. Start afm with a coding model:

afm mlx -m mlx-community/Qwen3-Coder-Next-4bit -t 1.0 --top-p 0.95 --max-tokens 8192

3. Launch OpenCode and type /connect. Scroll down to the very bottom of the provider list; macafm (local) will likely be the last entry. Select it, and when prompted for an API key, enter any value (e.g. x). Token-based access is not yet implemented in afm, so the key is ignored. All inference runs locally on your Mac's GPU.
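If you switch models or ports often, the config from step 1 can be generated rather than edited by hand. This is a minimal sketch that reproduces the JSON shown above; `opencode_config` is a hypothetical helper, not something shipped with afm or OpenCode, and the model ID and port are parameters you supply.

```python
import json


def opencode_config(model_id: str, port: int = 9999) -> dict:
    """Build the OpenCode provider config from step 1 for one model and port."""
    return {
        "$schema": "https://opencode.ai/config.json",
        "provider": {
            # Same provider key and display name as the example in step 1.
            "ollama": {
                "npm": "@ai-sdk/openai-compatible",
                "name": "macafm (local)",
                "options": {"baseURL": f"http://localhost:{port}/v1"},
                "models": {model_id: {"name": model_id}},
            }
        },
    }


# Print JSON that can be saved as ~/.config/opencode/opencode.json.
print(json.dumps(opencode_config("mlx-community/Qwen3-Coder-Next-4bit"), indent=2))
```

The model ID passed here should match the one given to afm mlx -m in step 2.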

28+ MLX Models Tested

28 models tested and verified including Qwen3, Gemma 3/3n, GLM-4/5, DeepSeek V3, LFM2, SmolLM3, Llama 3.2, MiniMax M2.5, Nemotron, and more. See test reports.

Related Projects

Vesta AI Explorer — full-featured native macOS AI chat app
AFMTrainer — LoRA fine-tuning wrapper for Apple's toolkit (Mac M-series & Linux CUDA)
Apple Foundation Model Adapters — Apple's adapter training toolkit

🌟 Features

🔗 OpenAI API Compatible - Works with existing OpenAI client libraries and applications
🧠 MLX Local Models - Run any Hugging Face MLX model locally (Qwen, Gemma, Llama, DeepSeek, GLM, and 28+ tested models)
🌐 API Gateway - Auto-discovers and proxies Ollama, LM Studio, Jan, and other local backends into a single API
⚡ LoRA adapter support - Loads LoRA adapters (.fmadapter files) trained with Apple's adapter training toolkit
📱 Apple Foundation Models - Uses Apple's on-device 3B parameter language model
👁️ Vision OCR - Extract text from images and PDFs using Apple Vision via CLI and HTTP (afm vision, /v1/vision/ocr)
🖥️ Built-in WebUI - Chat interface with model selection (afm -w)
🔒 Privacy-First - All processing happens locally on your device
⚡ Fast & Lightweight - No network calls, no API keys required
🛠️ Easy Integration - Drop-in replacement for OpenAI API endpoints
📊 Token Usage Tracking - Provides accurate token consumption metrics

📋 Requirements

macOS 26 (Tahoe) or later
Apple Silicon Mac (M1/M2/M3/M4 series)
Apple Intelligence enabled in System Settings
Xcode 26 (for building from source)

🚀 Quick Start

Installation

Option 1: Homebrew (Recommended)

# Add the tap
brew tap scouzi1966/afm

# Install AFM
brew install afm

# Verify installation
afm --version

Option 2: pip (PyPI)

# Install from PyPI
pip install macafm

# Verify installation
afm --version

Option 3: Build from Source

# Clone the repository with submodules
git clone --recurse-submodules https://github.com/scouzi1966/maclocal-api.git
cd maclocal-api

# Build everything from scratch (patches + webui + release build)
./Scripts/build-from-scratch.sh

# Or skip webui if you don't have Node.js
./Scripts/build-from-scratch.sh --skip-webui

# Or use make (patches + release build, no webui)
make

# Run
./.build/release/afm --version

Running

# API server only (Apple Foundation Model on port 9999)
afm

# API server with WebUI chat interface
afm -w

# WebUI + API gateway (auto-discovers Ollama, LM Studio, Jan, etc.)
afm -w -g

# Custom port with verbose logging
afm -p 8080 -v

# Show help
afm -h

MLX Local Models

Run open-source models locally on Apple Silicon using MLX:

# Run a model with single prompt
afm mlx -m mlx-community/Qwen2.5-0.5B-Instruct-4bit -s "Explain gravity"

# Start MLX model with WebUI
afm mlx -m mlx-community/gemma-3-4b-it-8bit -w

# Interactive model picker (lists downloaded models)
afm mlx -w

# MLX model as API server
afm mlx -m mlx-community/Llama-3.2-1B-Instruct-4bit -p 8080

# Pipe mode
cat essay.txt | afm mlx -m mlx-community/Qwen3-0.6B-4bit -i "Summarize this"

# MLX help
afm mlx --help

Models are downloaded from Hugging Face on first use and cached locally. Any model from the mlx-community collection is supported.
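To see which models are already cached, you can scan the cache directory. The sketch below assumes the cache follows the standard Hugging Face hub layout (one `models--<org>--<name>` folder per repository, typically under ~/.cache/huggingface/hub); this README does not confirm the exact path afm uses, so treat both the location and the `cached_repo_ids` helper as assumptions.

```python
from pathlib import Path


def cached_repo_ids(hub_dir: Path) -> list[str]:
    """Return repo IDs for model folders in a Hugging Face hub-style cache."""
    ids: list[str] = []
    if not hub_dir.is_dir():
        return ids
    for entry in sorted(hub_dir.iterdir()):
        # Hub-style caches name model folders "models--<org>--<name>".
        if entry.is_dir() and entry.name.startswith("models--"):
            repo = entry.name[len("models--"):]
            # Only the first "--" separates org from name.
            ids.append(repo.replace("--", "/", 1))
    return ids


# Assumed default location; adjust if your cache lives elsewhere.
for repo_id in cached_repo_ids(Path.home() / ".cache" / "huggingface" / "hub"):
    print(repo_id)
```

Each printed ID has the same form as the value passed to afm mlx -m.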

📡 API Endpoints

Chat Completions

POST /v1/chat/completions

Compatible with OpenAI's chat completions API.

curl -X POST http://localhost:9999/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "foundation",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'

List Models

GET /v1/models

Returns available Foundation Models.

curl http://localhost:9999/v1/models
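A client usually only needs the model IDs from this response. The sketch below assumes the response follows the OpenAI list shape (`{"object": "list", "data": [{"id": ...}]}`), which is an assumption based on the API being OpenAI-compatible; `model_ids` and `fetch_model_ids` are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def model_ids(payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style model list payload."""
    return [m["id"] for m in payload.get("data", []) if "id" in m]


def fetch_model_ids(base_url: str = "http://localhost:9999/v1") -> list[str]:
    """Query GET /v1/models on a running server and return the IDs."""
    with urlopen(f"{base_url}/models", timeout=5) as resp:
        return model_ids(json.load(resp))
```

With the server running, `fetch_model_ids()` returns the IDs you can pass as "model" in chat requests.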

Vision OCR

POST /v1/vision/ocr

Runs Apple Vision OCR against local files, uploads, base64 payloads, data: URLs, and OpenAI-style image inputs.

curl -X POST http://localhost:9999/v1/vision/ocr \
  -H "Content-Type: application/json" \
  -d '{
    "file": "/tmp/invoice.pdf",
    "recognition_level": "accurate",
    "languages": ["en-US"],
    "max_pages": 10
  }'

The endpoint returns structured JSON with per-document text, per-page text, text blocks, detected tables, document hints, and a top-level combined_text field. See docs/vision-ocr-api.md for request formats, options, and response details.
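The same request can be built from Python with only the standard library. This is a minimal sketch using the request fields from the curl example and the combined_text field described above; `build_ocr_request` and `combined_text` are hypothetical helper names, and the rest of the response schema is left to docs/vision-ocr-api.md.

```python
import json
from urllib.request import Request


def build_ocr_request(
    file_path: str,
    recognition_level: str = "accurate",
    languages: tuple = ("en-US",),
    max_pages=None,
    base_url: str = "http://localhost:9999",
) -> Request:
    """Build a POST /v1/vision/ocr request for a local file path."""
    body = {
        "file": file_path,
        "recognition_level": recognition_level,
        "languages": list(languages),
    }
    # max_pages is only sent when the caller sets it.
    if max_pages is not None:
        body["max_pages"] = max_pages
    return Request(
        f"{base_url}/v1/vision/ocr",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def combined_text(response: dict) -> str:
    """Return the top-level combined_text field, or '' if it is absent."""
    return response.get("combined_text", "")
```

To send it, pass the request to `urllib.request.urlopen`, decode the body with `json.load`, and hand the resulting dict to `combined_text`.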

Health Check

GET /health

Server health status endpoint.

curl http://localhost:9999/health
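The health endpoint is handy in scripts that start afm and then need to wait until it accepts requests. The sketch below polls it with a retry loop; it assumes a healthy server answers with HTTP 200, which this README does not state explicitly, and both function names are hypothetical.

```python
import time
from http.client import HTTPException
from urllib.request import urlopen


def health_ok(url: str = "http://localhost:9999/health", timeout: float = 2.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, HTTPException):
        # Connection refused, timeout, HTTP error status, or a malformed reply.
        return False


def wait_until_healthy(probe=health_ok, attempts: int = 30, delay: float = 1.0) -> bool:
    """Call probe up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Passing the probe as a parameter keeps the retry loop testable without a running server.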

💻 Usage Examples

Python with OpenAI Library

from openai import OpenAI

# Point to your local MacLocalAPI server
client = OpenAI(
    api_key="not-needed-for-local",
    base_url="http://localhost:9999/v1"
)

response = client.chat.completions.create(
    model="foundation",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms"}
    ]
)

print(response.choices[0].message.content)
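The example above waits for the complete reply. If you request a streamed reply instead, an OpenAI-compatible server normally sends server-sent events, one `data: {...}` line per chunk and a final `data: [DONE]`. The sketch below parses such lines; it assumes afm follows that OpenAI wire format, which this README does not spell out, and `iter_stream_text` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style streaming lines."""
    for raw in lines:
        line = raw.strip()
        # Ignore blank keep-alive lines and anything that is not a data event.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

Joining the yielded fragments gives the full reply; printing them as they arrive gives a typing effect in the terminal.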

Vision OCR from OpenAI-Compatible Clients

The OCR endpoint also accepts OpenAI-style multimodal payloads. This is useful when your client already sends messages[].content[] parts with image_url.

curl -X POST http://localhost:9999/v1/vision/ocr \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Extract the invoice text"},
        {
          "type": "image_url",
          "image_url": {
            "url": "data:application/pdf;base64,..."
          }
        }
      ]
    }],
    "recognition_level": "accurate",
    "languages": ["en-US"]
  }'

Foundation chat requests can also auto-run Apple Vision OCR before prompting the model when:

the request includes image content
the request includes the built-in apple_vision_ocr tool
tool_choice is auto, required, omitted, or explicitly selects that tool
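The three conditions above can be read as a single predicate over the request body. The sketch below is a client-side illustration only, not afm's implementation: it assumes images arrive as `image_url` content parts and that the built-in tool is declared OpenAI-style as `{"type": "function", "function": {"name": "apple_vision_ocr"}}`. All function names are hypothetical.

```python
OCR_TOOL = "apple_vision_ocr"


def has_image_content(request: dict) -> bool:
    """True if any message carries an image_url content part."""
    for message in request.get("messages", []):
        content = message.get("content")
        if isinstance(content, list) and any(
            isinstance(part, dict) and part.get("type") == "image_url"
            for part in content
        ):
            return True
    return False


def declares_ocr_tool(request: dict) -> bool:
    """True if the tools list names the built-in OCR tool (assumed shape)."""
    return any(
        tool.get("function", {}).get("name") == OCR_TOOL
        for tool in request.get("tools", [])
    )


def tool_choice_allows_ocr(request: dict) -> bool:
    """True if tool_choice is omitted, auto, required, or selects the OCR tool."""
    choice = request.get("tool_choice")
    if choice is None or choice in ("auto", "required"):
        return True
    if isinstance(choice, dict):
        return choice.get("function", {}).get("name") == OCR_TOOL
    return False


def would_auto_ocr(request: dict) -> bool:
    """Combine the three conditions listed above."""
    return (
        has_image_content(request)
        and declares_ocr_tool(request)
        and tool_choice_allows_ocr(request)
    )
```

A request with tool_choice set to "none", or one without the tool declared, fails the predicate.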

JavaScript/Node.js

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: 'not-needed-for-local',
  baseURL: 'http://localhost:9999/v1',
});

const completion = await openai.chat.completions.create({
  messages: [{ role: 'user', content: 'Write a haiku about programming' }],
  model: 'foundation',
});

console.log(completion.choices[0].message.content);

curl Examples

# Basic chat completion
curl -X POST http://localhost:9999/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "foundation",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

# With temperature control
curl -X POST http://localhost:9999/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "foundation",
    "messages": [{"role": "user", "content": "Be creative!"}],
    "temperature": 0.8
  }'

Single Prompt & Pipe Examples

# Single prompt mode
afm -s "Explain quantum computing"

# Piped input from other commands
echo "What is the meaning of life?" | afm
cat file.txt | afm
git log --oneline | head -5 | afm

# Custom instructions with pipe
echo "Review this code" | afm -i "You are a senior software engineer"

🏗️ Architecture

MacLocalAPI/
├── Package.swift                    # Swift Package Manager config
├── Sources/MacLocalAPI/
│   ├── main.swift                   # CLI entry point & ArgumentParser
│   ├── Server.swift                 # Vapor web server configuration
│   ├── Controllers/
│   │   └── ChatCompletionsController.swift  # OpenAI API endpoints
│   └── Models/
│       ├── FoundationModelService.swift     # Apple Foundation Models wrapper
│       ├── OpenAIRequest.swift              # Request data models
│       └── OpenAIResponse.swift             # Response data models
└── README.md

🔧 Configuration

Command Line Options

OVERVIEW: macOS server that exposes Apple's Foundation Models through
OpenAI-compatible API

Use -w to enable the WebUI, -g to enable API gateway mode (auto-discovers and
proxies to Ollama, LM Studio, Jan, and other local LLM backends).

USAGE: afm <options>
       afm mlx [<options>]      Run local MLX models from Hugging Face
       afm vision <image>       OCR text extraction from images/PDFs

OPTIONS:
  -s, --single-prompt <single-prompt>
                          Run a single prompt without starting the server
  -i, --instructions <instructions>
                          Custom instructions for the AI assistant (default:
                          You are a helpful assistant)
  -v, --verbose           Enable verbose logging
  --no-streaming          Disable streaming responses (streaming is enabled by
                          default)
  -a, --adapter <adapter> Path to a .fmadapter file for LoRA adapter fine-tuning
  -p, --port <port>       Port to run the server on (default: 9999)
  -H, --hostname <hostname>
                          Hostname to bind server to (default: 127.0.0.1)
  -t, --temperature <temperature>
                          Temperature for response generation (0.0-1.0)
  -r, --randomness <randomness>
                          Sampling mode: 'greedy', 'random',
                          'random:top-p=<0.0-1.0>', 'random:top-k=<int>', with
                          optional ':seed=<int>'
  -P, --permissive-guardrails
                          Permissive guardrails for unsafe or inappropriate
                          responses
  -w, --webui             Enable webui and open in default browser
  -g, --gateway           Enable API gateway mode: discover and proxy to local
                          LLM backends (Ollama, LM Studio, Jan, etc.)
  --prewarm <prewarm>     Pre-warm the model on server startup for faster first
                          response (y/n, default: y)
  --version               Show the version.
  -h, --help              Show help information.

Note: afm also accepts piped input from other commands, equivalent to using -s
with the piped content as the prompt.
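The -r value is a small colon-separated string, so a helper can assemble it and catch typos before afm is launched. This is a minimal sketch built from the help text above. How afm treats a seed combined with greedy, or top-p combined with top-k, is not documented here, so the sketch rejects those combinations as a conservative assumption; `randomness_arg` is a hypothetical name.

```python
def randomness_arg(mode: str = "greedy", top_p=None, top_k=None, seed=None) -> str:
    """Build a value for afm's -r/--randomness option."""
    if mode not in ("greedy", "random"):
        raise ValueError("mode must be 'greedy' or 'random'")
    if top_p is not None and top_k is not None:
        # Assumption: the help text lists top-p and top-k as alternatives.
        raise ValueError("use either top_p or top_k, not both")
    if mode == "greedy":
        if top_p is not None or top_k is not None or seed is not None:
            # Assumption: sampling options only apply to random mode.
            raise ValueError("greedy takes no sampling options in this sketch")
        return "greedy"
    parts = ["random"]
    if top_p is not None:
        if not 0.0 <= top_p <= 1.0:
            raise ValueError("top_p must be between 0.0 and 1.0")
        parts.append(f"top-p={top_p}")
    if top_k is not None:
        parts.append(f"top-k={int(top_k)}")
    if seed is not None:
        parts.append(f"seed={int(seed)}")
    return ":".join(parts)
```

For example, `randomness_arg("random", top_p=0.9, seed=42)` produces a string you can pass as afm -r "random:top-p=0.9:seed=42".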

Environment Variables

The server respects standard logging environment variables:

LOG_LEVEL - Set logging level (trace, debug, info, notice, warning, error, critical)

⚠️ Limitations & Notes

Model Scope: Apple Foundation Model is a 3B parameter model (optimized for on-device performance)
macOS 26+ Only: Requires the latest macOS with Foundation Models framework
Apple Intelligence Required: Must be enabled in System Settings
Token Estimation: Uses word-based approximation for token counting (Foundation model only; proxied backends report real counts)

🔍 Troubleshooting

"Foundation Models framework is not available"

Ensure you're running macOS 26 or later
Enable Apple Intelligence in System Settings → Apple Intelligence & Siri
Verify you're on an Apple Silicon Mac
Restart the application after enabling Apple Intelligence

Server Won't Start

Check if the port is already in use: lsof -i :9999
Try a different port: afm -p 8080
Enable verbose logging: afm -v
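If lsof is not convenient, the port check can also be scripted. The sketch below tries a TCP connection to the port and treats a successful connect as "something is already listening"; `port_in_use` is a hypothetical helper, and 9999 is afm's default port from the options above.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


print("port 9999 in use:", port_in_use(9999))
```

If this prints True while afm is not running, another process holds the port and afm -p with a different port is the quickest workaround.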

Build Issues

Ensure you have Xcode 26 installed
Update Swift toolchain: xcode-select --install
Clean and rebuild: swift package clean && swift build -c release

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Development Setup

# Clone the repo with submodules
git clone --recurse-submodules https://github.com/scouzi1966/maclocal-api.git
cd maclocal-api

# Full build from scratch (submodules + patches + webui + release)
./Scripts/build-from-scratch.sh

# Or for debug builds during development
./Scripts/build-from-scratch.sh --debug --skip-webui

# Run with verbose logging
./.build/debug/afm -w -g -v

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Apple for the Foundation Models framework
The Vapor Swift web framework team
OpenAI for the API specification standard
The Swift community for excellent tooling

📞 Support

If you encounter any issues or have questions:

Check the Troubleshooting section
Search existing GitHub Issues
Create a new issue with detailed information about your problem

🗺️ Roadmap

Streaming response support
MLX local model support (28+ models tested)
Multiple model support (API gateway mode)
Web UI for testing (llama.cpp WebUI integration)
Vision OCR subcommand
Function/tool calling (OpenAI-compatible, multiple formats)
Performance optimizations
BFCL integration for automated tool calling validation
Docker containerization (when supported)

Made with ❤️ for the Apple Silicon community

Bringing the power of local AI to your fingertips.