Run GhidraInsight with local AI models using Ollama - No API keys needed, full privacy, offline capable.
Version: 1.0
Status: ✅ Production Ready
Last Updated: January 5, 2026
Ollama is a simple, open-source tool that lets you run large language models locally on your machine. Unlike cloud-based APIs, local models run on your hardware - offering privacy, cost savings, and offline capability.
| Model | Size | Speed | Quality | Use Case |
|---|---|---|---|---|
| Mistral | 7B | ⚡ Fast | ⭐⭐⭐⭐ | General analysis (Recommended) |
| Llama 2 | 7B-70B | ⚡ Fast to Slow | ⭐⭐⭐⭐ | Code analysis |
| Neural Chat | 7B | ⚡ Fast | ⭐⭐⭐ | Lightweight |
| Orca | 3B-13B | ⚡ Very Fast | ⭐⭐⭐ | Quick summaries |
| CodeLLaMA | 7B-34B | ⚡ Medium | ⭐⭐⭐⭐⭐ | Code-specific (Best for binaries) |
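All of these models are exposed through the same local HTTP API once pulled. As a quick sanity check after installation (covered below), you can query the endpoint directly; a minimal Python sketch, assuming the server is running on the default port 11434 and `mistral` has already been pulled:

```python
import json
import urllib.request

# Assumes the Ollama server is running on the default port (11434)
# and that `ollama pull mistral` has already completed.
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "mistral",
        "prompt": "In one sentence, what does the C function strcpy() do?",
        "stream": False,   # return a single JSON object instead of a token stream
    }).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```

If this prints an answer, anything on the machine (including GhidraInsight) can reach the same local endpoint.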
Minimum requirements:

- RAM: 8GB (for 7B models)
- Disk: 10GB free space
- CPU: Any modern processor
- Internet: Initial model download only (not needed afterwards)

Recommended:

- RAM: 16GB+ (better performance)
- GPU: NVIDIA (CUDA) or Apple Silicon (Metal) for acceleration
- Disk: 50GB+ (multiple models)
- CPU: 6+ cores for concurrent analysis
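If you want to check a machine against these numbers before pulling a model, here is a rough pre-flight sketch (assumes the third-party `psutil` package; the 8 GB / 10 GB thresholds are the minimums listed above):

```python
import shutil

import psutil  # pip install psutil

# Rough pre-flight check against the minimum requirements above
# (8 GB RAM and 10 GB free disk for a 7B model).
GIB = 1024 ** 3

ram_gib = psutil.virtual_memory().total / GIB
disk_gib = shutil.disk_usage("/").free / GIB  # adjust the path to your model store

print(f"RAM:  {ram_gib:.1f} GiB  {'OK' if ram_gib >= 8 else 'below minimum'}")
print(f"Disk: {disk_gib:.1f} GiB free  {'OK' if disk_gib >= 10 else 'below minimum'}")
```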
Install Ollama:

macOS:

```bash
# Install with Homebrew
brew install ollama

# Or download the desktop app from https://ollama.com/download
# (the install.sh script below is Linux-only)
```

Linux:

```bash
curl -fsSL https://ollama.ai/install.sh | sh
```

Windows (WSL2):

```bash
# In a WSL2 terminal
curl -fsSL https://ollama.ai/install.sh | sh
```

Verify Installation:

```bash
ollama --version
# Should output: ollama version X.X.X
```

Download a Model:

```bash
# Download Mistral (recommended, 4.1GB)
ollama pull mistral

# Or CodeLLaMA for binary analysis
ollama pull codellama

# Or Llama 2 for more power
ollama pull llama2
```

Check Downloaded Models:

```bash
ollama list
# Shows all downloaded models
```

Start the Ollama Server:

```bash
# Start server (listens on http://localhost:11434)
ollama serve

# Or run in background
ollama serve &
```

Verify Server:

```bash
curl http://localhost:11434/api/tags
# Should return a list of available models
```

Configure GhidraInsight:

```bash
# Setup guide (interactive)
ghidrainsight config setup --guided

# When prompted for LLM provider:
#   Select:   "Local (Ollama)"
#   Model:    "mistral" or "codellama"
#   Endpoint: "http://localhost:11434"
```

Create/Update config.yaml:
```yaml
ai:
  providers:
    - name: ollama_local
      type: ollama
      enabled: true
      model: mistral              # or codellama, llama2
      endpoint: http://localhost:11434
      temperature: 0.7
      max_tokens: 2048

    - name: ollama_codellama
      type: ollama
      enabled: true
      model: codellama:34b
      endpoint: http://localhost:11434
      temperature: 0.5
      max_tokens: 4096

  # Set default provider
  default_provider: ollama_local
```
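The exact schema is whatever your GhidraInsight version expects, but a quick way to confirm that the YAML parses and the provider names line up before starting the service (assumes PyYAML is installed and `config.yaml` is in the current directory):

```python
import yaml  # pip install pyyaml

# Load the config and confirm the Ollama provider entries are well-formed.
with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

providers = {p["name"]: p for p in cfg["ai"]["providers"]}
default = cfg["ai"]["default_provider"]

assert default in providers, f"default_provider {default!r} is not defined"
for name, p in providers.items():
    if p.get("type") == "ollama" and p.get("enabled"):
        print(f"{name}: model={p['model']} endpoint={p['endpoint']}")
```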
Or set environment variables in your shell:

```bash
export GHIDRA_AI_PROVIDER=ollama
export GHIDRA_AI_MODEL=mistral
export GHIDRA_OLLAMA_ENDPOINT=http://localhost:11434

# Then start GhidraInsight
./scripts/startup.sh docker
```

Add to .env file:

```
GHIDRA_AI_PROVIDER=ollama
GHIDRA_AI_MODEL=mistral
GHIDRA_OLLAMA_ENDPOINT=http://localhost:11434
```
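For your own scripts, the same settings can be read back from the environment; a minimal sketch (the variable names follow the listing above, the fallback values are assumptions):

```python
import os

# Read the GhidraInsight/Ollama settings from the environment,
# falling back to the defaults used throughout this guide.
provider = os.environ.get("GHIDRA_AI_PROVIDER", "ollama")
model = os.environ.get("GHIDRA_AI_MODEL", "mistral")
endpoint = os.environ.get("GHIDRA_OLLAMA_ENDPOINT", "http://localhost:11434")

print(f"provider={provider} model={model} endpoint={endpoint}")
```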
Using the Dashboard:

- Make sure Ollama is running (in another terminal: `ollama serve`)
- Open Dashboard: http://localhost:3000
- Upload Binary: drag and drop your binary file
- Use AI Chat: type natural-language questions, for example:
  - "What crypto algorithms are used?"
  - "Summarize main functions"
  - "Find potential vulnerabilities"
  - "Analyze data flow for user input"
Python API:

```python
import asyncio

from ghidrainsight.client import GhidraInsightClient


async def analyze_with_local_ai():
    # Connect to GhidraInsight with the Ollama backend
    client = GhidraInsightClient(
        "http://localhost:8000",
        ai_provider="ollama_local",  # use the local Ollama provider
    )

    # Analyze binary
    results = await client.analyze(
        file_path="/path/to/binary",
        features=["crypto", "vulnerabilities"],
        ai_powered=True,  # use the local AI model for analysis
    )

    # Results now include AI insights from the local model
    print(f"AI Summary: {results.ai_summary}")
    print(f"Vulnerabilities: {results.vulnerabilities}")
    return results


# Run analysis
asyncio.run(analyze_with_local_ai())
```
CLI:

```bash
# Analyze with local AI
ghidrainsight analyze \
  --file binary.elf \
  --ai-provider ollama \
  --ai-model mistral \
  --ai-summary \
  --output report.json

# Chat with binary
ghidrainsight chat \
  --file binary.elf \
  --ai-provider ollama \
  --prompt "What are the main vulnerabilities?"

# Batch analysis with local AI
ghidrainsight batch \
  --directory ./binaries \
  --ai-provider ollama \
  --output ./results
```

Model Selection:

```bash
# Quick summaries: Orca (3.3GB, very fast)
ollama pull orca

# Balanced, excellent quality (recommended): Mistral (4.1GB)
ollama pull mistral

# Best code understanding, slower: CodeLLaMA 34B (20GB)
ollama pull codellama:34b

# Fast and efficient, good for servers: Neural Chat (4.7GB)
ollama pull neural-chat
```

NVIDIA GPUs (CUDA):
```bash
# Install CUDA (if not already installed)
# Ollama will then auto-detect and use the GPU

# List downloaded models and their sizes
ollama list

# On recent Ollama versions, `ollama ps` shows whether a loaded
# model is running on the GPU or on the CPU
ollama ps
```

Apple Silicon (Metal):

```bash
# Automatic - M1/M2/M3 Macs use Metal with no extra configuration
# Check Activity Monitor (GPU history) to verify GPU usage
```

Reduce Memory Usage:
```yaml
# In config.yaml
ai:
  providers:
    - name: ollama_local
      model: mistral
      # Reduce batch size for lower RAM usage
      batch_size: 256     # default: 512
      # Limit the context window
      max_tokens: 1024    # default: 2048
```

Run Ollama on a Different Port:
```bash
OLLAMA_HOST=0.0.0.0:11435 ollama serve

# Configure GhidraInsight to use the custom port
ghidrainsight config set ollama.endpoint http://localhost:11435
```

Run Ollama with Custom Settings:
```bash
# ollama serve takes its settings from environment variables rather than
# command-line flags; the listen address, for example:
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# Run `ollama serve --help` to see the environment variables your version supports
```

Performance options such as CPU thread count and the number of GPU layers are not server flags; depending on your Ollama version they are set per model (in a Modelfile) or per request through the API's `options` field, as in the sketch below.
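As one illustration of per-request tuning (not GhidraInsight-specific): Ollama's generate endpoint accepts an `options` object, but the accepted option names vary between versions, so treat `num_thread` and `num_gpu` here as assumptions to verify against your build's documentation:

```python
import json
import urllib.request

# Hypothetical tuning example: pass performance options per request.
# `num_thread` / `num_gpu` are version-dependent and may be ignored.
payload = {
    "model": "mistral",
    "prompt": "Summarize what a stack canary protects against.",
    "stream": False,
    "options": {
        "num_thread": 8,   # CPU threads to use (assumption: supported by this build)
        "num_gpu": 20,     # layers to offload to the GPU (assumption)
    },
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```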
Offline Operation:

```bash
# All analysis happens locally; no data is sent to the cloud
ghidrainsight analyze --file binary.elf --ai-provider ollama

# Verify that no internet connectivity is required
curl -v http://localhost:8000/health
# Should work offline (assuming models are already cached)
```
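The same check can be scripted so it probes only the two local services (the `/health` path comes from the curl above; `/api/tags` is Ollama's model-listing endpoint):

```python
import urllib.error
import urllib.request

# Probe only the two local services; no external hosts are contacted.
CHECKS = {
    "GhidraInsight": "http://localhost:8000/health",
    "Ollama": "http://localhost:11434/api/tags",
}

for name, url in CHECKS.items():
    try:
        with urllib.request.urlopen(url, timeout=3) as resp:
            print(f"{name}: HTTP {resp.status} at {url}")
    except (urllib.error.URLError, OSError) as exc:
        print(f"{name}: unreachable ({exc})")
```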
Block External AI Services:

```bash
# Only allow local connections
sudo ufw allow from 127.0.0.1 to any port 8000     # GhidraInsight API
sudo ufw allow from 127.0.0.1 to any port 11434    # Ollama
```
Don't Log Sensitive Data (config.yaml):

```yaml
logging:
  level: INFO
  include_prompts: false    # don't log AI prompts
  include_results: false    # don't log analysis results
  file: /secure/location/logs.txt
```

Rough performance comparison:

| Model | RAM | Analysis Time | Quality |
|---|---|---|---|
| Ollama Orca 3B | 6GB | 45s | ⭐⭐⭐ |
| Ollama Mistral 7B | 8GB | 60s | ⭐⭐⭐⭐ |
| Ollama Llama2 13B | 12GB | 90s | ⭐⭐⭐⭐ |
| Ollama CodeLLaMA 34B | 20GB | 120s | ⭐⭐⭐⭐⭐ |
| OpenAI GPT-4 API | N/A (cloud) | 5s | ⭐⭐⭐⭐⭐ |
| Claude API | N/A (cloud) | 10s | ⭐⭐⭐⭐⭐ |
Best Balance: Mistral (7B) - Fast enough for interactive use, good quality
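These timings vary widely with hardware, so it is worth measuring your own setup; a small sketch that times a single generation against the local server (model and prompt are just examples):

```python
import json
import time
import urllib.request

# Time one local generation end-to-end.
payload = {
    "model": "mistral",
    "prompt": "Describe the ELF file format in one sentence.",
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

start = time.perf_counter()
with urllib.request.urlopen(req) as resp:
    body = json.loads(resp.read())
elapsed = time.perf_counter() - start

# Recent Ollama versions also report timing fields in the response
# (e.g. total_duration, in nanoseconds).
print(f"{payload['model']}: {elapsed:.1f}s wall clock")
print(body["response"])
```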
Ollama not running (connection refused):

```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags

# If not running, start it
ollama serve &

# Wait a couple of seconds for startup
sleep 2

# Try again
ghidrainsight analyze --file binary.elf --ai-provider ollama
```

Model not found:

```bash
# List available models
ollama list

# Download the missing model
ollama pull mistral

# Verify the download
ollama list
```

High CPU usage: thread count and GPU offload are version-dependent settings; run `ollama serve --help` to see what your build supports, or set per-request options as in the tuning sketch above. Offloading more layers to the GPU also reduces CPU load.

Slow responses:

```bash
# Use a smaller model (faster)
ollama pull neural-chat    # 4.7GB, very fast

# Or use GPU acceleration (install NVIDIA CUDA, or use Apple Silicon)
```

Out of memory:

```bash
# Use a smaller model
ollama pull orca:3b        # only 3.3GB

# Or add swap (Linux; macOS manages swap automatically)
sudo dd if=/dev/zero of=/swapfile bs=1G count=16
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# Or reduce max_tokens in the GhidraInsight config
ghidrainsight config set ai.max_tokens 1024
```
Run multiple Ollama instances for different models:

```bash
# Terminal 1: Orca for fast analysis
OLLAMA_HOST=127.0.0.1:11434 ollama serve

# Terminal 2: CodeLLaMA for detailed analysis
OLLAMA_HOST=127.0.0.1:11435 ollama serve

# Terminal 3: start GhidraInsight
export OLLAMA_ENDPOINTS="localhost:11434,localhost:11435"
./scripts/startup.sh docker
```

Then reference both endpoints in config.yaml:
```yaml
ai:
  providers:
    - name: ollama_fast
      model: orca:3b
      endpoint: http://localhost:11434
      use_for: quick_summaries

    - name: ollama_detailed
      model: codellama:34b
      endpoint: http://localhost:11435
      use_for: deep_analysis

    - name: ollama_balanced
      model: mistral
      endpoint: http://localhost:11434
      use_for: default

  default_provider: ollama_balanced
```

Model resources:

- Mistral: https://mistral.ai/
- Llama: https://llama.meta.com/
- CodeLLaMA: https://about.fb.com/news/2023/08/code-llama-ai/
Tips:

- Recommended default model: `ollama pull mistral` (good balance of speed and quality)
- Auto-start Ollama: install a LaunchAgent on macOS, a systemd service on Linux, or a Task Scheduler task on Windows
- Monitor resources: `top | grep ollama` for memory, `nvidia-smi` for NVIDIA GPUs, and Activity Monitor (or `sudo powermetrics --samplers gpu_power`) on Apple Silicon
- Models are cached in `~/.ollama/models/`; keep a backup for offline use
- Pick a model per task: quick summaries → Orca, balanced analysis → Mistral, code deep-dives → CodeLLaMA, production servers → Neural Chat
- Pre-load at login (add to your shell startup file): `ollama pull mistral` to pre-download and `ollama serve &` to start in the background

Compare models (reusing the `client` from the Python example above):
```python
async def compare_models():
    # Run the same analysis with several local models and compare summaries
    models = ["mistral", "codellama", "llama2"]
    for model in models:
        results = await client.analyze(
            file_path="binary.elf",
            ai_model=model,
        )
        print(f"\n{model}:\n{results.summary}")
```
```bash
# Test different models
for model in mistral llama2 codellama; do
  time ghidrainsight analyze --file binary.elf --ai-model $model
done
```
Quick Start Checklist:

- ✅ Install Ollama: `curl -fsSL https://ollama.ai/install.sh | sh`
- ✅ Download Model: `ollama pull mistral`
- ✅ Start Server: `ollama serve`
- ✅ Configure GhidraInsight: `ghidrainsight config setup --guided`
- ✅ Run Analysis: `ghidrainsight analyze --file binary.elf --ai-provider ollama`
- ✅ Open Dashboard: http://localhost:3000
Have suggestions for Ollama integration? → GitHub Discussions
Privacy-Focused • Open Source • Fast • Free
Enjoy powerful local AI analysis without cloud dependencies!
Last Updated: January 5, 2026