A desktop application built with C++ and Qt6 that integrates a local Large Language Model (LLM) to generate report conclusions based on user annotations. Uses llama.cpp for local inference to ensure complete offline functionality and privacy.
- ✅ Privacy-First: All processing happens locally—no cloud APIs, no data sent externally
- ✅ Qt GUI: Clean, responsive interface built with Qt6 Widgets
- ✅ Annotation Management: Add and track annotations with ease
- ✅ Smart Conclusions: LLM generates contextual conclusions based on annotation count
- ✅ Persistent Model Loading: Models are loaded once and kept in memory for instant generation
- ✅ Multi-Model Support: Built-in support for Llama 3, Mistral, Phi-3, Gemma, and Qwen with automatic prompt templating
- ✅ Background Inference: Non-blocking UI with threaded LLM processing
- ✅ LLM Controller: Manage models, download new ones directly from HuggingFace, and tweak parameters
- ✅ Custom Prompts: Edit and save system prompts directly in the app
- ✅ Cross-Platform: Works on Windows, Linux, and macOS
- ✅ Docker Support: Consistent build environment with all dependencies
- LLM Concepts for C++ Developers: Learn how llama.cpp works under the hood.
- Docker Desktop (v20.10+)
- XQuartz (macOS only, for GUI support)
- Download from: https://www.xquartz.org/
- Git (for cloning repository)
- Qt 6.x (6.2 or later recommended)
- CMake 3.20 or later
- C++17 compatible compiler
- GCC 9+ or Clang 10+ (Linux/macOS)
- MSVC 2019+ or MinGW (Windows)
- Git (for submodules)
# Install XQuartz via Homebrew
brew install --cask xquartz
# Start XQuartz
open -a XQuartz
# In XQuartz preferences (XQuartz → Preferences → Security):
# ✓ Enable "Allow connections from network clients"
# Restart XQuartz and add localhost to allowed clients
xhost + localhost
git clone --recursive https://github.com/yourusername/local_llm.git
cd local_llm
# If you forgot --recursive, run:
git submodule update --init --recursive
# Build the container (first time only, may take 5-10 minutes)
docker-compose build
# Start the container
docker-compose up -d
# Access the container shell
docker-compose exec localllm bash
Use docker compose exec to build the application inside the running container:
# 1. Configure with CMake (creates the build directory if it doesn't exist)
docker compose exec localllm cmake -B /workspace/build_container -S /workspace
# 2. Build the project (using -j4 for parallel compilation)
docker compose exec localllm cmake --build /workspace/build_container -j4
The application includes a built-in Model Manager for easy model downloading.
- Run the application (see step 6).
- Go to File > LLM Settings.
- Switch to the Models tab.
- Click Download next to your desired model (e.g., Llama 3.2 1B or Phi-3.5 Mini).
- Once downloaded, click Select.
Alternatively, if you want to manually place a model:
Place any GGUF model in the models/ directory and select it in the settings.
To run the application with GUI support on macOS, ensure XQuartz is running and configured (xhost + localhost), then execute:
docker compose exec -T -e DISPLAY=host.docker.internal:0 localllm /workspace/build_container/LocalLLM
The GUI window should appear on your host system!
macOS (Homebrew):
brew install qt@6
export Qt6_DIR=$(brew --prefix qt@6)/lib/cmake/Qt6
Ubuntu/Debian:
sudo apt-get update
sudo apt-get install qt6-base-dev qt6-tools-dev cmake build-essential
Windows:
- Download Qt from https://www.qt.io/download-qt-installer
- Run the installer and select:
- Qt 6.x (latest version)
- Choose either MinGW or MSVC compiler
- CMake (if not already installed)
- Install Visual Studio 2019+ (for MSVC) or MinGW-w64
Linux/macOS:
# Clone with submodules
git clone --recursive https://github.com/yourusername/local_llm.git
cd local_llm
# Create build directory
mkdir build && cd build
# Configure (may need to specify Qt6_DIR)
cmake ..
# Build
cmake --build . -j4
# Run
./LocalLLM
Windows (PowerShell or CMD):
# Clone with submodules
git clone --recursive https://github.com/yourusername/local_llm.git
cd local_llm
# Initialize submodules if you forgot --recursive
git submodule update --init --recursive
# Configure with Visual Studio generator
# Note: If Qt is not found, add -DCMAKE_PREFIX_PATH="C:\Qt\6.x.x\msvc2019_64"
cmake -B build -G "Visual Studio 17 2022"
# Build (Release configuration recommended for better performance)
# This will also run windeployqt and copy necessary assets
cmake --build build --config Release -j4
# Run
.\build\bin\Release\LocalLLM.exe
Note for Windows users: If CMake can't find Qt6, you may need to specify the Qt path:
cmake -B build -G "Visual Studio 17 2022" -DCMAKE_PREFIX_PATH="C:\Qt\6.x.x\msvc2019_64"
Replace 6.x.x and the compiler version with the values from your actual Qt installation path.
- Configure LLM
  - Click ⚙ Settings or go to File > LLM Settings.
  - Go to the Models tab.
  - Download a model from the curated list (Llama 3, Mistral, Phi-3, etc.).
  - Click Select to load the model.
  - (Optional) Adjust Temperature or System Prompt in the Settings tab.
- Add Annotations
  - Select a damage type from the Classification dropdown (includes 16 common damage categories).
  - Enter the blade radius (meters) and, optionally, a free-text description (the prompt does not require one).
  - Click "Add Annotation" or press Enter.
  - Repeat to build your annotation list.
- Generate Report
  - Click "Generate Expert Conclusion".
  - The first generation loads the model into memory, which takes a few seconds.
  - Subsequent generations skip the load step because the model stays in memory.
  - View the generated conclusion in the output area.
- Sample Workflow
  - Add 3-4 annotations → LLM suggests more analysis needed
  - Add 10+ annotations → LLM highlights substantial insights available
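To make the link between the annotation list and the generated conclusion concrete, here is a minimal sketch of how annotations could be flattened into a user prompt that also states the annotation count. The `Annotation` struct, the `buildUserPrompt` function, and the prompt wording are all hypothetical and are not taken from the LocalLLM sources; only the three fields (classification, radius in meters, optional description) come from the steps above.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record mirroring the three form fields described above.
struct Annotation {
    std::string classification;  // Damage type chosen in the dropdown
    double radiusMeters;         // Blade radius in meters
    std::string description;     // Optional free text, may be empty
};

// Build a plain-text user prompt that lists every annotation and its count.
// The wording is an assumption made for illustration only.
std::string buildUserPrompt(const std::vector<Annotation>& annotations) {
    assert(!annotations.empty() && "add at least one annotation first");
    std::ostringstream out;
    out << "Annotations (" << annotations.size() << " total):\n";
    for (std::size_t i = 0; i < annotations.size(); ++i) {
        const Annotation& a = annotations[i];
        out << (i + 1) << ". " << a.classification
            << " at radius " << a.radiusMeters << " m";
        if (!a.description.empty()) {
            out << ": " << a.description;  // Only emitted when text was entered
        }
        out << '\n';
    }
    out << "Write a short expert conclusion based on these annotations.";
    return out.str();
}
```

Putting the count in the first line gives the model an explicit signal to react to, which is one plausible way to get the "few annotations" versus "many annotations" behavior described in the sample workflow.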
LocalLLM uses a persistent worker thread to keep the model loaded in memory.
- Startup: Model loads once (takes 2-10s depending on size).
- Inference: Subsequent requests are processed immediately.
- Memory: The model stays in RAM/VRAM until you select a different model or close the app.
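The load-once, serve-many pattern above can be sketched with the C++ standard library alone. This is an illustration under stated assumptions, not the application's code: the real worker lives in llamaworker.{h,cpp} and presumably uses Qt threading, while `PersistentWorker`, `Model`, and `submit` below are hypothetical names, and the "model" is just a callable standing in for llama.cpp.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for a loaded model: maps a prompt to generated text.
using Model = std::function<std::string(const std::string&)>;

class PersistentWorker {
    using Task = std::packaged_task<std::string(const Model&)>;

public:
    // `loader` runs exactly once, on the worker thread, before any request.
    explicit PersistentWorker(std::function<Model()> loader)
        : thread_([this, loader = std::move(loader)] { run(loader); }) {}

    // Finish queued work, then stop the thread (this is where the model is freed).
    ~PersistentWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_one();
        thread_.join();
    }

    // Queue a prompt; the future resolves once the worker has produced a reply.
    std::future<std::string> submit(std::string prompt) {
        assert(!prompt.empty() && "prompt must not be empty");
        Task task([p = std::move(prompt)](const Model& m) { return m(p); });
        std::future<std::string> result = task.get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(task));
        }
        cv_.notify_one();
        return result;
    }

private:
    void run(const std::function<Model()>& loader) {
        const Model model = loader();  // Expensive step, paid once
        for (;;) {
            Task task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
                if (queue_.empty()) return;  // Only reached when stopping
                task = std::move(queue_.front());
                queue_.pop();
            }
            task(model);  // Run inference outside the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<Task> queue_;
    bool stop_ = false;
    std::thread thread_;  // Declared last so the members above exist before it starts
};
```

Because `submit` returns a `std::future`, the caller (for example a UI thread) is never blocked by inference unless it chooses to wait on the result. Switching models in this sketch would mean destroying the worker and creating a new one with a different loader.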
Different models require different prompt formats to work correctly. LocalLLM automatically detects and applies the correct template:
- Llama 3: <|begin_of_text|><|start_header_id|>system...
- Mistral: [INST] ... [/INST]
- ChatML (Qwen/TinyLlama): <|im_start|>system...
- Phi-3: <|user|> ... <|end|>
- Gemma: <start_of_turn>user ...
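As a rough illustration of what automatic templating involves, the sketch below picks a template family from the model file name and wraps a system and user message accordingly. It covers three of the families listed above, using their publicly documented chat formats. The names `PromptFormat`, `detectFormat`, and `buildPrompt`, the file-name heuristic, and the choice to prepend the system text for Mistral are assumptions made for this example; how LocalLLM actually detects and applies templates is not shown in this README.

```cpp
#include <cassert>
#include <string>

// Hypothetical names: none of these identifiers come from the LocalLLM sources.
enum class PromptFormat { Llama3, Mistral, ChatML };

// Guess the template family from a lowercase model file name (assumed heuristic).
PromptFormat detectFormat(const std::string& modelName) {
    if (modelName.find("llama-3") != std::string::npos) return PromptFormat::Llama3;
    if (modelName.find("mistral") != std::string::npos) return PromptFormat::Mistral;
    return PromptFormat::ChatML;  // Fallback for Qwen, TinyLlama, and unknown names
}

// Wrap the system and user text in the special tokens the model was trained on.
std::string buildPrompt(PromptFormat format, const std::string& system,
                        const std::string& user) {
    assert(!user.empty() && "a user message is required");
    switch (format) {
    case PromptFormat::Llama3:
        return "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n" +
               system +
               "<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n" + user +
               "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n";
    case PromptFormat::Mistral: {
        // Assumption: no dedicated system role, so system text is prepended.
        std::string instruction = user;
        if (!system.empty()) instruction = system + "\n\n" + user;
        return "[INST] " + instruction + " [/INST]";
    }
    case PromptFormat::ChatML:
        return "<|im_start|>system\n" + system + "<|im_end|>\n" +
               "<|im_start|>user\n" + user + "<|im_end|>\n" +
               "<|im_start|>assistant\n";
    }
    return user;  // Unreachable for valid enum values
}
```

Each template ends with the opening of the assistant turn, so the model continues by writing the reply instead of another user message. A production version would more likely read the template from the GGUF metadata than rely on file names.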
local_llm/
├── CMakeLists.txt # Build configuration
├── Dockerfile # Docker container definition
├── docker-compose.yml # Docker orchestration
├── README.md # This file
├── models/ # Place GGUF models here
│ └── model.gguf # Your model file
├── src/
│ ├── main.cpp # Application entry point
│ ├── mainwindow.h/cpp # Main UI window
│ └── llamaworker.h/cpp # LLM inference worker
└── external/
    └── llama.cpp/ # llama.cpp library (submodule)
Error: "Failed to load model"
Solutions:
- Verify the model file exists in the models/ directory
- Check the file is valid GGUF format (not a corrupted download)
- Ensure sufficient RAM (4GB+ for Q4 quantized models)
- Try a smaller model like TinyLlama
Error: "Cannot connect to display"
Solutions:
# Ensure XQuartz is running
open -a XQuartz
# Allow localhost connections
xhost + localhost
# Check DISPLAY variable in container
# (single quotes make the container's shell expand $DISPLAY, not the host's)
docker-compose exec localllm sh -c 'echo $DISPLAY'
# Should show: host.docker.internal:0
# Restart container if needed
docker-compose restart
Error: "Qt6 not found"
Solution (Native):
# macOS
export Qt6_DIR=/opt/homebrew/opt/qt@6/lib/cmake/Qt6
# Linux - install dev packages
sudo apt-get install qt6-base-dev qt6-tools-dev
Error: "llama.cpp submodule empty"
Solution:
git submodule update --init --recursive
Error: "Qt6 not found" or "Could not find Qt6"
Solution:
# Specify Qt path when configuring
cmake -B build -G "Visual Studio 17 2022" -DCMAKE_PREFIX_PATH="C:\Qt\6.x.x\msvc2019_64"
# For MinGW
cmake -B build -G "MinGW Makefiles" -DCMAKE_PREFIX_PATH="C:\Qt\6.x.x\mingw_64"
Error: "Cannot open include file: 'llama.h'"
Solution:
# Ensure submodules are initialized
git submodule update --init --recursive
# Clean and rebuild (CMD syntax; in PowerShell use: Remove-Item -Recurse -Force build)
rmdir /s /q build
cmake -B build -G "Visual Studio 17 2022"
cmake --build build --config Release
Error: Missing DLL files when running
Solution:
# Copy Qt DLLs to build directory, or add Qt bin to PATH
# (CMD syntax; in PowerShell use: $env:PATH = "C:\Qt\6.x.x\msvc2019_64\bin;$env:PATH")
set PATH=C:\Qt\6.x.x\msvc2019_64\bin;%PATH%
# Or use windeployqt to copy all required DLLs next to the executable
C:\Qt\6.x.x\msvc2019_64\bin\windeployqt.exe .\build\bin\Release\LocalLLM.exe
Tips for better performance:
- Use quantized models (Q4_K_M or Q5_K_M)
- Adjust thread count in the LLM Settings dialog (try matching your physical CPU cores)
- Use smaller models for testing (TinyLlama)
- Close other applications to free RAM
You can select a custom model path directly from the LLM Settings dialog in the application.
You can adjust the following parameters in the LLM Settings dialog:
- Temperature: Controls randomness (0.0 - 2.0). Lower values are more deterministic.
- Top-P: Nucleus sampling (0.0 - 1.0). Controls diversity.
- Threads: Number of CPU threads to use for inference. Adjust based on your CPU cores.
- Context Size: Maximum context size (tokens) the model can handle. Larger values require more RAM.
- System Prompt: Customize the instructions given to the model.
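To show how Temperature and Top-P interact, the following sketch applies temperature scaling to a small logit vector and then keeps the smallest set of tokens whose cumulative probability reaches the Top-P threshold. It illustrates the general technique only; llama.cpp has its own sampler implementation, and the names `Candidate` and `nucleus` are hypothetical. The sketch requires a temperature above zero and a Top-P above zero, since samplers commonly handle the zero cases separately (for example by always picking the most likely token).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Candidate {
    std::size_t token;   // Index into the logits vector
    double probability;  // Probability after renormalizing within the nucleus
};

// Temperature-scaled softmax followed by nucleus (Top-P) truncation.
std::vector<Candidate> nucleus(const std::vector<double>& logits,
                               double temperature, double topP) {
    assert(!logits.empty() && temperature > 0.0 && topP > 0.0 && topP <= 1.0);

    // Softmax over logits / temperature, shifted by the max for stability.
    const double maxLogit = *std::max_element(logits.begin(), logits.end());
    std::vector<Candidate> all;
    double sum = 0.0;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        const double weight = std::exp((logits[i] - maxLogit) / temperature);
        all.push_back({i, weight});
        sum += weight;
    }
    for (Candidate& c : all) c.probability /= sum;

    // Most likely tokens first.
    std::stable_sort(all.begin(), all.end(),
                     [](const Candidate& a, const Candidate& b) {
                         return a.probability > b.probability;
                     });

    // Keep tokens until their cumulative probability reaches topP.
    std::vector<Candidate> kept;
    double cumulative = 0.0;
    for (const Candidate& c : all) {
        kept.push_back(c);
        cumulative += c.probability;
        if (cumulative >= topP) break;
    }
    for (Candidate& c : kept) c.probability /= cumulative;  // Renormalize
    return kept;
}
```

With logits {2.0, 1.0, 0.0} and Top-P 0.85, a temperature of 0.5 leaves one candidate, 1.0 leaves two, and 2.0 leaves all three. That is the sense in which lower temperatures are more deterministic: the distribution sharpens, so fewer tokens fit inside the nucleus.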
The LLM Settings dialog includes a Console tab that displays real-time logs from the application and the underlying llama.cpp library. This is useful for debugging and monitoring inference progress.
- Install "Dev Containers" extension
- Open project in VS Code
- Click "Reopen in Container" when prompted
- Build from integrated terminal
- main.cpp: QApplication initialization
- mainwindow.{h,cpp}: UI and annotation management
- llamaworker.{h,cpp}: Background LLM inference with llama.cpp API
- llmcontroller.{h,cpp}: Settings dialog, model manager, and console
- consolelogger.{h,cpp}: Captures stdout/stderr/qDebug for the console tab
| Model | Size | RAM | Speed (tokens/sec) | Quality |
|---|---|---|---|---|
| TinyLlama Q4 | 600MB | 2GB | 20-40 | Good |
| Llama-2-7B Q4 | 4GB | 8GB | 5-15 | Excellent |
| Llama-2-7B Q8 | 7GB | 12GB | 3-8 | Best |
Approximate values on a modern CPU (Intel i7/Apple M1).
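The Size column can be sanity-checked with simple arithmetic: file size is roughly parameter count times bits per weight, divided by eight. The sketch below assumes the table's "Q4" and "Q8" rows correspond to the Q4_0 and Q8_0 formats, which store blocks of 32 weights plus one 16-bit scale (4.5 and 8.5 bits per weight respectively). That mapping is an assumption, the function name `estimateWeightsGB` is hypothetical, and other quantization variants such as Q4_K_M use somewhat more bits per weight.

```cpp
#include <cassert>

// Approximate size of the quantized weights in gigabytes (1 GB = 1e9 bytes).
// bitsPerWeight: 4.5 for Q4_0 and 8.5 for Q8_0 (assumed formats, see above).
double estimateWeightsGB(double parameterCount, double bitsPerWeight) {
    assert(parameterCount > 0.0 && bitsPerWeight > 0.0);
    const double bytes = parameterCount * bitsPerWeight / 8.0;
    return bytes / 1e9;
}
```

For Llama-2-7B (about 6.74 billion parameters) this gives roughly 3.8 GB at 4.5 bits and 7.2 GB at 8.5 bits, and for TinyLlama (about 1.1 billion parameters) roughly 0.6 GB, which lines up with the Size column. The RAM column is higher because the context (KV cache) and runtime buffers need memory on top of the weights.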
This project is licensed under the MIT License - see LICENSE file for details.
- llama.cpp - Efficient LLM inference
- Qt Framework - Cross-platform GUI toolkit
- Meta AI Llama - Foundation models
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Submit a pull request
- Report issues: GitHub Issues
- Documentation: Wiki
Built with ❤️ for offline privacy and local AI
