On-device AI SDK for Android — LLM inference, image generation, image processing, and text-to-speech. No cloud, no internet, runs entirely on-device.
Built for Android (ARMv8/ARMv9 via NDK) with JNI + native C++ backends.
Note: This repo is developed strictly for ToolNeuron. If you want to use these SDKs in your own app, fork or clone this repo and integrate the modules you need.
| Module | What it does | Backend | Package |
|---|---|---|---|
| gguf_lib | LLM inference (chat, embeddings, tool calling) | llama.cpp (custom fork) | com.dark.gguf_lib |
| ai_sd | Image generation (txt2img, img2img, inpaint) | QNN (Hexagon DSP) + MNN | com.dark.ai_sd |
| ai_supertonic_tts | Text-to-speech (5 languages, 10 voices) | ONNX Runtime | com.mp.ai_supertonic_tts |
```kotlin
// settings.gradle.kts
include(":gguf_lib")
include(":ai_sd")
include(":ai_supertonic_tts")
```

```kotlin
// app/build.gradle.kts
dependencies {
    implementation(project(":gguf_lib"))          // LLM
    implementation(project(":ai_sd"))             // Image Gen
    implementation(project(":ai_supertonic_tts")) // TTS
}
```

- Min SDK: 27 (Android 8.1)
- Target SDK: 36
- ABI: `arm64-v8a` (all modules)
- CMake: 3.31.4
- JDK: 17
- Gradle: 9.3.1
- AGP: 9.0.1
On-device LLM inference powered by a custom llama.cpp fork optimized for ARM CPUs with KleidiAI micro-kernels.
Key features:
- Multi-turn chat with Flow-based streaming tokens
- Model-agnostic tool calling with GBNF grammar constraints (STRICT/LAZY modes)
- Text embeddings for semantic search
- Character personality engine: mood, emotion, uncensored mode via logit/sampling control
- KV cache prefix reuse, context shifting, disk-backed prompt cache
- Speculative decoding (ngram self-speculative)
- CPU affinity pinning, zero-copy token delivery, JNI method ID caching
See gguf_lib/CLAUDE.md for full API reference.
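To illustrate the idea behind KV cache prefix reuse: when a new prompt shares a token prefix with the previously evaluated one, only the divergent suffix needs to be recomputed. The sketch below is a minimal, standalone illustration of that bookkeeping; the function names and types are assumptions for this example, not the gguf_lib API.

```kotlin
// Length of the shared token prefix between the cached context and a new prompt.
// Everything up to this index can be served from the KV cache.
fun commonPrefixLength(cached: IntArray, incoming: IntArray): Int {
    val limit = minOf(cached.size, incoming.size)
    var n = 0
    while (n < limit && cached[n] == incoming[n]) n++
    return n
}

// Tokens that actually need a forward pass after reusing the cached prefix.
fun tokensToRecompute(cached: IntArray, incoming: IntArray): Int =
    incoming.size - commonPrefixLength(cached, incoming)
```

In multi-turn chat this is what makes follow-up messages cheap: the whole conversation so far is the shared prefix, so only the newly appended turn is evaluated.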
On-device Stable Diffusion via Qualcomm QNN (Hexagon DSP) or MNN (CPU fallback).
Key features:
- txt2img, img2img, inpainting
- QNN acceleration on Snapdragon SoCs (8 Gen 1+)
- LoRA support
- DPM-Solver++ and Euler Ancestral schedulers
- Tiled VAE for high-resolution generation
- Safety checker (optional)
See ai_sd/README.md for full API reference.
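Tiled VAE decoding works by splitting a large latent into overlapping tiles so each decode fits in memory, then blending the tiles across their overlap. The sketch below shows only the tiling geometry along one axis; the function name and parameters are illustrative assumptions, not the ai_sd API.

```kotlin
// Compute tile start positions along one axis so that tiles of size `tile`
// overlap by `overlap` pixels and the last tile sits flush with the edge.
fun tileOrigins(extent: Int, tile: Int, overlap: Int): List<Int> {
    require(tile > overlap) { "tile must exceed overlap" }
    if (extent <= tile) return listOf(0) // image fits in a single tile
    val stride = tile - overlap
    val origins = mutableListOf<Int>()
    var pos = 0
    while (pos + tile < extent) {
        origins.add(pos)
        pos += stride
    }
    origins.add(extent - tile) // final tile aligned to the far edge
    return origins
}
```

Running this for both axes yields the grid of tiles to decode; the overlap regions are where adjacent tiles are feathered together to hide seams.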
On-device TTS using Supertonic v2 (66M parameters, ONNX Runtime). Produces 44.1 kHz mono audio up to 167x faster than real time.
Key features:
- 5 languages: English, Korean, Spanish, Portuguese, French
- 10 voice presets (5 female, 5 male)
- Streaming playback via AudioTrack
- Save to WAV/PCM files
- Auto-chunking for long text
- Optional NNAPI GPU/NPU acceleration
See ai_supertonic_tts/TTS_SDK_DOCS.md for full API reference.
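Auto-chunking for long text typically splits input at sentence boundaries, then greedily packs sentences into chunks under a size budget so each synthesis call stays short. The helper below is a minimal sketch of that approach; it is a hypothetical function for illustration, not the ai_supertonic_tts API.

```kotlin
// Split text into sentences, then pack them into chunks of at most `maxChars`
// characters. A single sentence longer than the budget becomes its own chunk
// rather than being split mid-sentence.
fun chunkText(text: String, maxChars: Int): List<String> {
    val sentences = text.split(Regex("(?<=[.!?])\\s+")).filter { it.isNotBlank() }
    val chunks = mutableListOf<String>()
    val current = StringBuilder()
    for (sentence in sentences) {
        if (current.isNotEmpty() && current.length + 1 + sentence.length > maxChars) {
            chunks.add(current.toString())
            current.setLength(0)
        }
        if (current.isNotEmpty()) current.append(' ')
        current.append(sentence)
    }
    if (current.isNotEmpty()) chunks.add(current.toString())
    return chunks
}
```

Chunking at sentence boundaries matters for TTS quality: cutting mid-sentence produces audible prosody breaks, while whole sentences synthesize with natural intonation.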
```shell
# Full build (all modules)
./gradlew assembleRelease

# Single module
./gradlew :gguf_lib:assembleRelease
./gradlew :ai_sd:assembleRelease
./gradlew :ai_supertonic_tts:assembleRelease
```

Native C++ is built automatically via CMake during the Gradle build. The first build takes longer due to llama.cpp compilation.
```
Ai-Systems/
├── gguf_lib/             # LLM SDK
│   ├── src/main/cpp/     # C++ (JNI → llama.cpp)
│   └── src/main/java/    # Kotlin API
├── ai_sd/                # Image Gen SDK
│   ├── src/main/cpp/     # C++ (JNI → QNN/MNN)
│   └── src/main/java/    # Kotlin API
├── ai_supertonic_tts/    # TTS SDK
│   ├── src/main/cpp/     # C++ (JNI → ONNX Runtime)
│   └── src/main/java/    # Kotlin API
└── build.gradle.kts      # Root config
```
Each SDK is an independent Android library module with its own JNI layer. They share no native dependencies and can be included individually.
- ToolNeuron — Android AI assistant with character intelligence, tool calling, image generation, and TTS
MIT