AI-powered Retrieval-Augmented Generation (RAG) system for chatting with your documents.
Upload PDFs, Word docs, Markdown, or plain text and ask questions in natural language. ragmate embeds your documents into a vector store, retrieves the most semantically relevant passages at query time, and passes them as context to Gemini to generate answers with source citations. Answers are grounded in the documents you provide rather than in the model's prior training data.
- Upload a document (PDF, DOCX, Markdown, or TXT) - up to 2 files at once, 20 MB each max
- ragmate parses it, splits it into chunks, and embeds each chunk using Google's gemini-embedding-001 model
- Ask a question - ragmate embeds your query, finds the top matching chunks via cosine similarity, and passes them to Gemini 2.5 Flash to generate a grounded answer
- Every answer includes source citations showing which chunk and document it came from
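
The retrieval step above ranks chunks by cosine similarity to the query embedding. The following is a minimal, dependency-free sketch of that idea, not ragmate's actual code: in the real system the vector search is delegated to ChromaDB, and the function names, chunk ids, and 3-dimensional toy vectors here are all illustrative assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: list[float],
    chunk_vecs: dict[str, list[float]],
    k: int = 3,
) -> list[tuple[str, float]]:
    """Return the k (chunk_id, score) pairs most similar to the query."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings (hypothetical chunk ids).
chunks = {
    "doc1:0": [1.0, 0.0, 0.0],
    "doc1:1": [0.7, 0.7, 0.0],
    "doc2:0": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.1, 0.0], chunks, k=2)
```

Here `results` lists `doc1:0` first and `doc1:1` second; the orthogonal `doc2:0` is dropped because only the top two matches are kept.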
A built-in web interface is served at http://localhost:8000:
- Drag-and-drop or click to upload documents (up to 2 at a time, 20 MB each max)
- Real-time upload progress with percentage via Server-Sent Events
- Document list with upload timestamps and delete support
- Chat interface with expandable source citations
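
The upload limits mentioned above (at most 2 files, 20 MB each, four accepted extensions) could be checked before any parsing starts. This is a hedged sketch only: the constant and function names are hypothetical, and reading "20 MB" as 20 MiB is an assumption.

```python
from pathlib import PurePath

MAX_FILES = 2
MAX_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" is read as 20 MiB
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".md", ".txt"}


def validate_upload(files: list[tuple[str, int]]) -> list[str]:
    """Check (filename, size_in_bytes) pairs and return a list of problems.

    An empty list means the batch is acceptable.
    """
    problems: list[str] = []
    if len(files) > MAX_FILES:
        problems.append(f"too many files: {len(files)} (max {MAX_FILES})")
    for name, size in files:
        if PurePath(name).suffix.lower() not in ALLOWED_EXTENSIONS:
            problems.append(f"{name}: unsupported file type")
        if size > MAX_BYTES:
            problems.append(f"{name}: larger than 20 MB")
    return problems
```

Returning all problems at once, rather than raising on the first, lets a UI show every rejected file in a single message.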
API docs (Swagger) are available at http://localhost:8000/docs.
┌─────────────────────────────────────────────────────────┐
│                         Client                          │
│              (Web UI / curl / Swagger UI)               │
└────────────────────────┬────────────────────────────────┘
                         │  HTTP  X-API-Key
                         ▼
┌─────────────────────────────────────────────────────────┐
│                    FastAPI (ragmate)                    │
│                                                         │
│  POST /documents   GET /documents   DELETE /documents   │
│  POST /query       GET /health      GET /               │
└──────────────┬──────────────────────────┬───────────────┘
               │                          │
    ┌──────────▼──────────┐   ┌───────────▼───────────┐
    │  Ingestion Pipeline │   │  Retrieval Pipeline   │
    │                     │   │                       │
    │  parse → chunk      │   │  embed query          │
    │  → embed → store    │   │  → vector search      │
    └──────────┬──────────┘   │  → generate answer    │
               │              └───────────┬───────────┘
               │                          │
    ┌──────────▼──────────────────────────▼───────────┐
    │               ChromaDB (on disk)                │
    │                                                 │
    │  chunks collection    document_meta collection  │
    │  (vectors + text)     (filename, chunk_count,   │
    │                        uploaded_at)             │
    └─────────────────────────────────────────────────┘
               │                          │
    ┌──────────▼──────────┐   ┌───────────▼───────────┐
    │  Google Embeddings  │   │  Gemini 2.5 Flash     │
    │  gemini-embedding-  │   │  (answer generation)  │
    │  001                │   └───────────────────────┘
    └─────────────────────┘
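
The "chunk" step of the ingestion pipeline in the diagram is not specified further in this README. One common approach is fixed-size character windows with a small overlap, sketched below; the function name, the default sizes, and the character-based splitting are assumptions for illustration, not ragmate's actual strategy.

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece.strip():  # skip windows that are only whitespace
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Small sizes make the overlap visible: each chunk repeats the last
# character of the previous one.
pieces = chunk_text("abcdefghij", chunk_size=4, overlap=1)
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of embedding some text twice.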
| Layer | Technology |
|---|---|
| API framework | FastAPI + uvicorn |
| LLM | Gemini 2.5 Flash (gemini-2.5-flash) |
| Embeddings | gemini-embedding-001 (768-dim, batched) |
| Vector store | ChromaDB (in-process, persistent) |
| Document parsing | pypdf, python-docx |
| Progress streaming | Server-Sent Events (SSE) |
| Validation | Pydantic v2 |
| Testing | pytest, pytest-asyncio, pytest-cov (84% coverage) |
| Linting / formatting | ruff, black |
| Type checking | mypy (strict) |
| Security scanning | bandit |
| Containerization | Docker |
| CI | GitHub Actions |
- Python 3.13+
- A Google AI Studio API key (free tier works)
git clone https://github.com/rajeshsub/ragmate.git
cd ragmate
make bootstrap

This creates a virtualenv, installs all dependencies, sets up pre-commit hooks, and copies .env.example → .env.
Open .env and fill in the two required values:
GEMINI_API_KEY=your-gemini-api-key-here
API_KEY=choose-any-strong-secret-string

- GEMINI_API_KEY: get one free at Google AI Studio
- API_KEY: any string you choose; used to protect the ragmate endpoints
make dev

Open http://localhost:8000 for the web UI, or http://localhost:8000/docs for the API explorer.
docker build -t ragmate .
docker run -p 7860:7860 \
-e GEMINI_API_KEY=your-key \
-e API_KEY=your-secret \
-v $(pwd)/chroma_data:/app/chroma_data \
ragmate

All endpoints require the X-API-Key header.
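
For callers that are not using curl, the same header can be attached from Python. This is a small sketch using only the standard library; `build_query_request` is a hypothetical helper, and the base URL and key are placeholders to replace with your own values.

```python
import json
import urllib.request


def build_query_request(base_url: str, api_key: str, question: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST /query request."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/query",
        data=body,
        method="POST",
        headers={
            "X-API-Key": api_key,  # required on every ragmate endpoint
            "Content-Type": "application/json",
        },
    )


req = build_query_request(
    "http://localhost:8000", "your-secret", "What are the main conclusions?"
)

# To send it against a running server:
#     with urllib.request.urlopen(req) as resp:
#         result = json.load(resp)
```

Building the request separately from sending it keeps the header logic testable without a live server.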
curl -X POST http://localhost:8000/documents \
-H "X-API-Key: your-secret" \
-F "file=@/path/to/document.pdf"

Returns a Server-Sent Events stream with real-time progress:
data: {"stage": "parsing", "pct": 5, "message": "Parsing document.pdf…"}
data: {"stage": "embedding", "pct": 45, "message": "Embedding batch 1/2…"}
data: {"stage": "done", "pct": 100, "id": "3f4a1b2c-…", "filename": "document.pdf", "chunk_count": 42}
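
A client can turn this stream into Python dictionaries by reading it line by line. The sketch below assumes, as in the sample above, that every event is a single `data:` line carrying one JSON object; the SSE format also allows multi-line data and other fields, which this hypothetical helper simply skips.

```python
import json
from collections.abc import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each single-line `data:` event."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload:
            yield json.loads(payload)


# Lines modelled on the sample stream above (the id field is left out).
sample = [
    'data: {"stage": "parsing", "pct": 5, "message": "Parsing document.pdf…"}',
    "",
    'data: {"stage": "done", "pct": 100, "filename": "document.pdf", "chunk_count": 42}',
]
events = list(iter_sse_events(sample))
```

A caller would typically update a progress bar from `pct` and stop reading once an event with `"stage": "done"` arrives.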
curl -X POST http://localhost:8000/query \
-H "X-API-Key: your-secret" \
-H "Content-Type: application/json" \
-d '{"question": "What are the main conclusions?"}'

{
  "answer": "The main conclusions are...",
  "sources": [
    {
      "doc_id": "3f4a1b2c-...",
      "chunk_index": 7,
      "score": 0.9231,
      "excerpt": "In conclusion, the study found..."
    }
  ]
}

curl http://localhost:8000/documents -H "X-API-Key: your-secret"

curl -X DELETE http://localhost:8000/documents/3f4a1b2c-... \
-H "X-API-Key: your-secret"

make test # run all tests (80% coverage gate)
make lint # ruff + mypy + bandit
make format # auto-fix with ruff + black
make coverage # generate htmlcov/index.html
make docs # export openapi.json

| Format | Extension | Parser |
|---|---|---|
| PDF (text-based) | .pdf | pypdf |
| Word document | .docx | python-docx |
| Markdown | .md | plain text |
| Plain text | .txt | plain text |
Scanned / image-based PDFs are not supported (no OCR). Text must be selectable in the PDF.
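
The format table above maps each extension to a parser. One way such a dispatch could be written is sketched below; the function names and the `PARSERS` mapping are hypothetical, and ragmate's real parsing module may be organised differently. The pypdf and python-docx imports are deferred so that plain-text parsing works even when those packages are absent.

```python
from pathlib import Path


def parse_pdf(path: Path) -> str:
    from pypdf import PdfReader  # imported lazily; only needed for PDFs

    reader = PdfReader(str(path))
    # extract_text() yields nothing useful for image-only pages (no OCR).
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def parse_docx(path: Path) -> str:
    from docx import Document  # provided by the python-docx package

    return "\n".join(p.text for p in Document(str(path)).paragraphs)


def parse_plain(path: Path) -> str:
    # Assumes UTF-8 encoded Markdown and text files.
    return path.read_text(encoding="utf-8")


PARSERS = {
    ".pdf": parse_pdf,
    ".docx": parse_docx,
    ".md": parse_plain,
    ".txt": parse_plain,
}


def parse_document(path: Path) -> str:
    """Pick a parser from the file extension and return the extracted text."""
    parser = PARSERS.get(path.suffix.lower())
    if parser is None:
        raise ValueError(f"Unsupported file type: {path.suffix or '(none)'}")
    return parser(path)
```

A scanned PDF would pass through `parse_pdf` without error but produce little or no text, which is why the limitation above matters in practice.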
Hosted on Hugging Face Spaces: https://huggingface.co/spaces/rajeshsub/ragmate
Every push to main that passes CI is automatically deployed there. Contact the repo owner for the access key.
Key architectural choices are recorded as ADRs in docs/adr/:
- ADR 0001: ChromaDB as sole data store (no Postgres)
- ADR 0002: Gemini 2.5 Flash + gemini-embedding-001
- ADR 0003: Static API key authentication
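
The ADR texts are not reproduced here. As a rough illustration of what static API key authentication (ADR 0003) typically involves on the server side, the sketch below compares the supplied X-API-Key header with the configured secret in constant time. `is_authorized` is a hypothetical helper, not ragmate's actual code.

```python
import hmac


def is_authorized(headers: dict[str, str], expected_key: str) -> bool:
    """Return True only if the X-API-Key header matches the configured key."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    supplied = {name.lower(): value for name, value in headers.items()}.get("x-api-key", "")
    if not supplied or not expected_key:
        return False  # a missing header or an unset secret never authorises
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(supplied.encode("utf-8"), expected_key.encode("utf-8"))
```

Rejecting an empty configured key is a deliberate choice in this sketch: it prevents a misconfigured deployment from accepting every request.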
