RAG Service

Reference implementation of a Retrieval-Augmented Generation (RAG) pipeline supporting ingestion, hybrid retrieval, reranking, and streaming responses.

Architecture Overview

LLM Runtime

Ollama (local model serving)

Models

Chat Models

Fast / Low-latency
- qwen2.5
- qwen2.5:0.5b

Reranker

mistral-small

Embeddings

nomic-embed-text

Pipeline Flow
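Hybrid retrieval merges a keyword-based ranking with an embedding-based ranking. This README does not state how the service fuses the two, so the following is only a minimal sketch that assumes reciprocal rank fusion (RRF), one common choice. The function name, the constant k=60, and the two example rankings are illustrative assumptions, not taken from the service code.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids (each ordered best first)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every doc it contains.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical rankings, for illustration only.
semantic = ["doc1", "doc3", "doc2"]  # assumed embedding-similarity order
keyword = ["doc2", "doc1"]           # assumed keyword order (doc3 had no hits)

fused = reciprocal_rank_fusion([semantic, keyword])
print(fused)
```

A document that ranks reasonably well in both lists (doc1 here) ends up ahead of one that tops a single list, which is the usual motivation for rank-based fusion before reranking.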

Prerequisites

Ensure the following are installed:

- Docker & Docker Compose
- Ollama

Pull the required models (chat, reranker, and embedding):

ollama pull qwen2.5
ollama pull qwen2.5:0.5b
ollama pull mistral-small
ollama pull nomic-embed-text
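Before starting the stack, it can help to confirm that the models named in the Models section are present locally. The sketch below assumes Ollama is listening on its default address, http://localhost:11434, and uses its /api/tags endpoint to list installed models. The helper names are hypothetical and not part of this repository.

```python
import json
import urllib.request

# Models named in the Models section of this README.
REQUIRED = ["qwen2.5", "qwen2.5:0.5b", "mistral-small", "nomic-embed-text"]


def normalize(name):
    """Ollama lists untagged pulls as '<name>:latest'; add the tag if absent."""
    return name if ":" in name else f"{name}:latest"


def missing_models(installed, required=REQUIRED):
    """Return the required models that do not appear in `installed`."""
    have = {normalize(n) for n in installed}
    return [m for m in required if normalize(m) not in have]


def installed_models(base_url="http://localhost:11434"):
    """Query the local Ollama server for installed model names."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return [m["name"] for m in json.load(resp)["models"]]
```

With Ollama running, `missing_models(installed_models())` returns an empty list once every model has been pulled.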

Running the Service

Build and start the stack:

docker compose build --no-cache
docker compose up

The API will be available at: http://localhost:8000

API Usage

1. Ingestion

Ingest sample documents to populate the retrieval index.

curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "text": "A backup failure occurs when data cannot be written to storage or restored properly.",
    "doc_id": "doc1"
  }'

curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "text": "A database transaction failure happens when ACID properties are violated during commit.",
    "doc_id": "doc2"
  }'

curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Ollama is a local runtime for running large language models like Qwen and Mistral.",
    "doc_id": "doc3"
  }'
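The same three requests can be sent from a script. This is a minimal sketch using only the Python standard library; it assumes the service is reachable at http://localhost:8000 and makes no assumption about the response body, which is not documented here. The function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"

# The three sample documents from the curl commands above.
DOCS = [
    ("doc1", "A backup failure occurs when data cannot be written to storage or restored properly."),
    ("doc2", "A database transaction failure happens when ACID properties are violated during commit."),
    ("doc3", "Ollama is a local runtime for running large language models like Qwen and Mistral."),
]


def build_ingest_request(doc_id, text, base_url=BASE_URL):
    """Build a POST /ingest request with the same JSON body as the curl calls."""
    body = json.dumps({"text": text, "doc_id": doc_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ingest",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ingest_all(docs=DOCS, base_url=BASE_URL):
    """Send every document and return the HTTP status code of each call."""
    statuses = []
    for doc_id, text in docs:
        request = build_ingest_request(doc_id, text, base_url)
        with urllib.request.urlopen(request, timeout=30) as resp:
            statuses.append(resp.status)
    return statuses
```

Calling `ingest_all()` against a running stack sends the documents in order; urllib raises `HTTPError` on a non-2xx response, so a failed ingest is not silently ignored.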

2. Retrieval / Question Answering

Test 1 — Semantic Query

curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What happens when a backup fails?"
  }'

Expected behavior:

- Retrieves semantically similar content
- Returns an explanation of backup failure

Test 2 — Keyword-heavy Query

curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{
    "question": "ACID transaction commit failure database"
  }'

Expected behavior:

- Keyword matching is effective for this term-only query
- The document about ACID violations (doc2) is retrieved
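To see why keyword matching works well for this query, a toy term-overlap count over the three sample documents is enough. This is only an illustration: it is not the service's keyword scorer, which is not described in this README, and the function names are hypothetical.

```python
import re

# The three sample documents from the Ingestion section.
DOCS = {
    "doc1": "A backup failure occurs when data cannot be written to storage or restored properly.",
    "doc2": "A database transaction failure happens when ACID properties are violated during commit.",
    "doc3": "Ollama is a local runtime for running large language models like Qwen and Mistral.",
}


def tokens(text):
    """Lowercase the text and split it into a set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_scores(query, docs=DOCS):
    """Count how many distinct query terms appear in each document."""
    query_terms = tokens(query)
    return {doc_id: len(query_terms & tokens(text)) for doc_id, text in docs.items()}


scores = overlap_scores("ACID transaction commit failure database")
print(scores)  # {'doc1': 1, 'doc2': 5, 'doc3': 0}
```

doc2 contains all five query terms, doc1 shares only "failure", and doc3 shares none, so any term-based scorer should place doc2 first.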

Test 3 — Noisy / Indirect Query

curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Why does my system fail to save data when something breaks?"
  }'
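The three test queries can also be run from a script. The sketch below assumes the service is reachable at http://localhost:8000 and, because the response format is not documented here, simply writes the raw response bytes to standard output as they are read. The function names are hypothetical.

```python
import json
import sys
import urllib.request

BASE_URL = "http://localhost:8000"

# The three test questions from this section.
QUESTIONS = [
    "What happens when a backup fails?",
    "ACID transaction commit failure database",
    "Why does my system fail to save data when something breaks?",
]


def build_ask_request(question, base_url=BASE_URL):
    """Build a POST /ask request with the same JSON body as the curl calls."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, base_url=BASE_URL, chunk_size=1024):
    """Yield the raw response body in chunks as they become available."""
    with urllib.request.urlopen(build_ask_request(question, base_url), timeout=120) as resp:
        # read1 returns whatever is currently available, up to chunk_size bytes.
        while chunk := resp.read1(chunk_size):
            yield chunk


def run_all(questions=QUESTIONS, base_url=BASE_URL):
    """Ask every question in turn and echo the unparsed responses."""
    for question in questions:
        print(f"\n> {question}", flush=True)
        for chunk in ask(question, base_url):
            sys.stdout.buffer.write(chunk)
            sys.stdout.buffer.flush()
```

Run `run_all()` after ingesting the sample documents so that each question has content to retrieve.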
