Lightweight RAG (Retrieval-Augmented Generation) system that logs performance metrics to a remote server.
- Embeds queries using sentence-transformers
- Searches documents using Qdrant vector database
- Generates answers using OpenAI
- Logs all performance metrics to a remote logging server

Setup:
- Clone the repo:
    git clone https://github.com/al-gent/rag-client.git
    cd rag-client
- Create a .env file:
    cp .env.example .env
    # Edit .env with your values
  Required variables:
  - RAG_HARDWARE_ID - Identifier for your hardware (e.g., laptop-mac, server-gpu)
  - LLM_HARDWARE_ID - Where the LLM runs (e.g., openai-api, local-gpu)
  - MODEL_NAME - Which model to use (e.g., gpt-4o-mini)
  - LOG_SERVER_URL - Remote logging server (e.g., https://rag-api.adamlgent.com)
  - OPENAI_API_KEY - Your OpenAI API key
- Start the system:
    docker-compose up -d
- Load documents:
    # Put PDF or TXT files in the ./data directory
    curl -X POST http://localhost:8000/load-documents
- Query:
    curl -X POST http://localhost:8000/query \
      -H "Content-Type: application/json" \
      -d '{"question": "Your question here"}'

Every query logs:
- Embedding time
- Vector search time
- LLM generation time
- Total time
- Cost
- Success/failure
All metrics are sent to the remote logging server for comparison across different hardware setups.
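
The per-query metrics listed above could be collected roughly as follows. This is a minimal sketch, not the project's actual implementation: the `timed` and `build_metrics` helpers and every field name in the record are hypothetical, and the `time.sleep` calls stand in for the real embedding, search, and generation steps. It also assumes the hardware and model identifiers from `.env` are attached to each record so that runs can be compared across setups.

```python
import json
import os
import time
from contextlib import contextmanager


@contextmanager
def timed(timings, stage):
    """Store the wall-clock duration of one stage, in seconds, under timings[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the stage raises, so failed queries still carry timings.
        timings[stage] = time.perf_counter() - start


def build_metrics(timings, cost_usd, success, env=os.environ):
    """Assemble one log record. All field names here are illustrative assumptions."""
    return {
        "rag_hardware_id": env.get("RAG_HARDWARE_ID"),
        "llm_hardware_id": env.get("LLM_HARDWARE_ID"),
        "model_name": env.get("MODEL_NAME"),
        "embedding_time": timings.get("embedding"),
        "search_time": timings.get("search"),
        "generation_time": timings.get("generation"),
        "total_time": timings.get("total"),
        "cost": cost_usd,
        "success": success,
    }


timings = {}
success = True
with timed(timings, "total"):
    try:
        with timed(timings, "embedding"):
            time.sleep(0.01)  # placeholder for the embedding step
        with timed(timings, "search"):
            time.sleep(0.01)  # placeholder for the vector search step
        with timed(timings, "generation"):
            time.sleep(0.01)  # placeholder for the LLM call
    except Exception:
        success = False

# cost_usd is a placeholder; a real value would come from the LLM usage data.
record = build_metrics(timings, cost_usd=0.0, success=success)
print(json.dumps(record, indent=2))
```

Timing with `time.perf_counter` inside a context manager keeps the measurement code out of the pipeline stages themselves, and the `finally` clause means a stage that fails still reports how long it ran before failing.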

Endpoints:
- POST /query - Ask a question
- POST /upload - Upload a document
- POST /load-documents - Load all documents from ./data
- GET /health - Health check
- GET /documents - List indexed documents
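
For scripted access, the endpoints above can be called from Python instead of curl. The sketch below uses only the standard library and mirrors the curl examples (base URL `http://localhost:8000`, JSON body with a `question` key). The helper names `build_query_request`, `build_health_request`, and `send` are hypothetical, and `send` assumes the server replies with JSON, which the README does not confirm.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the curl examples


def build_query_request(question, base_url=BASE_URL):
    """Build the POST /query request with a JSON body, without sending it."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_health_request(base_url=BASE_URL):
    """Build the GET /health request, without sending it."""
    return urllib.request.Request(f"{base_url}/health", method="GET")


def send(request, timeout=30):
    """Send a request and decode the reply, assuming the body is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Inspect the request that would be sent; nothing touches the network here.
req = build_query_request("Your question here")
print(req.get_method(), req.full_url)
```

With the containers running, `send(build_health_request())` is a quick way to check the service before calling `send(build_query_request("Your question here"))`. Keeping request construction separate from sending makes the client easy to inspect without a live server.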