Skip to content

Youxise/SQLLM

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SQLLM

A comparative demonstration of AI-powered tools for databases, including vector search, natural language to SQL conversion, and RAG question-answering.

Overview

This project showcases three key AI capabilities for modern databases:

  1. Vector Search - Semantic search across documents using embeddings
  2. Text-to-SQL - Convert natural language questions into SQL queries
  3. RAG Q&A - Retrieval-Augmented Generation for intelligent question answering

Features

  • Multi-backend vector search comparison (ChromaDB, SQLite-vec, PostgreSQL pgvector, MongoDB)
  • Local LLM integration with Ollama for SQL generation
  • Real-time benchmarking and performance metrics
  • Interactive Streamlit interface

Quick Start

Prerequisites

  • Python 3.8+
  • 2GB RAM minimum
  • Optional: Docker (for PostgreSQL/MongoDB backends)
  • Optional: Ollama (for local LLM features)

Installation

# Clone the repository
git clone https://github.com/yourusername/ai-database-tools-demo.git
cd ai-database-tools-demo

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run the application
streamlit run app.py

The app will open at http://localhost:8501

With Docker (Optional)

For PostgreSQL and MongoDB backends:

docker-compose up -d

With Ollama (Optional)

For Text-to-SQL and RAG features:

# Install Ollama from https://ollama.ai
ollama pull llama3.2:1b

Project Structure

.
├── app.py                          # Main Streamlit application
├── requirements.txt                # Python dependencies
├── docker-compose.yml              # PostgreSQL & MongoDB setup
├── src/
│   ├── vector_search/              # Vector search implementations
│   │   ├── vector_search_comparative.py
│   │   ├── sqlite_backend.py
│   │   ├── postgres_backend.py
│   │   └── mongodb_backend.py
│   ├── text_to_sql/                # Natural language to SQL
│   │   └── text_to_sql_enhanced.py
│   ├── rag/                        # RAG Q&A system
│   │   └── rag_enhanced.py
│   └── utils/                      # Utilities and benchmarks
│       └── benchmarks.py
└── data/                           # Data storage

Modules

Vector Search

Compare semantic search across different vector database backends:

  • ChromaDB - Lightweight vector database
  • SQLite-vec - SQLite with vector extension
  • PostgreSQL pgvector - PostgreSQL extension (Docker required)
  • MongoDB - Atlas vector search (Docker required)

Features:

  • Side-by-side performance comparison
  • Real-time indexing and search benchmarks
  • Support for 384-dimensional embeddings (all-MiniLM-L6-v2)

Text-to-SQL

Convert natural language questions into executable SQL:

  • Powered by Ollama (local LLM)
  • Fallback to predefined queries if Ollama unavailable
  • SQLite demo database with clients, products, orders
  • Real-time SQL generation and execution

RAG Q&A

Retrieval-Augmented Generation for intelligent answers:

  • Document indexing with embeddings
  • Semantic search for relevant context
  • LLM-powered answer generation (with Ollama)
  • Fallback to direct passage display

Configuration

Environment Variables

Create a .env file (optional):

OLLAMA_HOST=http://localhost:11434
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=demo_db
POSTGRES_USER=demo_user
POSTGRES_PASSWORD=demo_pass
MONGODB_URI=mongodb://localhost:27017

Usage Examples

Vector Search

# Index documents
from src.vector_search import demo_vector_search
demo_vector_search()

# Documents are automatically converted to embeddings
# Search with: "What is a vector database?"

Text-to-SQL

# Generate SQL from natural language
question = "Top 10 customers by revenue"
# Returns: SELECT c.name, SUM(o.total) as revenue FROM customers c...

RAG Q&A

# Ask questions about indexed documents
question = "How to install pgvector?"
# Returns contextualized answer with sources

Performance

Approximate benchmarks on standard hardware:

Backend Indexing (8 docs) Search Latency QPS
ChromaDB 0.2s 10-30ms ~3000
SQLite-vec <0.1s 5-15ms ~1000
PostgreSQL pgvector 0.3s 5-15ms ~5000
MongoDB 0.4s 10-30ms ~8000

Technologies Used

  • Streamlit - Web interface
  • ChromaDB - Vector database
  • Sentence Transformers - Text embeddings
  • Ollama - Local LLM
  • PostgreSQL + pgvector - Vector search extension
  • MongoDB - Document database with vector search
  • SQLite - Embedded database
  • Pandas - Data manipulation

Troubleshooting

ChromaDB warnings

The app mocks onnxruntime to avoid DLL errors. This is normal and doesn't affect functionality.

Ollama not detected

Make sure Ollama is running: ollama serve

Add to PATH if needed (Windows): C:\Users\YourName\AppData\Local\Programs\Ollama

SQLite threading errors

All SQLite connections use check_same_thread=False for Streamlit compatibility.

Docker services not connecting

# Check Docker status
docker-compose ps

# Restart services
docker-compose restart

# View logs
docker-compose logs -f

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

License

MIT License - feel free to use this project for learning and development.

Acknowledgments

  • Built with Streamlit for rapid prototyping
  • Embeddings powered by Sentence Transformers
  • Local LLM capabilities via Ollama
  • Vector search backends: ChromaDB, pgvector, MongoDB Atlas

Contact

For questions or feedback, please open an issue on GitHub.

About

Comparative demo of AI tools for databases including VS, Text-to-SQL and RAG Q&A.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages