Transform your CSV data into insights using natural language queries.
- Smart CSV Analysis - Upload and get instant insights from your datasets
- AI-Powered Queries - Ask questions in plain English and get intelligent responses
- Interactive Visualizations - Plotly charts generated automatically from your queries
- Conversational Interface - ChatGPT-like experience for data exploration
- Secure Execution - Sandboxed code execution with safety restrictions
- Session Management - Persistent analysis sessions with conversation history
- Modern Web Interface - Responsive design with drag-and-drop file uploads
pydatabackend/
├── app/
│   ├── core/
│   │   └── config.py # Application settings
│   ├── api/
│   │   └── v1/
│   │       ├── endpoints/
│   │       │   └── session.py # API endpoints
│   │       └── schemas/
│   │           └── session.py # Pydantic models
│   ├── services/
│   │   ├── session_manager.py # Session lifecycle management
│   │   └── data_analysis.py # LLM integration & code execution
│   └── main.py # FastAPI application
├── frontend/
│   ├── index.html # Main UI
│   ├── styles.css # Styling
│   └── script.js # JavaScript functionality
├── cache/ # Session data storage
├── requirements.txt # Python dependencies
├── .env.example # Environment variables template
└── README.md
- Python 3.8 or higher
- Google Gemini API key (free from Google AI Studio)
# Clone or navigate to the project directory
cd pydatabackend
# Create virtual environment (recommended)
python -m venv venv
# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Copy environment template
# On Windows:
copy .env.example .env
# On macOS/Linux:
cp .env.example .env
# Edit .env file and add your Gemini API key
# Get your API key from: https://makersuite.google.com/app/apikey
Environment variables (only GEMINI_API_KEY is required):
GEMINI_API_KEY=your_gemini_api_key_here
DEBUG=True
Start the server:
python app/main.py
The application will start on http://localhost:8000
Available URLs:
- Main App: http://localhost:8000/
- API Documentation: http://localhost:8000/docs
- Health Check: http://localhost:8000/health
- Drag and drop a CSV file (max 16MB) or click to browse
- Supported format: CSV files with UTF-8 encoding
- The system will automatically analyze your data structure
- Sample Data Tab: View the first 5 rows
- Statistics Tab: See numeric statistics and missing value analysis
- Data Info Tab: Check column types and categorical information
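The upload rules above (CSV only, 16MB maximum, UTF-8 encoding) can be expressed as a plain Python check. This is a minimal sketch, not the project's actual validation code. The function name `validate_csv_upload` is hypothetical, and decoding with `utf-8-sig` (which also accepts a leading byte-order mark) is an assumption.

```python
from pathlib import Path

# Assumed limit, mirroring MAX_FILE_SIZE=16777216 (16MB) from the configuration section
MAX_FILE_SIZE = 16 * 1024 * 1024


def validate_csv_upload(filename: str, content: bytes, max_size: int = MAX_FILE_SIZE) -> str:
    """Hypothetical validator: return the decoded CSV text or raise ValueError."""
    # Reject anything without a .csv extension (case-insensitive)
    if Path(filename).suffix.lower() != ".csv":
        raise ValueError("Only .csv files are accepted")
    # Enforce the size limit before doing any decoding or parsing
    if len(content) > max_size:
        raise ValueError(f"File exceeds {max_size} bytes")
    # Decode strictly so non-UTF-8 input fails early; utf-8-sig also strips a BOM
    try:
        return content.decode("utf-8-sig")
    except UnicodeDecodeError as exc:
        raise ValueError("File must be UTF-8 encoded") from exc
```

In a FastAPI endpoint the bytes would presumably come from `await file.read()` on an `UploadFile`. Checking the size before decoding avoids wasted work on oversized files.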
Use natural language to analyze your data:
Example Queries:
- "Show me a summary of the data"
- "Create a histogram of the age column"
- "What's the correlation between price and sales?"
- "Are there any missing values?"
- "Show sales by region as a bar chart"
- "Create a scatter plot of height vs weight"
- "What are the top 10 customers by revenue?"
- POST /api/v1/session/start - Upload CSV and start session
- POST /api/v1/session/query - Submit analysis query
- GET /api/v1/sessions - List active sessions
- GET /api/v1/session/{session_id} - Get session info
- DELETE /api/v1/session/{session_id} - Delete session
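To script against these endpoints without curl, the requests can be built with the Python standard library. This sketch assumes the server runs at http://localhost:8000 and returns JSON. The helper names are hypothetical, and the multipart CSV upload to /session/start is left to the curl example.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL: the default local address the app starts on
BASE_URL = "http://localhost:8000"


def build_query_request(session_id: str, query: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build POST /api/v1/session/query with a JSON body."""
    body = json.dumps({"session_id": session_id, "query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/session/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_list_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build GET /api/v1/sessions."""
    return urllib.request.Request(f"{base_url}/api/v1/sessions", method="GET")


def build_session_request(session_id: str, method: str = "GET", base_url: str = BASE_URL) -> urllib.request.Request:
    """Build GET or DELETE /api/v1/session/{session_id}."""
    # Percent-encode the ID so unexpected characters cannot change the path
    path = urllib.parse.quote(session_id, safe="")
    return urllib.request.Request(f"{base_url}/api/v1/session/{path}", method=method)


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send a request and decode the JSON reply (needs the server running)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_query_request("your-session-id", "Show me sales by region"))` mirrors the curl query. The session ID would come from the /session/start response, whose exact field name is not documented here.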
# Start a session
curl -X POST "http://localhost:8000/api/v1/session/start" \
-F "file=@your_data.csv"
# Query the data
curl -X POST "http://localhost:8000/api/v1/session/query" \
-H "Content-Type: application/json" \
-d '{
"session_id": "your-session-id",
"query": "Show me sales by region"
}'
- Restricted Imports: Only safe libraries are allowed (pandas, numpy, plotly, etc.)
- Code Sandboxing: Generated code runs in a controlled environment
- Input Validation: File type and size validation
- Session Isolation: Each session operates independently
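One common way to enforce an import allow-list like this is to inspect the generated code's syntax tree before running it. The sketch below uses Python's `ast` module. The allow-list contents, the blocked-name list and the helper names are assumptions (the README only says pandas, numpy, plotly "etc." are allowed), and this is not necessarily how `data_analysis.py` works. A static check can be bypassed by determined code, so treat it as a first line of defense rather than a complete sandbox; process isolation and timeouts are typical additions.

```python
import ast

# Hypothetical allow-list: the README names pandas, numpy and plotly "etc."
ALLOWED_MODULES = {"pandas", "numpy", "plotly", "math", "statistics"}
# Hypothetical deny-list of builtins that could be used to escape the restrictions
BLOCKED_NAMES = {"eval", "exec", "compile", "open", "__import__", "globals", "locals", "getattr", "setattr"}


def check_code(code: str) -> None:
    """Raise ValueError if the generated code breaks one of the static rules."""
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports have no top-level package to check, so reject them
            if node.level:
                raise ValueError("Relative imports are not allowed")
            modules = [node.module or ""]
        else:
            modules = []
            # Block direct use of dangerous builtins such as open() or eval()
            if isinstance(node, ast.Name) and node.id in BLOCKED_NAMES:
                raise ValueError(f"Use of '{node.id}' is not allowed")
            # Block dunder attribute access such as ().__class__
            if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
                raise ValueError(f"Access to '{node.attr}' is not allowed")
        for module in modules:
            # "plotly.express" is checked as its top-level package "plotly"
            if module.split(".")[0] not in ALLOWED_MODULES:
                raise ValueError(f"Import of '{module}' is not allowed")


def run_generated_code(code: str, namespace: dict) -> dict:
    """Check the code, execute it in the given namespace, and return that namespace."""
    check_code(code)
    exec(compile(code, "<generated>", "exec"), namespace)
    return namespace
```

Passing a fresh namespace per query (for example one holding only the session's DataFrame) also keeps one session's variables out of another's.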
- Modern UI: Clean, responsive design with gradient backgrounds
- Drag & Drop: Intuitive file upload experience
- Real-time Chat: Interactive conversation interface
- Data Visualization: Integrated Plotly charts
- Error Handling: User-friendly error messages and loading states
- Mobile Responsive: Works on desktop and mobile devices
- Dataset overview and statistics
- Missing value analysis
- Data type information
- Sample data preview
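The overview (sample rows, missing values, data types) comes down to a few simple computations. The project uses pandas, where the equivalents are `df.head()`, `df.isna().sum()` and `df.dtypes`. This dependency-free sketch uses the `csv` module instead; it treats blank cells as missing and calls a column numeric when every non-blank value parses as a float. The function name and return shape are hypothetical.

```python
import csv
import io


def _is_missing(value) -> bool:
    # DictReader fills short rows with None; blank strings count as missing too
    return value is None or value.strip() == ""


def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def overview(csv_text: str, sample_rows: int = 5) -> dict:
    """Summarize a CSV: columns, row count, sample rows, missing counts, numeric columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    missing = {col: sum(_is_missing(row.get(col)) for row in rows) for col in columns}
    numeric = [
        col for col in columns
        # A column needs at least one value, and every present value must be numeric
        if any(not _is_missing(row.get(col)) for row in rows)
        and all(_is_number(row[col]) for row in rows if not _is_missing(row.get(col)))
    ]
    return {
        "columns": columns,
        "row_count": len(rows),
        "sample": rows[:sample_rows],
        "missing": missing,
        "numeric_columns": numeric,
    }
```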
- Bar charts, line charts, scatter plots
- Histograms and distribution plots
- Correlation matrices
- Custom Plotly visualizations
- Descriptive statistics
- Correlation analysis
- Grouping and aggregation
- Trend analysis
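As an illustration of the statistics involved, the sketch below totals a value per group (as in "Show sales by region") and computes the Pearson correlation coefficient (as in "What's the correlation between price and sales?"). The generated code in the app would presumably use pandas, e.g. `df.groupby("region")["sales"].sum()` and `df["price"].corr(df["sales"])`. This standard-library version with hypothetical helper names only shows the arithmetic.

```python
import math
from collections import defaultdict


def group_sum(rows, key, value):
    """Total a numeric column per group, e.g. sales by region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += float(row[value])
    return dict(totals)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("Need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances share the 1/n factor, so it cancels and is omitted
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("Correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```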
app/
├── core/ # Core configuration and settings
├── api/ # API routes and schemas
├── services/ # Business logic services
└── main.py # Application entry point
- FastAPI: Modern Python web framework
- Google Gemini: Large language model that generates the analysis code
- Pandas: Data manipulation and analysis
- Plotly: Interactive visualizations
- Pydantic: Data validation and settings
# Required
GEMINI_API_KEY=your_key_here
# Optional (with defaults)
DEBUG=True
MAX_FILE_SIZE=16777216
CACHE_DIR=cache
SESSION_TIMEOUT=86400
LLM_MODEL=gemini-1.5-flash
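The project lists Pydantic for settings, presumably in `app/core/config.py`. As a dependency-free approximation, the sketch below loads the same variables into a frozen dataclass with the defaults listed above and raises the "GEMINI_API_KEY is required" error when the key is missing. The class, field and function names are hypothetical, as is the set of strings accepted as true for `DEBUG`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    debug: bool = True
    max_file_size: int = 16_777_216  # bytes (16MB)
    cache_dir: str = "cache"
    session_timeout: int = 86_400  # seconds (24 hours)
    llm_model: str = "gemini-1.5-flash"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (os.environ by default), applying the defaults above."""
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY", "").strip()
    if not api_key:
        raise ValueError("GEMINI_API_KEY is required")
    return Settings(
        gemini_api_key=api_key,
        # Hypothetical rule: only these strings enable debug mode
        debug=env.get("DEBUG", "True").strip().lower() in {"1", "true", "yes", "on"},
        max_file_size=int(env.get("MAX_FILE_SIZE", 16_777_216)),
        cache_dir=env.get("CACHE_DIR", "cache"),
        session_timeout=int(env.get("SESSION_TIMEOUT", 86_400)),
        llm_model=env.get("LLM_MODEL", "gemini-1.5-flash"),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test. Loading the `.env` file itself into the environment is a separate step not shown here.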
"GEMINI_API_KEY is required" error
- Ensure you've set your API key in the .env file
- Get a free API key from Google AI Studio
Module import errors
- Activate your virtual environment
- Run pip install -r requirements.txt
File upload fails
- Check the file format (must be CSV)
- Verify the file size (max 16MB)
- Ensure the file is UTF-8 encoded
Charts not displaying
- Check the browser console for JavaScript errors
- Ensure you have an internet connection, since Plotly.js is loaded from a CDN
- Use smaller datasets for faster processing
- Complex visualizations may take longer to generate
- Sessions are cached for 24 hours by default
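The 24-hour default corresponds to `SESSION_TIMEOUT=86400` seconds. A minimal in-memory sketch of that expiry rule follows. It assumes the timeout is measured from session creation (it might instead be measured from last activity), uses hypothetical names, and keeps sessions in a dict, whereas the app persists session data under `cache/`. The injectable clock makes expiry easy to test.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

SESSION_TIMEOUT = 86_400  # seconds; mirrors the SESSION_TIMEOUT default


@dataclass
class Session:
    session_id: str
    created_at: float
    history: List[str] = field(default_factory=list)  # e.g. past queries


class SessionStore:
    """Hypothetical in-memory store with creation-based expiry."""

    def __init__(self, timeout: float = SESSION_TIMEOUT, clock: Callable[[], float] = time.time):
        self.timeout = timeout
        self.clock = clock
        self._sessions: Dict[str, Session] = {}

    def create(self) -> Session:
        session = Session(session_id=str(uuid.uuid4()), created_at=self.clock())
        self._sessions[session.session_id] = session
        return session

    def get(self, session_id: str) -> Optional[Session]:
        session = self._sessions.get(session_id)
        if session is None:
            return None
        # Expire lazily: a session older than the timeout is dropped on access
        if self.clock() - session.created_at > self.timeout:
            del self._sessions[session_id]
            return None
        return session

    def purge_expired(self) -> int:
        """Remove every expired session and return how many were removed."""
        now = self.clock()
        expired = [sid for sid, s in self._sessions.items() if now - s.created_at > self.timeout]
        for sid in expired:
            del self._sessions[sid]
        return len(expired)
```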
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
Ready to explore your data with AI? Upload a CSV file and start asking questions!