suvijya/DataVision

🤖 DataVision Assistant - AI-Powered Data Analysis Platform

Transform your CSV data into insights using natural language queries.

✨ Features

  • 📊 Smart CSV Analysis - Upload and get instant insights from your datasets
  • 🤖 AI-Powered Queries - Ask questions in plain English and get intelligent responses
  • 📈 Interactive Visualizations - Plotly charts generated automatically from your queries
  • 💬 Conversational Interface - ChatGPT-like experience for data exploration
  • 🔒 Secure Execution - Sandboxed code execution with safety restrictions
  • 💾 Session Management - Persistent analysis sessions with conversation history
  • 🌐 Modern Web Interface - Responsive design with drag-and-drop file uploads

🏗️ Architecture

pydatabackend/
├── app/
│   ├── core/
│   │   └── config.py              # Application settings
│   ├── api/
│   │   └── v1/
│   │       ├── endpoints/
│   │       │   └── session.py     # API endpoints
│   │       └── schemas/
│   │           └── session.py     # Pydantic models
│   ├── services/
│   │   ├── session_manager.py     # Session lifecycle management
│   │   └── data_analysis.py       # LLM integration & code execution
│   └── main.py                    # FastAPI application
├── frontend/
│   ├── index.html                 # Main UI
│   ├── styles.css                 # Styling
│   └── script.js                  # JavaScript functionality
├── cache/                         # Session data storage
├── requirements.txt               # Python dependencies
├── .env.example                   # Environment variables template
└── README.md

🚀 Quick Start

1. Prerequisites

2. Installation

# Clone or navigate to the project directory
cd pydatabackend

# Create virtual environment (recommended)
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

3. Configuration

# Copy environment template
# On Windows:
copy .env.example .env
# On macOS/Linux:
cp .env.example .env

# Edit .env file and add your Gemini API key
# Get your API key from: https://makersuite.google.com/app/apikey

Required Environment Variables:

GEMINI_API_KEY=your_gemini_api_key_here
DEBUG=True

4. Run the Application

python app/main.py

The application will start on http://localhost:8000

Available URLs:

🎯 Usage Guide

1. Upload Your Data

  • Drag and drop a CSV file (max 16MB) or click to browse
  • Supported format: CSV files with UTF-8 encoding
  • The system will automatically analyze your data structure
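The upload constraints above (CSV only, 16 MB cap, UTF-8 encoding) could be checked roughly as follows. This is a minimal sketch, not the project's actual validation code: `validate_upload` and `MAX_FILE_SIZE` are hypothetical names chosen for illustration, and the limit is taken from the 16MB figure above.

```python
from typing import List

MAX_FILE_SIZE = 16 * 1024 * 1024  # 16 MB, matching the documented limit


def validate_upload(filename: str, content: bytes) -> List[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    # Extension check only; it does not prove the content is well-formed CSV.
    if not filename.lower().endswith(".csv"):
        problems.append("file must have a .csv extension")
    if len(content) > MAX_FILE_SIZE:
        problems.append("file exceeds the 16 MB limit")
    try:
        content.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
    return problems
```

Returning every problem at once, instead of stopping at the first, lets a client show the user everything that needs fixing in a single message.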

2. Explore Your Data

  • Sample Data Tab: View the first 5 rows
  • Statistics Tab: See numeric statistics and missing value analysis
  • Data Info Tab: Check column types and categorical information

3. Ask Questions

Use natural language to analyze your data:

Example Queries:

  • "Show me a summary of the data"
  • "Create a histogram of the age column"
  • "What's the correlation between price and sales?"
  • "Are there any missing values?"
  • "Show sales by region as a bar chart"
  • "Create a scatter plot of height vs weight"
  • "What are the top 10 customers by revenue?"

🔧 API Endpoints

Session Management

  • POST /api/v1/session/start - Upload CSV and start session
  • POST /api/v1/session/query - Submit analysis query
  • GET /api/v1/sessions - List active sessions
  • GET /api/v1/session/{session_id} - Get session info
  • DELETE /api/v1/session/{session_id} - Delete session

Example API Usage

# Start a session
curl -X POST "http://localhost:8000/api/v1/session/start" \
  -F "file=@your_data.csv"

# Query the data
curl -X POST "http://localhost:8000/api/v1/session/query" \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "your-session-id",
    "query": "Show me sales by region"
  }'
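For scripts, the query call above can be reproduced with Python's standard library. This is a minimal sketch: `build_query_request` is a hypothetical helper, the JSON field names mirror the curl example, and because the response format is not documented here, the sketch only constructs the request.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default address from the Quick Start


def build_query_request(session_id: str, query: str) -> urllib.request.Request:
    """Build the POST request for /api/v1/session/query without sending it."""
    body = json.dumps({"session_id": session_id, "query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/session/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request("your-session-id", "Show me sales by region")
# To send it against a running server:
#     with urllib.request.urlopen(request) as response:
#         print(response.read().decode("utf-8"))
```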

🛡️ Security Features

  • Restricted Imports: Only safe libraries are allowed (pandas, numpy, plotly, etc.)
  • Code Sandboxing: Generated code runs in a controlled environment
  • Input Validation: File type and size validation
  • Session Isolation: Each session operates independently
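One common way to enforce an import restriction like the one listed above is to parse the generated code and reject any import outside an allowlist before executing it. The sketch below illustrates that idea only. It is not taken from `data_analysis.py`, the names `ALLOWED_MODULES` and `find_disallowed_imports` are hypothetical, and a static check like this is not a complete sandbox on its own.

```python
import ast
from typing import List

# Assumed allowlist, based on the libraries named in the list above.
ALLOWED_MODULES = {"pandas", "numpy", "plotly"}


def find_disallowed_imports(source: str) -> List[str]:
    """Return top-level module names imported by `source` that are not allowlisted."""
    disallowed = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports have no module name; they are reported as "".
            names = [node.module or ""]
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root not in ALLOWED_MODULES:
                disallowed.append(root)
    return disallowed
```

A check of this kind only catches `import` statements. Calls such as `__import__` or `eval` would need separate handling, for example by restricting the builtins available to the executed code.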

🎨 Frontend Features

  • Modern UI: Clean, responsive design with gradient backgrounds
  • Drag & Drop: Intuitive file upload experience
  • Real-time Chat: Interactive conversation interface
  • Data Visualization: Integrated Plotly charts
  • Error Handling: User-friendly error messages and loading states
  • Mobile Responsive: Works on desktop and mobile devices

📊 Supported Analysis Types

Data Exploration

  • Dataset overview and statistics
  • Missing value analysis
  • Data type information
  • Sample data preview

Visualizations

  • Bar charts, line charts, scatter plots
  • Histograms and distribution plots
  • Correlation matrices
  • Custom Plotly visualizations

Statistical Analysis

  • Descriptive statistics
  • Correlation analysis
  • Grouping and aggregation
  • Trend analysis

🔧 Development

Project Structure

app/
├── core/           # Core configuration and settings
├── api/            # API routes and schemas
├── services/       # Business logic services
└── main.py         # Application entry point

Key Components

  • FastAPI: Modern Python web framework
  • Google Gemini: Advanced AI for code generation
  • Pandas: Data manipulation and analysis
  • Plotly: Interactive visualizations
  • Pydantic: Data validation and settings

Environment Variables

# Required
GEMINI_API_KEY=your_key_here

# Optional (with defaults)
DEBUG=True
MAX_FILE_SIZE=16777216
CACHE_DIR=cache
SESSION_TIMEOUT=86400
LLM_MODEL=gemini-1.5-flash
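As an illustration of how these variables and their defaults fit together, here is a minimal standard-library sketch. The project itself lists Pydantic for settings, so this is a stand-in and not the contents of `config.py`. `Settings` and `load_settings` are hypothetical names, and the way `DEBUG` is parsed is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    debug: bool
    max_file_size: int
    cache_dir: str
    session_timeout: int
    llm_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from `env`, applying the documented defaults."""
    api_key = env.get("GEMINI_API_KEY", "")
    if not api_key:
        raise ValueError("GEMINI_API_KEY is required")
    return Settings(
        gemini_api_key=api_key,
        # Assumption: DEBUG is treated as true only for the string "true" (any case).
        debug=env.get("DEBUG", "True").lower() == "true",
        max_file_size=int(env.get("MAX_FILE_SIZE", "16777216")),
        cache_dir=env.get("CACHE_DIR", "cache"),
        session_timeout=int(env.get("SESSION_TIMEOUT", "86400")),
        llm_model=env.get("LLM_MODEL", "gemini-1.5-flash"),
    )
```

The defaults line up with the limits quoted elsewhere in this README: 16777216 bytes is 16 × 1024 × 1024 (the 16MB upload cap), and 86400 seconds is 24 hours.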

🐛 Troubleshooting

Common Issues

  1. "GEMINI_API_KEY is required" error

    • Ensure you've set your API key in the .env file
    • Get a free API key from Google AI Studio
  2. Module import errors

    • Activate your virtual environment
    • Run pip install -r requirements.txt
  3. File upload fails

    • Check file format (must be CSV)
    • Verify file size (max 16MB)
    • Ensure UTF-8 encoding
  4. Charts not displaying

    • Check browser console for JavaScript errors
    • Ensure internet connection for Plotly.js CDN

Performance Tips

  • Use smaller datasets for faster processing
  • Complex visualizations may take longer to generate
  • Sessions are cached for 24 hours by default
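The 24-hour figure corresponds to a `SESSION_TIMEOUT` of 86400 seconds. A hedged sketch of an expiry check follows. `is_expired` is a hypothetical helper, and it assumes the timeout is measured from session creation, which this README does not specify (it could equally be measured from last activity).

```python
import time
from typing import Optional

SESSION_TIMEOUT = 86400  # seconds; the documented default (24 hours)


def is_expired(
    created_at: float,
    now: Optional[float] = None,
    timeout: int = SESSION_TIMEOUT,
) -> bool:
    """Return True once `timeout` seconds have elapsed since `created_at`."""
    if now is None:
        now = time.time()
    return now - created_at >= timeout
```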

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

Ready to explore your data with AI? Upload a CSV file and start asking questions! 🚀
