A Python toolkit for automated CSV data analysis with statistical profiling and visualization.
AutoCSV Profiler provides automated analysis of CSV files with statistical summaries, data quality assessment, and visualization generation. It features memory-efficient processing, automatic delimiter detection, and a rich console interface.
Key Features:
- Interactive analysis mode with step-by-step guidance
- Automatic delimiter detection and encoding validation
- Memory-efficient chunked processing for large files
- Statistical analysis with descriptive statistics and data quality metrics
- Visualization generation (KDE plots, box plots, Q-Q plots, bar charts, pie charts)
- Rich console interface with progress tracking
- Configurable via CLI flags or environment variables
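To make "automatic delimiter detection" concrete, here is a minimal sketch of the general technique using the standard library's `csv.Sniffer`. It is an illustration only: this README does not say how AutoCSV Profiler detects delimiters internally, and the function name `sniff_delimiter` and the candidate set are assumptions made for this example.

```python
import csv


def sniff_delimiter(sample, candidates=",;\t|"):
    """Guess the field delimiter from a text sample of a CSV file.

    Illustrative helper, not part of autocsv_profiler.
    """
    try:
        # Sniffer looks for a candidate character that occurs a
        # consistent number of times on each line of the sample.
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Sniffer could not decide: fall back to the candidate that is
        # most frequent in the first line (a comma if none appear).
        lines = sample.splitlines()
        header = lines[0] if lines else ""
        return max(candidates, key=header.count)
```

In practice the sample would be the first few kilobytes of the file, read in text mode with the file's encoding.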
Requirements: Python 3.8 - 3.13
pip install autocsv-profiler
For detailed installation instructions, see the User Guide.
Interactive Mode:
autocsv-profiler
[Screenshots: Analysis Start, Analysis Complete]
Step-by-step guidance for first-time users.
Direct Analysis:
autocsv-profiler data.csv
Quick analysis with sensible defaults.
# Show help
autocsv-profiler --help
# Basic analysis
autocsv-profiler data.csv
# Custom output directory
autocsv-profiler data.csv --output results/
# Custom delimiter
autocsv-profiler data.csv --delimiter ";"
# Large file processing
autocsv-profiler data.csv --memory-limit 4.0 --chunk-size 20000
# Non-interactive mode
autocsv-profiler data.csv --non-interactive
# Debug mode
autocsv-profiler data.csv --debug
For complete CLI documentation, see the User Guide.
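The `--chunk-size` option controls how many rows are handled at a time. As a rough illustration of why chunking bounds memory use, the sketch below computes a column mean while holding only one chunk of rows in memory. It uses only the standard library and is not the tool's actual implementation; `iter_chunks` and `column_mean` are hypothetical names chosen for this example.

```python
import csv
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size items from any iterable."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


def column_mean(lines, column, chunk_size=10000, delimiter=","):
    """Mean of one numeric column, processed one chunk at a time.

    `lines` is any iterable of text lines, such as an open file.
    Empty and non-numeric cells are skipped.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    count, total = 0, 0.0
    for chunk in iter_chunks(reader, chunk_size):
        for row in chunk:
            cell = (row.get(column) or "").strip()
            try:
                total += float(cell)
            except ValueError:
                continue  # missing or non-numeric value
            count += 1
    return total / count if count else float("nan")
```

Only running totals survive between chunks, so peak memory depends on the chunk size rather than on the file size.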
import autocsv_profiler
# Basic analysis
result_dir = autocsv_profiler.analyze('data.csv')
print(f"Analysis saved to: {result_dir}")
# Custom configuration
result_dir = autocsv_profiler.analyze(
    csv_file_path='data.csv',
    output_dir='results/',
    delimiter=',',
    chunk_size=10000,
    memory_limit_gb=1
)
# Interactive mode
result_dir = autocsv_profiler.analyze(
    csv_file_path='data.csv',
    interactive=True
)
For complete API documentation, see the API Reference.
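For profiling several files in one run, a thin wrapper around `analyze` may be convenient. The sketch below is hypothetical: `analyze_many` is not part of the package, and it assumes only the `csv_file_path` and `output_dir` keyword arguments shown above. How `analyze` reports errors is not stated here, so the wrapper catches any exception and records it instead of stopping the batch.

```python
from pathlib import Path


def analyze_many(csv_paths, analyze, output_root="results"):
    """Run `analyze` on each CSV and collect results and failures.

    `analyze` is expected to be autocsv_profiler.analyze or any callable
    that accepts the same csv_file_path and output_dir keywords.
    """
    results, failures = {}, {}
    for csv_path in map(Path, csv_paths):
        # One output directory per input file, named after the file stem.
        output_dir = Path(output_root) / csv_path.stem
        try:
            results[str(csv_path)] = analyze(
                csv_file_path=str(csv_path),
                output_dir=str(output_dir),
            )
        except Exception as exc:  # keep going if one file fails
            failures[str(csv_path)] = exc
    return results, failures
```

A call might look like `analyze_many(["a.csv", "b.csv"], autocsv_profiler.analyze)`. Passing the function in as an argument also makes the wrapper easy to try out with a stub.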
Environment variables with AUTOCSV_ prefix:
# Performance settings
export AUTOCSV_PERFORMANCE_MEMORY_LIMIT_GB=2
export AUTOCSV_PERFORMANCE_CHUNK_SIZE=20000
# Logging settings
export AUTOCSV_LOGGING_LEVEL=DEBUG
export AUTOCSV_LOGGING_CONSOLE_LEVEL=INFO
For complete configuration options, see the Configuration Guide.
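The variable names above appear to follow an `AUTOCSV_<SECTION>_<KEY>` pattern. The sketch below groups such variables by section, which can help when checking which overrides are active in a shell before a run. The split rule is inferred from the names shown here and is an assumption; the tool's own parsing may differ, and `read_autocsv_env` is a hypothetical helper.

```python
import os

PREFIX = "AUTOCSV_"


def read_autocsv_env(environ=None):
    """Group AUTOCSV_* variables as {section: {key: value}}.

    Assumes the first token after the prefix is the section name and
    the remainder is the key. Values are kept as strings.
    """
    environ = os.environ if environ is None else environ
    settings = {}
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue
        section, _, key = name[len(PREFIX):].partition("_")
        if not key:
            continue  # nothing after the section name
        settings.setdefault(section.lower(), {})[key.lower()] = value
    return settings
```

With the exports above in effect, the result would contain a `performance` section with `memory_limit_gb` and `chunk_size`, and a `logging` section with `level` and `console_level`.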
Analysis generates the following files in the output directory:
Data Summaries:
- dataset_analysis.txt - Dataset overview and basic statistics
- numerical_summary.csv - Summary statistics for numeric columns
- categorical_summary.csv - Summary for categorical columns
- numerical_stats.csv - Descriptive statistics using researchpy
- categorical_stats.csv - Categorical frequency analysis
- distinct_values.txt - Unique value counts per column
Visualizations:
- kde_plots/ - Kernel density estimation plots
- box_plots/ - Box plots for numerical variables
- qq_plots/ - Q-Q plots for normality testing
- bar_charts/ - Bar charts for categorical variables
- pie_charts/ - Categorical distribution pie charts
Process Logs:
- autocsv_profiler.log - Processing log file
For detailed output documentation, see the User Guide.
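After a run, it can be useful to check which of the outputs listed above were produced. The sketch below does this with `pathlib`. It assumes every file and folder sits directly inside the output directory, which this README does not confirm, and some outputs may legitimately be absent (for example, when a dataset has no categorical columns). `missing_outputs` is a hypothetical helper, not part of the package.

```python
from pathlib import Path

# Names taken from the list of generated outputs above.
EXPECTED_FILES = [
    "dataset_analysis.txt",
    "numerical_summary.csv",
    "categorical_summary.csv",
    "numerical_stats.csv",
    "categorical_stats.csv",
    "distinct_values.txt",
    "autocsv_profiler.log",
]
EXPECTED_DIRS = ["kde_plots", "box_plots", "qq_plots", "bar_charts", "pie_charts"]


def missing_outputs(output_dir):
    """Return the expected files and folders not found in output_dir."""
    root = Path(output_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

An empty list means every listed output was found under the assumed flat layout.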
User Documentation:
- User Guide - Installation, CLI usage, and examples
- Configuration - Settings and environment variables
- Troubleshooting - Problem-solving guide
Developer Documentation:
- API Reference - Python API documentation
- Developer Guide - Development workflow and architecture
- Architecture Diagrams - Visual system architecture
Complete Index:
- Documentation Index - Complete documentation overview
Contributions are welcome! See CONTRIBUTING.md for guidelines.
MIT License - see LICENSE for details.
This software includes third-party components. See NOTICE and THIRD_PARTY_LICENSES.txt for complete license information.
- PyPI: https://pypi.org/project/autocsv-profiler/
- Repository: https://github.com/dhaneshbb/autocsv-profiler
- Documentation: https://github.com/dhaneshbb/autocsv-profiler/blob/master/docs/index.md
- Issues: https://github.com/dhaneshbb/autocsv-profiler/issues
- Changelog: https://github.com/dhaneshbb/autocsv-profiler/blob/master/CHANGELOG.md
Version: 2.0.0 | Status: Beta | Python: 3.8-3.13
Copyright 2025 dhaneshbb | License: MIT | Homepage: https://github.com/dhaneshbb/autocsv-profiler



