Python toolkit for automated CSV data analysis. Features interactive CLI, automatic delimiter detection, memory-efficient processing, statistical profiling, and visualization generation. Supports Python 3.8-3.13. Perfect for analysts working with CSV files.

dhaneshbb/autocsv-profiler

AutoCSV Profiler

A Python toolkit for automated CSV data analysis with statistical profiling and visualization.

Overview

AutoCSV Profiler provides automated analysis of CSV files with statistical summaries, data quality assessment, and visualization generation. It features memory-efficient processing, automatic delimiter detection, and a rich console interface.
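To illustrate what memory-efficient chunked processing can look like in general, here is a minimal sketch that computes a running mean and standard deviation over a CSV column without loading the whole file. It uses only the standard library and Welford's online algorithm; the helper names `iter_chunks` and `running_mean_std` are hypothetical, and this is not taken from AutoCSV Profiler's implementation.

```python
import csv
import io
import math
from itertools import islice


def iter_chunks(reader, chunk_size):
    """Yield lists of at most chunk_size rows from a csv reader."""
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def running_mean_std(handle, column, chunk_size=10000):
    """Return (count, mean, sample std) for one column, read chunk by chunk.

    Hypothetical helper: only one chunk of rows is held in memory at a time.
    """
    reader = csv.DictReader(handle)
    count, mean, m2 = 0, 0.0, 0.0
    for chunk in iter_chunks(reader, chunk_size):
        for row in chunk:
            try:
                value = float(row.get(column))
            except (TypeError, ValueError):
                continue  # skip missing or non-numeric cells
            # Welford's update: numerically stable running mean and variance
            count += 1
            delta = value - mean
            mean += delta / count
            m2 += delta * (value - mean)
    std = math.sqrt(m2 / (count - 1)) if count > 1 else float("nan")
    return count, mean, std


if __name__ == "__main__":
    data = io.StringIO("x\n1\n2\n3\n4\n")
    print(running_mean_std(data, "x", chunk_size=2))
```

The same pattern extends to any statistic that can be updated incrementally, which is why a chunk size can be traded against a memory limit.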

Key Features:

  • Interactive analysis mode with step-by-step guidance
  • Automatic delimiter detection and encoding validation
  • Memory-efficient chunked processing for large files
  • Statistical analysis with descriptive statistics and data quality metrics
  • Visualization generation (KDE plots, box plots, Q-Q plots, bar charts, pie charts)
  • Rich console interface with progress tracking
  • Configurable via CLI flags or environment variables
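As a rough idea of how automatic delimiter detection can work, the sketch below uses `csv.Sniffer` from the Python standard library. The function name `detect_delimiter` and the candidate set are assumptions made for illustration; the toolkit's own detection logic may differ.

```python
import csv


def detect_delimiter(sample: str, candidates: str = ",;\t|") -> str:
    """Guess the delimiter of a CSV text sample, falling back to a comma.

    Hypothetical helper built on csv.Sniffer; not AutoCSV Profiler's code.
    """
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
        return dialect.delimiter
    except csv.Error:
        # The sniffer could not find a consistent delimiter in the sample
        return ","


if __name__ == "__main__":
    sample = "id;name;score\n1;Ada;9.5\n2;Linus;8.0\n"
    print(repr(detect_delimiter(sample)))
```

In practice the sample would be the first few kilobytes of the file, so detection stays cheap even for large inputs.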

Installation

Requirements: Python 3.8 - 3.13

pip install autocsv-profiler

For detailed installation instructions, see the User Guide.

Quick Start

Interactive Mode:

autocsv-profiler

Step-by-step guidance for first-time users.

Direct Analysis:

autocsv-profiler data.csv

Quick analysis with sensible defaults.

Usage

# Show help
autocsv-profiler --help

Command Line Interface

# Basic analysis
autocsv-profiler data.csv

# Custom output directory
autocsv-profiler data.csv --output results/

# Custom delimiter
autocsv-profiler data.csv --delimiter ";"

# Large file processing
autocsv-profiler data.csv --memory-limit 4.0 --chunk-size 20000

# Non-interactive mode
autocsv-profiler data.csv --non-interactive

# Debug mode
autocsv-profiler data.csv --debug

For complete CLI documentation, see the User Guide.
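When the CLI needs to be driven from a script, one option is to assemble the command from the flags shown above and run it with `subprocess`. The sketch below is a hypothetical wrapper (`build_command` and `run_profiler` are illustrative names); it assumes the `autocsv-profiler` executable is on `PATH` and uses only the flags listed in this section.

```python
import subprocess
from typing import List, Optional


def build_command(
    csv_path: str,
    output_dir: Optional[str] = None,
    delimiter: Optional[str] = None,
    memory_limit_gb: Optional[float] = None,
    chunk_size: Optional[int] = None,
    debug: bool = False,
) -> List[str]:
    """Assemble an autocsv-profiler command line from the documented flags."""
    # --non-interactive is always passed so a script never waits for input
    cmd = ["autocsv-profiler", csv_path, "--non-interactive"]
    if output_dir is not None:
        cmd += ["--output", output_dir]
    if delimiter is not None:
        cmd += ["--delimiter", delimiter]
    if memory_limit_gb is not None:
        cmd += ["--memory-limit", str(memory_limit_gb)]
    if chunk_size is not None:
        cmd += ["--chunk-size", str(chunk_size)]
    if debug:
        cmd.append("--debug")
    return cmd


def run_profiler(csv_path: str, **options) -> subprocess.CompletedProcess:
    """Run the CLI and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(csv_path, **options), check=True)
```

Passing the arguments as a list avoids shell quoting problems with delimiters such as `;` or `|`.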

Python API

import autocsv_profiler

# Basic analysis
result_dir = autocsv_profiler.analyze('data.csv')
print(f"Analysis saved to: {result_dir}")

# Custom configuration
result_dir = autocsv_profiler.analyze(
    csv_file_path='data.csv',
    output_dir='results/',
    delimiter=',',
    chunk_size=10000,
    memory_limit_gb=1
)

# Interactive mode
result_dir = autocsv_profiler.analyze(
    csv_file_path='data.csv',
    interactive=True
)

For complete API documentation, see the API Reference.
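To profile several files in one go, the `analyze` call shown above can be wrapped in a small loop. The sketch below is a hypothetical helper (`analyze_many` is not part of the package); it takes the analysis function as a parameter and assumes only the `csv_file_path` and `output_dir` keyword arguments used in the examples above.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable


def analyze_many(
    csv_paths: Iterable[str],
    output_root: str,
    analyze: Callable[..., object],
) -> Dict[str, object]:
    """Run analyze() on each CSV, writing to one subdirectory per file.

    Returns a mapping from input path to whatever analyze() returned.
    """
    results = {}
    for csv_path in csv_paths:
        # e.g. sales.csv -> <output_root>/sales
        out_dir = Path(output_root) / Path(csv_path).stem
        results[str(csv_path)] = analyze(
            csv_file_path=str(csv_path),
            output_dir=str(out_dir),
        )
    return results
```

With the package installed, a call might look like `analyze_many(["a.csv", "b.csv"], "results", autocsv_profiler.analyze)`.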

Configuration

Settings can also be supplied through environment variables with the AUTOCSV_ prefix:

# Performance settings
export AUTOCSV_PERFORMANCE_MEMORY_LIMIT_GB=2
export AUTOCSV_PERFORMANCE_CHUNK_SIZE=20000

# Logging settings
export AUTOCSV_LOGGING_LEVEL=DEBUG
export AUTOCSV_LOGGING_CONSOLE_LEVEL=INFO

For complete configuration options, see the Configuration Guide.
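When debugging a configuration, it can help to see which `AUTOCSV_` overrides are set in the current environment. The sketch below simply collects them; `collect_settings` is a hypothetical helper, and how the toolkit itself parses these variables is not shown here and may differ.

```python
import os
from typing import Dict, Mapping, Optional

PREFIX = "AUTOCSV_"


def collect_settings(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return all AUTOCSV_-prefixed variables, keyed by lowercased suffix.

    Assumption: values are kept as raw strings; no type conversion is done.
    """
    environ = os.environ if environ is None else environ
    return {
        name[len(PREFIX):].lower(): value
        for name, value in environ.items()
        if name.startswith(PREFIX)
    }


if __name__ == "__main__":
    for key, value in sorted(collect_settings().items()):
        print(f"{key} = {value}")
```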

Output Files

Analysis generates the following files in the output directory:

Data Summaries:

  • dataset_analysis.txt - Dataset overview and basic statistics
  • numerical_summary.csv - Summary statistics for numeric columns
  • categorical_summary.csv - Summary for categorical columns
  • numerical_stats.csv - Descriptive statistics using researchpy
  • categorical_stats.csv - Categorical frequency analysis
  • distinct_values.txt - Unique value counts per column

Visualizations:

  • kde_plots/ - Kernel density estimation plots
  • box_plots/ - Box plots for numerical variables
  • qq_plots/ - Q-Q plots for visually assessing normality
  • bar_charts/ - Bar charts for categorical variables
  • pie_charts/ - Categorical distribution pie charts

Process Logs:

  • autocsv_profiler.log - Processing log file

For detailed output documentation, see the User Guide.
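A quick way to confirm that a run completed is to check the output directory against the file list above. This is a minimal sketch with a hypothetical helper, `missing_outputs`; it assumes every listed file and folder sits directly under the output directory.

```python
from pathlib import Path
from typing import List

# Names taken from the output list above
EXPECTED_FILES = [
    "dataset_analysis.txt",
    "numerical_summary.csv",
    "categorical_summary.csv",
    "numerical_stats.csv",
    "categorical_stats.csv",
    "distinct_values.txt",
    "autocsv_profiler.log",
]
EXPECTED_DIRS = ["kde_plots", "box_plots", "qq_plots", "bar_charts", "pie_charts"]


def missing_outputs(output_dir: str) -> List[str]:
    """Return the expected files and folders not found in output_dir."""
    root = Path(output_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

An empty list means everything was found. A dataset without numeric or categorical columns may legitimately produce fewer outputs, so treat missing entries as a hint rather than an error.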

Documentation

User Documentation:

Developer Documentation:

Complete Index:

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE for details.

This software includes third-party components. See NOTICE and THIRD_PARTY_LICENSES.txt for complete license information.

Links


Version: 2.0.0 | Status: Beta | Python: 3.8-3.13

Copyright 2025 dhaneshbb | License: MIT | Homepage: https://github.com/dhaneshbb/autocsv-profiler
