Smart Data Cleaner is an intelligent dataset preprocessing and cleaning system built with FastAPI and Pandas.
The project automates repetitive data cleaning tasks such as missing value handling, duplicate removal, outlier detection, column analysis, and memory optimization while generating explainable cleaning reports.
It is designed to simplify preprocessing workflows for machine learning and data analytics pipelines.

Screenshots: 1. Data Upload, 2. Smart Analysis, 3. Cleaning Config, 4. Processed Data, 5. Visual Statistics, 6. Detailed Report

Automatically detects:
- Constant columns
- High-null columns
- ID-like columns
- High-cardinality columns
- Statistical outliers
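
To make the detection categories above concrete, here is a minimal sketch of how such column checks might look in Pandas. The function name `detect_column_issues`, the flag names, and both thresholds are hypothetical and chosen for illustration; they are not taken from the project's `cleaner.py`. Outlier detection is left out of this sketch.

```python
import pandas as pd


def detect_column_issues(df, null_threshold=0.5, cardinality_threshold=0.5):
    """Flag columns with simple heuristics (thresholds are illustrative assumptions)."""
    issues = {}
    n_rows = len(df)
    for col in df.columns:
        series = df[col]
        flags = []
        n_unique = series.nunique(dropna=True)
        null_ratio = series.isna().mean() if n_rows else 0.0

        # Constant: at most one distinct non-null value.
        if n_unique <= 1:
            flags.append("constant")
        # High-null: share of missing values above the threshold.
        if null_ratio > null_threshold:
            flags.append("high_null")
        # ID-like: every row has its own value (floats are excluded as measurements).
        if n_rows and n_unique == n_rows and not pd.api.types.is_float_dtype(series):
            flags.append("id_like")
        # High-cardinality: non-numeric column where most values are distinct.
        elif (
            n_rows
            and not pd.api.types.is_numeric_dtype(series)
            and n_unique / n_rows > cardinality_threshold
        ):
            flags.append("high_cardinality")

        if flags:
            issues[col] = flags
    return issues


sample = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5],
        "country": ["IN", "IN", "IN", "IN", "IN"],
        "notes": [None, None, None, "x", "y"],
        "city": ["a", "b", "c", "d", "a"],
        "score": [1.5, 2.5, 3.5, 4.5, 5.5],
    }
)
issues = detect_column_issues(sample)
```

Returning a dictionary of flags per column, rather than dropping columns directly, keeps the decision with the user, which fits a preview step that only suggests changes.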

Cleaning operations:
- Missing value standardization
- Duplicate row removal
- Junk value detection
- Numeric imputation:
  - Mean
  - Median
  - Constant
- Categorical imputation:
  - Mode
  - Constant
- IQR-based outlier removal

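
As an illustration of the imputation and outlier steps, the sketch below removes duplicates, fills numeric gaps with the mean or median, fills categorical gaps with the mode, and then drops rows outside the IQR fences. The function name `impute_and_filter`, its parameters, the order of the steps, and the conventional 1.5 multiplier are assumptions for this example and may differ from what the project does. Constant-value imputation is omitted for brevity.

```python
import pandas as pd


def impute_and_filter(df, numeric_strategy="median", iqr_multiplier=1.5):
    """Drop duplicates, impute missing values, then remove IQR outliers."""
    cleaned = df.drop_duplicates().copy()
    numeric_cols = cleaned.select_dtypes(include="number").columns
    other_cols = cleaned.columns.difference(numeric_cols)

    # Numeric imputation: mean or median of the observed values.
    for col in numeric_cols:
        if numeric_strategy == "mean":
            fill = cleaned[col].mean()
        else:
            fill = cleaned[col].median()
        cleaned[col] = cleaned[col].fillna(fill)

    # Categorical imputation: most frequent observed value.
    for col in other_cols:
        modes = cleaned[col].mode(dropna=True)
        if not modes.empty:
            cleaned[col] = cleaned[col].fillna(modes.iloc[0])

    # IQR rule: keep rows inside [Q1 - k*IQR, Q3 + k*IQR] for every numeric column.
    keep = pd.Series(True, index=cleaned.index)
    for col in numeric_cols:
        q1, q3 = cleaned[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        keep &= cleaned[col].between(q1 - iqr_multiplier * iqr, q3 + iqr_multiplier * iqr)
    return cleaned[keep]


raw = pd.DataFrame(
    {
        "age": [25, 27, None, 26, 28, 400, 25],
        "city": ["A", "B", "A", None, "A", "B", "A"],
    }
)
result = impute_and_filter(raw)
```

In this sample, the duplicated last row is dropped, the missing age is filled with the median, and the row with age 400 falls outside the fences and is removed.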
Generates detailed reports including:
- Removed columns with reasons
- Missing value replacements
- Outlier statistics
- Duplicate row information
- Dataset retention summary
- Memory optimization details
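
A report covering the items above could be assembled from the original and cleaned DataFrames roughly as follows. This is a sketch only: the function `build_report` and all dictionary keys are hypothetical, and the real report format produced by the `/report` endpoint may be structured differently.

```python
import pandas as pd


def build_report(original, cleaned, removed_columns):
    """Summarize a cleaning run as a plain dictionary (hypothetical layout)."""
    rows_before, rows_after = len(original), len(cleaned)
    missing = original.isna().sum()
    return {
        # Mapping of column name -> reason it was removed.
        "removed_columns": removed_columns,
        # Columns that had missing values, with their counts before cleaning.
        "missing_values_before": {c: int(n) for c, n in missing.items() if n},
        "duplicate_rows": int(original.duplicated().sum()),
        "rows_before": rows_before,
        "rows_after": rows_after,
        "retention_pct": round(100 * rows_after / rows_before, 2) if rows_before else 0.0,
        "memory_bytes_before": int(original.memory_usage(deep=True).sum()),
        "memory_bytes_after": int(cleaned.memory_usage(deep=True).sum()),
    }


original = pd.DataFrame(
    {
        "id": [1, 2, 2, 3],
        "const": [0, 0, 0, 0],
        "value": [1.0, None, None, 4.0],
    }
)
cleaned = original.drop_duplicates().drop(columns=["const"]).fillna({"value": 2.5})
report = build_report(original, cleaned, {"const": "constant column"})
```

Keeping the report as a plain dictionary of built-in types makes it straightforward to serialize as JSON from an API route.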

Supported file formats:
- CSV (`.csv`)
- Excel (`.xlsx`, `.xls`)

Memory optimization:
- Automatic datatype downcasting
- Reduced memory footprint
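
Datatype downcasting can be sketched with `pandas.to_numeric` and its `downcast` argument, which picks the smallest numeric type that can hold a column's values. The helper name `downcast_numeric` is hypothetical, and the project may use a different approach.

```python
import pandas as pd


def downcast_numeric(df):
    """Return a copy with integer and float columns stored in smaller dtypes."""
    optimized = df.copy()
    for col in optimized.columns:
        if pd.api.types.is_integer_dtype(optimized[col]):
            optimized[col] = pd.to_numeric(optimized[col], downcast="integer")
        elif pd.api.types.is_float_dtype(optimized[col]):
            optimized[col] = pd.to_numeric(optimized[col], downcast="float")
    return optimized


frame = pd.DataFrame(
    {
        "count": [1, 2, 3],
        "price": [1.5, 2.25, 3.0],
        "name": ["a", "b", "c"],
    }
)
optimized = downcast_numeric(frame)
bytes_before = int(frame.memory_usage(deep=True).sum())
bytes_after = int(optimized.memory_usage(deep=True).sum())
```

Comparing `memory_usage(deep=True)` before and after gives the kind of figure a memory optimization summary can report.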

| Method | Endpoint | Description |
|---|---|---|
| POST | `/preview` | Generates dataset preview and auto-detection suggestions |
| POST | `/clean` | Cleans dataset using selected parameters |
| GET | `/view/original` | Returns original dataset |
| GET | `/view/cleaned` | Returns cleaned dataset |
| GET | `/view/removed` | Returns removed rows with reasons |
| GET | `/report` | Returns detailed cleaning report |
| GET | `/download` | Downloads cleaned CSV |

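
For scripted access, the endpoints in the table can be wrapped in a small client helper using only the standard library. This is a sketch under assumptions: the base URL is the local address the server starts on, the names `ENDPOINTS` and `build_request` are hypothetical, and the request body format expected by `/preview` and `/clean` is not documented here, so the body is passed through as raw bytes.

```python
from urllib.parse import urljoin
from urllib.request import Request

BASE_URL = "http://localhost:8000"

# Method and path for each endpoint listed in the table.
ENDPOINTS = {
    "preview": ("POST", "/preview"),
    "clean": ("POST", "/clean"),
    "view_original": ("GET", "/view/original"),
    "view_cleaned": ("GET", "/view/cleaned"),
    "view_removed": ("GET", "/view/removed"),
    "report": ("GET", "/report"),
    "download": ("GET", "/download"),
}


def build_request(name, body=None, base_url=BASE_URL):
    """Create (but do not send) a request for the named endpoint."""
    method, path = ENDPOINTS[name]
    return Request(urljoin(base_url, path), data=body, method=method)


report_request = build_request("report")
```

With the server running, passing `report_request` to `urllib.request.urlopen` would send the request and return the response.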
- FastAPI
- Pandas
- NumPy
- Python
- Statistical preprocessing
- IQR outlier detection
- Memory optimization
- Dataset profiling
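
Clone the repository:

    git clone <repository-url>
    cd AutoML

Create a virtual environment:

    python -m venv venv

Activate it (Windows):

    venv\Scripts\activate

Activate it (Linux/macOS):

    source venv/bin/activate

Install dependencies:

    pip install -r requirements.txt

Start the server:

    python run.py

Server starts at:
http://localhost:8000

- Upload dataset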
- Preview auto-detected issues
- Configure cleaning parameters
- Run cleaning process
- Review reports
- Download cleaned dataset
AutoML/
├── app/
│ ├── cleaner/
│ │ └── cleaner.py
│ ├── routes/
│ │ └── clean_routes.py
│ ├── uploads/
│ ├── reports/
│ └── main.py
│
├── assets/
│ └── images/
│
├── static/
│ ├── index.html
│ ├── styles.css
│ └── script.js
│
├── run.py
├── requirements.txt
└── README.md

Data preprocessing is one of the most repetitive stages in machine learning workflows.
I built Smart Data Cleaner to automate common cleaning operations while keeping the process transparent through explainable reports and structured preprocessing summaries.
- Prediction pipeline integration
- ML-based cleaning recommendations
- Exportable PDF reports
- Advanced dataset profiling
- Automated preprocessing workflows
Jaiv Patel





