A hybrid automation pipeline that combines AI extraction (OpenAI) with traditional rule-based validation (Pydantic) to process invoices in any format.
This is the companion project for the blog post: AI Automation vs. Traditional Automation: When to Use Each
Drop an invoice in any format (clean, messy, embedded in an email, or in a foreign language) and the pipeline will:
- AI Step: Send the raw text to OpenAI (gpt-4o-mini), which extracts structured data using Structured Outputs
- Validation Step: Pydantic validators check the extracted data against business rules (line items sum correctly, tax rate is reasonable, dates make sense)
- Routing Step: The invoice is routed to "approved", "needs review", or "failed" based on how many validation issues were found
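To make the validation step concrete, here is a minimal sketch of what the business-rule checks could look like. The field names, the `find_issues` helper, the one-cent tolerance, and the 30% tax ceiling are all assumptions made for illustration; the actual models in schemas.py and rules in validator.py may differ.

```python
from datetime import date
from decimal import Decimal

from pydantic import BaseModel

CENT = Decimal("0.01")  # assumed rounding tolerance for money comparisons


class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: Decimal
    amount: Decimal


class InvoiceData(BaseModel):
    # Illustrative field names; the repo's schema may use different ones.
    vendor: str
    invoice_number: str
    invoice_date: date
    due_date: date
    line_items: list[LineItem]
    subtotal: Decimal
    tax: Decimal
    total: Decimal
    currency: str


def find_issues(inv: InvoiceData, max_tax_rate: Decimal = Decimal("0.30")) -> list[str]:
    """Return human-readable rule violations; an empty list means clean."""
    issues = []
    for item in inv.line_items:
        if abs(item.quantity * item.unit_price - item.amount) > CENT:
            issues.append(f"line item '{item.description}' does not multiply out")
    items_total = sum((item.amount for item in inv.line_items), Decimal("0"))
    if abs(items_total - inv.subtotal) > CENT:
        issues.append("line items do not sum to subtotal")
    if abs(inv.subtotal + inv.tax - inv.total) > CENT:
        issues.append("subtotal plus tax does not equal total")
    if inv.subtotal > 0 and not (0 <= inv.tax / inv.subtotal <= max_tax_rate):
        issues.append("tax rate outside the expected range")
    if inv.due_date < inv.invoice_date:
        issues.append("due date is before invoice date")
    return issues


# Values taken from the clean sample invoice shown in this README.
sample = {
    "vendor": "Acme Web Services LLC",
    "invoice_number": "INV-2024-0042",
    "invoice_date": "2024-11-15",
    "due_date": "2024-12-15",
    "line_items": [
        {"description": "Monthly Hosting", "quantity": 1, "unit_price": "150.00", "amount": "150.00"},
        {"description": "SSL Certificate", "quantity": 2, "unit_price": "29.99", "amount": "59.98"},
        {"description": "Database Backup", "quantity": 1, "unit_price": "45.00", "amount": "45.00"},
    ],
    "subtotal": "254.98",
    "tax": "20.40",
    "total": "275.38",
    "currency": "USD",
}

print(find_issues(InvoiceData(**sample)))
```

Collecting issues in a list, rather than raising on the first failure, is what lets the routing step count how many rules an invoice broke.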
This project demonstrates the hybrid automation pattern: using AI for the judgment-heavy step (reading unstructured text and extracting meaning) and traditional rules for the deterministic step (validating data, routing decisions). Neither approach works well alone:
- Pure rules can't handle the variety of invoice formats
- Pure AI can't guarantee the output meets your business constraints
Together, they create a pipeline that's both flexible and reliable.
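The hybrid pattern can be sketched with the AI step passed in as a plain function, so the deterministic part stays testable without a network call. Everything here is hypothetical scaffolding: `process`, `route`, the rule functions, and the `review_limit` threshold are assumptions for illustration, not the repo's actual code.

```python
from decimal import Decimal
from typing import Callable, Optional

Extractor = Callable[[str], dict]          # judgment-heavy step (the AI call)
Rule = Callable[[dict], Optional[str]]     # deterministic step (returns an issue or None)


def route(issues: list, review_limit: int = 2) -> str:
    """Map the number of issues to a status. The threshold is an assumption."""
    if not issues:
        return "approved"
    if len(issues) <= review_limit:
        return "needs review"
    return "failed"


def process(raw_text: str, extract: Extractor, rules: list) -> dict:
    """Run extraction, then apply every rule and route on the result."""
    try:
        data = extract(raw_text)
    except Exception as exc:  # extraction itself can fail
        return {"status": "failed", "issues": [f"extraction error: {exc}"], "data": None}
    issues = [msg for rule in rules if (msg := rule(data)) is not None]
    return {"status": route(issues), "issues": issues, "data": data}


def totals_add_up(data: dict) -> Optional[str]:
    if data["subtotal"] + data["tax"] != data["total"]:
        return "subtotal plus tax does not equal total"
    return None


def has_vendor(data: dict) -> Optional[str]:
    return None if data.get("vendor") else "vendor is missing"


def fake_extract(raw_text: str) -> dict:
    """Stand-in for the model call; returns what a clean extraction might look like."""
    return {
        "vendor": "Acme Web Services LLC",
        "subtotal": Decimal("254.98"),
        "tax": Decimal("20.40"),
        "total": Decimal("275.38"),
    }


result = process("raw invoice text", fake_extract, [totals_add_up, has_vendor])
print(result["status"])
```

Swapping `fake_extract` for a function that calls the model changes nothing in the validation or routing code, which is the point of keeping the two halves separate.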
- Python 3.11 or higher
- An OpenAI API key
- Clone the repo:
    git clone https://github.com/ashkankardan/ai-invoice-processor.git
    cd ai-invoice-processor
- Create a virtual environment and install dependencies:
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install -r requirements.txt
- Set your API key:
    cp .env.example .env
    # Edit .env and add your OpenAI API key
Process all sample invoices:
    python process_invoice.py
Process a specific invoice:
    python process_invoice.py samples/invoice_messy.txt
Process your own invoice:
    python process_invoice.py path/to/your/invoice.txt
═══════════════════════════════════════════
Processing: samples/invoice_clean.txt
═══════════════════════════════════════════
✅ Status: APPROVED
Vendor: Acme Web Services LLC
Invoice #: INV-2024-0042
Date: 2024-11-15
Due: 2024-12-15
Line Items:
1. Monthly Hosting (Stan x1 @ $150.00 = $150.00
2. SSL Certificate Renew x2 @ $29.99 = $59.98
3. Database Backup Servi x1 @ $45.00 = $45.00
Subtotal: $254.98
Tax: $20.40
Total: $275.38
Currency: USD
├── process_invoice.py # Main entry point: reads files, calls AI, validates, prints results
├── schemas.py # Pydantic models (InvoiceData, LineItem, ProcessingResult)
├── validator.py # Business rule validation and routing logic
├── samples/ # Sample invoices in different formats
│ ├── invoice_clean.txt
│ ├── invoice_messy.txt
│ ├── invoice_email.txt
│ └── invoice_foreign.txt
├── requirements.txt
├── .env.example
└── .gitignore
- Add your own invoices in different formats and see how the AI handles them
- Swap gpt-4o-mini for a different model (e.g. gpt-4o) and compare accuracy
- Add more validators in validator.py (duplicate invoice detection, vendor allowlist, etc.)
- Extend the schema in schemas.py with fields like payment_method, purchase_order_number, or billing_address
- Connect the output to a real system: write approved invoices to a CSV, send review items to Slack via webhook
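As a starting point for two of these ideas, the sketch below shows a vendor allowlist check, a duplicate check, and a CSV export for approved invoices, using only the standard library. The function names, the result dictionary shape, and the column list are hypothetical and would need adapting to the real validator.py and ProcessingResult model.

```python
import csv
import io
from typing import Iterable, Optional, TextIO

# Assumed export columns; adjust to match the real schema.
FIELDS = ["vendor", "invoice_number", "total", "currency"]


def check_vendor(vendor: str, allowlist: set) -> Optional[str]:
    """Return an issue string if the vendor is not on the allowlist."""
    if vendor.strip().lower() not in {v.lower() for v in allowlist}:
        return f"vendor '{vendor}' is not on the allowlist"
    return None


def check_duplicate(invoice_number: str, seen: set) -> Optional[str]:
    """Return an issue string if this invoice number was already processed."""
    if invoice_number in seen:
        return f"invoice '{invoice_number}' was already processed"
    seen.add(invoice_number)
    return None


def write_approved(results: Iterable[dict], out: TextIO) -> int:
    """Write approved invoices as CSV rows and return how many were written."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for result in results:
        if result["status"] == "approved":
            writer.writerow(result["data"])
            count += 1
    return count


results = [
    {"status": "approved", "data": {"vendor": "Acme Web Services LLC",
                                    "invoice_number": "INV-2024-0042",
                                    "total": "275.38", "currency": "USD"}},
    {"status": "needs review", "data": {"vendor": "Unknown Co",
                                        "invoice_number": "X-1",
                                        "total": "10.00", "currency": "USD"}},
]

buffer = io.StringIO()  # swap for open("approved.csv", "w", newline="") to write a file
written = write_approved(results, buffer)
print(buffer.getvalue())
```

The two check functions return an issue string or None, so they could be appended to whatever list of issues the existing validators already build.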
- OpenAI API - gpt-4o-mini with Structured Outputs for intelligent data extraction
- Pydantic - Schema definition, type validation, and business rule enforcement
- Python 3.11+ - Type hints and modern syntax
This project is the companion code for: AI Automation vs. Traditional Automation: When to Use Each (With a Hands-On Tutorial)
MIT
Ashkan Kardan - ashkankardan.com