Skip to content

ashkankardan/ai-invoice-processor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

AI Invoice Processor

A hybrid automation pipeline that combines AI extraction (OpenAI) with traditional rule-based validation (Pydantic) to process invoices in any format.

This is the companion project for the blog post: AI Automation vs. Traditional Automation: When to Use Each

What It Does

Drop an invoice in any format (clean, messy, embedded in an email, or in a foreign language) and the pipeline will:

  1. AI Step: Send the raw text to OpenAI (gpt-4o-mini), which extracts structured data using Structured Outputs
  2. Validation Step: Pydantic validators check the extracted data against business rules (line items sum correctly, tax rate is reasonable, dates make sense)
  3. Routing Step: The invoice is routed to "approved", "needs review", or "failed" based on how many validation issues were found

Why This Architecture

This project demonstrates the hybrid automation pattern: using AI for the judgment-heavy step (reading unstructured text and extracting meaning) and traditional rules for the deterministic step (validating data, routing decisions). Neither approach works well alone:

  • Pure rules can't handle the variety of invoice formats
  • Pure AI can't guarantee the output meets your business constraints

Together, they create a pipeline that's both flexible and reliable.

Quick Start

Prerequisites

Setup

  1. Clone the repo:

    git clone https://github.com/ashkankardan/ai-invoice-processor.git
    cd ai-invoice-processor
  2. Create a virtual environment and install dependencies:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install -r requirements.txt
  3. Set your API key:

    cp .env.example .env
    # Edit .env and add your OpenAI API key

Run

Process all sample invoices:

python process_invoice.py

Process a specific invoice:

python process_invoice.py samples/invoice_messy.txt

Process your own invoice:

python process_invoice.py path/to/your/invoice.txt

Sample Output

═══════════════════════════════════════════
Processing: samples/invoice_clean.txt
═══════════════════════════════════════════

✅ Status: APPROVED

Vendor:    Acme Web Services LLC
Invoice #: INV-2024-0042
Date:      2024-11-15
Due:       2024-12-15

Line Items:
  1. Monthly Hosting (Stan  x1    @ $150.00    = $150.00
  2. SSL Certificate Renew  x2    @ $29.99     = $59.98
  3. Database Backup Servi  x1    @ $45.00     = $45.00

Subtotal:  $254.98
Tax:       $20.40
Total:     $275.38
Currency:  USD

Project Structure

├── process_invoice.py    # Main entry point: reads files, calls AI, validates, prints results
├── schemas.py            # Pydantic models (InvoiceData, LineItem, ProcessingResult)
├── validator.py          # Business rule validation and routing logic
├── samples/              # Sample invoices in different formats
│   ├── invoice_clean.txt
│   ├── invoice_messy.txt
│   ├── invoice_email.txt
│   └── invoice_foreign.txt
├── requirements.txt
├── .env.example
└── .gitignore

Experiment Ideas

  • Add your own invoices in different formats and see how the AI handles them
  • Swap gpt-4o-mini for a different model (e.g. gpt-4o) and compare accuracy
  • Add more validators in validator.py (duplicate invoice detection, vendor allowlist, etc.)
  • Extend the schema in schemas.py with fields like payment_method, purchase_order_number, or billing_address
  • Connect the output to a real system: write approved invoices to a CSV, send review items to Slack via webhook

Tech Stack

  • OpenAI API - gpt-4o-mini with Structured Outputs for intelligent data extraction
  • Pydantic - Schema definition, type validation, and business rule enforcement
  • Python 3.11+ - Type hints and modern syntax

Blog Post

This project is the companion code for: AI Automation vs. Traditional Automation: When to Use Each (With a Hands-On Tutorial)

License

MIT

Author

Ashkan Kardan - ashkankardan.com

About

Hybrid AI + rules pipeline: extract structured data from any invoice format using OpenAI Structured Outputs + Pydantic validation.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages