A Python tool for generating SQL SELECT queries from structured mapping specifications defined in JSON format.
This project provides a framework for defining data transformation mappings using a declarative JSON specification. The mapping specs describe how to transform data from source tables into target tables, including column mappings, joins, filters, and constraints.
- Declarative Mapping Specifications: Define data transformations using JSON
- SQL Generation: Automatically generate SQL SELECT queries from mapping specs
- Structured Format: Supports sources, joins, filters, columns, constraints, and metadata
- Type Safety: Column data types and expectations are defined in the spec
mapping_specs/
├── data/
│ └── mapping_specs.json # Example mapping specification
├── scripts/
│ └── generate_sql.py # SQL generation script
├── main.py # Main entry point
├── pyproject.toml # Project configuration
└── README.md # This file
A mapping specification is a JSON object with the following structure:
_id: Unique identifier for the mapping spec (e.g.,"ontology_sport_bags_v1.bag_sku")target: Target table configurationrelease: Release nametable: Target table namegrain: List of columns that define the grainpartition_by: List of columns for partitioningwrite_mode: Write mode (e.g.,"merge")
sources: Array of source table definitionsalias: Table alias used in SQLcatalog: Database catalog nameschema: Database schema nametable: Table name
joins: Array of join definitionstype: Join type (e.g.,"left","inner")on: Array of join conditions (SQL expressions)
filters: Array of WHERE clause conditions (SQL expressions)columns: Array of column mappingstarget_column: Name of the target columnexpression_sql: SQL expression for the columndtype: Data type (e.g.,"INT","STRING","DECIMAL(18,2)")lineage: Array of source column referencesexpectations: Optional validation expectations (e.g.,{"not_null": true})
constraints: Table constraintsunique_keys: Array of unique key definitionsfk_checks: Array of foreign key checks
meta: Metadata about the mapping specstatus: Status of the spec (e.g.,"proposed")generator: Tool that generated the specscores: Quality scorescreated_at: Creation timestampversion: Version information
python scripts/generate_sql.pyThis will read data/mapping_specs.json and generate a SQL SELECT query.
Given the example mapping spec, the generated SQL will look like:
SELECT
ic.ItemClassID AS BagFamilyId,
ic.Descr AS BagFamilyName,
ii.InventoryID AS SKUId,
ii.InventoryCD AS SKUCode,
ii.Descr AS Description,
ii.BasePrice AS ListPrice
FROM group_iii.silver.initemclass AS ic
LEFT JOIN group_iii.silver.inventoryitem AS ii ON ii.ItemClassID = ic.ItemClassID
WHERE ii.InventoryID IS NOT NULL;- Python >= 3.12
- No external dependencies (uses only standard library)
The project uses ruff for linting and code formatting.
# Using uv (recommended)
uv sync
# Or using pip
pip install -e .See data/mapping_specs.json for a complete example of a mapping specification that defines:
- Two source tables (
initemclassandinventoryitem) - A LEFT JOIN between them
- Column mappings with data types and lineage
- Filters and constraints
- Metadata about the mapping
[Add your license information here]