OpenDataDiscovery + Great Expectations = <3
This demo project shows how OpenDataDiscovery works with Data QA.
Run the OpenDataDiscovery Platform and Postgres services to store Great Expectations results. By default, OpenDataDiscovery starts on http://localhost:8080
docker compose up -d
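
The containers can take a little while to start accepting requests. As a minimal sketch, the helper below polls the default address until it answers. The function names `is_reachable` and `wait_until_reachable` are hypothetical and not part of any ODD tooling, and the sketch assumes that any HTTP response, even an error status, means the platform is up.

```python
import time
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP request to `url` gets any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status; it is still up.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False


def wait_until_reachable(url: str, attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll `url` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if is_reachable(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_reachable("http://localhost:8080")` returns `True` once the platform responds, or `False` if it never does within the given number of attempts.
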
The next commands create and activate a virtual environment and install 3 libraries:
- great_expectations - To work with Great Expectations.
- odd-cli - Provides useful commands, e.g. reading and collecting metadata from local files and creating OpenDataDiscovery tokens.
- odd-great-expectations - Contains ODDAction, which catches validation results, maps them, and sends the metadata to the OpenDataDiscovery Platform.
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
The next command creates a token named data_qa and prints it to the console.
odd tokens create data_qa -h http://localhost:8080
Store the host and token in environment variables to avoid repeating them in CLI commands.
export ODD_PLATFORM_HOST=http://localhost:8080
export ODD_PLATFORM_TOKEN=<token_from_previous_step>
For demo purposes we prepared 2 files: data/BankChurners.csv and data/BankChurners_Bad.csv.
The next CLI command reads the files from the /data folder, gathers their metadata, and ingests it into the OpenDataDiscovery Platform.
odd collect data
For demo purposes we prepared expectations (/great_expectations/expectations/validate_bank_data.json) and 2 checkpoints (/great_expectations/checkpoints/*) to run data quality tests against the BankChurners files:
succeeded_checkpoint - Validates data/BankChurners.csv file.
great_expectations checkpoint run succeeded_checkpoint
failed_checkpoint - Validates data/BankChurners.csv and data/BankChurners_Bad.csv files.
great_expectations checkpoint run failed_checkpoint
Go to http://localhost:8080 to see results.
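
The two checkpoint runs above can also be scripted. The sketch below shells out to the same `great_expectations checkpoint run` command for each checkpoint and collects the exit codes. The names `build_command` and `run_checkpoints` are hypothetical, and the sketch assumes the CLI reports a failed validation through a non-zero exit code, which is worth verifying against the installed Great Expectations version.

```python
import subprocess
from typing import Callable, Dict, List, Sequence

# Checkpoint names taken from this README.
CHECKPOINTS = ["succeeded_checkpoint", "failed_checkpoint"]


def build_command(checkpoint: str) -> List[str]:
    """Build the CLI invocation used in this README for one checkpoint."""
    return ["great_expectations", "checkpoint", "run", checkpoint]


def run_checkpoints(
    checkpoints: Sequence[str],
    runner: Callable = subprocess.run,
) -> Dict[str, int]:
    """Run each checkpoint in order and map its name to the process exit code."""
    results: Dict[str, int] = {}
    for name in checkpoints:
        # check=False: a non-zero exit is recorded, not raised as an exception.
        completed = runner(build_command(name), check=False)
        results[name] = completed.returncode
    return results
```

Calling `run_checkpoints(CHECKPOINTS)` from the project root with the virtual environment active runs both checkpoints in order. If the `great_expectations` executable is not on the PATH, `subprocess.run` raises `FileNotFoundError`.
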
