Skip to content

This project is an Airflow-based ETL (Extract, Transform, Load) pipeline that automates the process of fetching data from an external API, uploading it to an S3 bucket, and performing a data transformation using Greenplum.

Notifications You must be signed in to change notification settings

iagunov/greenflow

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Greenflow project

This project is an Airflow-based ETL (Extract, Transform, Load) pipeline that automates the process of fetching data from an external API, uploading it to an S3 bucket, and performing a data transformation using Greenplum.

Key components:

1.	Data Fetching and S3 Upload:
	The fetch_and_upload_to_s3() function retrieves JSON data from a specified API and uploads it to an S3 bucket. The API URL, bucket name, and object name are dynamically configurable using Airflow Variables, ensuring flexibility across environments.
2.	Greenplum Data Transformation:
	The run_greenplum_transformation() function connects to a Greenplum database and performs a transformation. It reads data from the S3 bucket as an external table, compares it to an existing Greenplum table, and inserts the unmatched records into a result table.
3.	Airflow DAG:
	The project is structured as an Airflow Directed Acyclic Graph (DAG), which schedules the execution of the tasks on a daily basis. First, the data is fetched and uploaded to S3, and then the transformation process in Greenplum is triggered.

Technologies Used:

•	Airflow for task orchestration and scheduling.
•	boto3 for interacting with the S3 service (Yandex Cloud S3 in this case).
•	psycopg2 for executing SQL queries in the Greenplum database.
•	requests for fetching data from the API.

This ETL pipeline is modular, reusable, and built for scalability.

About

This project is an Airflow-based ETL (Extract, Transform, Load) pipeline that automates the process of fetching data from an external API, uploading it to an S3 bucket, and performing a data transformation using Greenplum.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages