This repository contains an interactive web application for performing detailed climate trend analysis on county-level data for the United States. The initial implementation focuses on the Standardized Precipitation Index (SPI).
The project is architected as a fully self-contained, automated system. It integrates a data pipeline that automatically fetches, processes, and updates the application's data on a monthly schedule using GitHub Actions.
The live, interactive dashboard is deployed on Streamlit Cloud and is available at the following URL: https://climatedasboard-m8jnmrjrd6ltxhgnbet8x3.streamlit.app/
- Interactive County Selection: Users can select any U.S. county by its FIPS code from a searchable dropdown menu to load its specific data.
- Comprehensive Analysis Suite: The application provides a suite of standard time-series analyses for the selected county (an illustrative code sketch follows this list):
  - Trend Analysis: Visualization of long-term trends using a 12-month rolling average.
  - Anomaly Detection: Identification of statistical anomalies (defined as >2 standard deviations from the rolling mean).
  - Seasonal Decomposition: Decomposition of the time series into observed, trend, seasonal, and residual components.
  - Autocorrelation Analysis: Generation of ACF and PACF plots to inspect the data's correlation structure.
  - Forecasting: Predictive 24-month forecasting using an ARIMA model.
- Fully Automated: The underlying dataset is automatically updated monthly via a GitHub Actions workflow.
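For orientation, here is a minimal sketch of how such an analysis suite can be assembled with pandas and statsmodels. It is not the app's actual code: the column names ("fips", "date", "spi"), the sample FIPS code, and the ARIMA order are hypothetical assumptions for illustration.

```python
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA

# Load the processed dataset; column names below are assumed, not confirmed.
df = pd.read_parquet("spi_data.parquet")
county = (
    df[df["fips"] == "17031"]      # hypothetical FIPS code for illustration
    .set_index("date")["spi"]
    .asfreq("MS")                  # monthly frequency
)

# Trend analysis: 12-month rolling average.
rolling_mean = county.rolling(window=12).mean()

# Anomaly detection: points more than 2 rolling standard deviations
# from the rolling mean.
rolling_std = county.rolling(window=12).std()
anomalies = county[(county - rolling_mean).abs() > 2 * rolling_std]

# Seasonal decomposition into observed, trend, seasonal, and residual parts.
decomposition = seasonal_decompose(county.dropna(), model="additive", period=12)

# Autocorrelation analysis: ACF and PACF plots.
plot_acf(county.dropna(), lags=36)
plot_pacf(county.dropna(), lags=36)

# Forecasting: 24-month ARIMA forecast (order chosen for illustration only).
model = ARIMA(county.dropna(), order=(1, 1, 1)).fit()
forecast = model.forecast(steps=24)
```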
This project is designed as a single, self-sustaining repository ("monorepo") that handles both the data pipeline and the user-facing application. It utilizes Git LFS to manage large data files and GitHub Actions for full automation.
The workflow is as follows (an illustrative sketch of the workflow file appears after this list):
- Scheduled Trigger: A GitHub Actions workflow is scheduled to run on the first day of every month.
- Data Pipeline Execution: The workflow executes a series of scripts within a cloud-based runner:
  - download_script.py: Fetches the latest raw data from the source (CDC).
  - parse_precipitation_index.py: Cleans and processes the raw data into a standardized format.
  - The processed data is then converted into the efficient Parquet format (spi_data.parquet).
- Data Versioning and Update: The workflow commits the new Parquet data file back to the repository. Git LFS handles the storage of this large file.
- Continuous Deployment: Streamlit Cloud detects the new commit in the repository and automatically redeploys the application, making the fresh data immediately available to users.
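As a sketch only, the workflow file might look like the following. The cron time, action versions, step names, and commit message are assumptions; the actual update_data.yml in the repository may differ.

```yaml
# Illustrative sketch of .github/workflows/update_data.yml (not the real file).
name: Update SPI data

on:
  schedule:
    - cron: "0 6 1 * *"    # 06:00 UTC on the first day of every month
  workflow_dispatch:       # also allow manual runs

jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          lfs: true                  # pull LFS-tracked data files
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: python download_script.py             # fetch latest raw data (CDC)
      - run: python parse_precipitation_index.py   # clean and standardize
      - name: Commit updated data
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add spi_data.parquet
          git commit -m "chore: monthly data update" || echo "No changes to commit"
          git push
```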
```
climate_dasboard/
├── .github/
│   └── workflows/
│       └── update_data.yml          # GitHub Actions workflow that drives the monthly automation
├── .gitattributes                   # Configures which files are handled by Git LFS
├── app.py                           # The main Streamlit application code
├── download_script.py               # Pipeline script to download raw data
├── parse_precipitation_index.py     # Pipeline script to clean raw data
├── import_configs.json              # Configuration for the pipeline
├── index/                           # Raw input data for the pipeline (managed by LFS)
├── spi_data.parquet                 # The final, clean data file used by the app (managed by LFS)
└── requirements.txt                 # Python libraries required by the app and pipeline
```
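For reference, LFS tracking in .gitattributes takes the following form; the exact patterns used by this repository are an assumption.

```
# Illustrative LFS patterns; the repository's actual entries may differ.
spi_data.parquet filter=lfs diff=lfs merge=lfs -text
index/** filter=lfs diff=lfs merge=lfs -text
```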
To run this application on a local machine, follow these steps.
Prerequisites:

- Git
- Git LFS (`sudo apt-get install git-lfs`)
- Python 3.8+ and `pip`
- Clone the repository:

  ```bash
  git clone https://github.com/vishalworkdatacommon/climate_dasboard.git
  cd climate_dasboard
  ```
- Pull LFS data: Download the large data files tracked by Git LFS.

  ```bash
  git lfs pull
  ```
- Set up a virtual environment (recommended):

  ```bash
  python3 -m venv venv
  source venv/bin/activate
  ```
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
Launch the Streamlit application with the following command:

```bash
streamlit run app.py
```

The application will open in your default web browser.