Top-7 finish in a multi-team contact center forecasting competition. The task: predict call volume, average handle time (CCT), and abandon rate at 30-minute interval granularity across four call queues for every day in August 2025.
This repo has the full pipeline, EDA analysis, and the reasoning behind every design decision — not just the code that worked, but why it works.
Contact centers hire agents on schedules set days or weeks in advance. Get the forecast wrong and you're either burning budget on idle agents or watching customers hang up unanswered. Both hurt — but understaffing costs more in real-world penalties, which shaped how we built the model.
What makes this hard isn't forecasting the daily total. That's manageable. The hard part is accurately distributing those calls across 48 half-hour windows per day, separately for 4 queues, for 31 straight days. That's 5,952 individual predictions, each with three metrics attached.
| Metric | Score |
|---|---|
| Competition Rank | #7 |
| Composite Score | 15.42 |
| Volume SMAPE (interval-level) | 34.14% |
| CCT Error | 14.03% |
| Abandon Rate Error | 1.31% |
| Workload Penalty | 0.130 |
Most teams approach this as a time-series forecasting problem — predict each interval autoregressively using lag features. We tried that first. It scored above 140 on the leaderboard because errors compound: each predicted interval feeds into the next, and you're 48 slots deep before the end of the day.
The insight that changed everything: anchor on daily totals. Forecasting a daily total is a much more stable problem than forecasting interval-level spikes, so predict the total first and then distribute it.
Our architecture uses a hybrid approach:
- Daily Forecasting: A dedicated XGBoost model predicts the total volume for each day.
- Intraday Distribution: A statistical "shape" (learned from historical data) distributes those daily totals across 48 intervals.
```
interval_calls = XGBoost_Daily_Total × shape(queue, day_of_week, slot) × bias
```
This hybrid design eliminates autoregressive error propagation while benefiting from modern gradient boosting for the top-level trend.
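In code, the whole apply step is a single multiplication per day. A minimal sketch (the real version lives in `src/forecast.py`; the function name here is illustrative):

```python
import numpy as np

def forecast_intervals(daily_total: float, shape: np.ndarray, bias: float = 1.044) -> np.ndarray:
    """Distribute one day's predicted total across 48 half-hour slots.

    `shape` is the learned intraday profile for this (queue, day-of-week);
    it sums to 1, so the slot forecasts sum to daily_total * bias.
    """
    assert np.isclose(shape.sum(), 1.0), "shape must be a normalised profile"
    return daily_total * shape * bias
```

Because no slot's prediction depends on the previous slot's prediction, a bad interval stays a bad interval instead of poisoning the rest of the day.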
We went from rank ~25 to rank 7 in a single change: fixing null values in the training data.
The interval dataset had 90–282 null Call_Volume and CCT rows per queue. Dropping them (which is what you'd do by default) silently biases the shape: the missing rows weren't random. They clustered in specific time slots, making those slots look artificially quiet, so the shape would under-allocate calls to exactly the intervals that needed them most.
The fix was a two-tier median imputation: match on (queue, day-of-week, month, interval slot) first, fall back to (queue, day-of-week, interval slot) across any month if that's empty. After imputation, zero nulls entered the shape calculation.
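A minimal pandas sketch of that fallback, assuming long-format columns `queue`, `dow`, `month`, and `slot` (the real implementation is `impute_nulls` in `src/utils.py`):

```python
import pandas as pd

def impute_nulls(df: pd.DataFrame, cols=("Call_Volume", "CCT")) -> pd.DataFrame:
    """Two-tier median imputation for interval-level nulls."""
    df = df.copy()
    for col in cols:
        # Tier 1: median within the same queue / day-of-week / month / slot
        tier1 = df.groupby(["queue", "dow", "month", "slot"])[col].transform("median")
        # Tier 2: same queue / day-of-week / slot, pooled across months
        tier2 = df.groupby(["queue", "dow", "slot"])[col].transform("median")
        df[col] = df[col].fillna(tier1).fillna(tier2)
    return df
```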
- Before imputation: rank 25, Volume SMAPE 35.4%, Composite 16.00
- After imputation: rank 7, Volume SMAPE 34.1%, Composite 15.42
No model change. No new features. Just clean data.
```
.
├── main.py                  # entry point
├── requirements.txt
│
├── src/                     # pipeline modules
│   ├── config.py            # all constants (BIAS, holidays, tuning params)
│   ├── data_loader.py       # read Excel sheets, normalise interval format
│   ├── model.py             # XGBoost daily forecasting + Grid Search
│   ├── utils.py             # trimmed_mean, smape, cyclic encoding, impute_nulls
│   ├── shape.py             # build intraday shape (statistical baseline)
│   ├── forecast.py          # apply shape to XGBoost predicted daily totals
│   └── validate.py          # cross-check aggregated predictions vs actuals
│
├── analysis/                # EDA: every assumption in config.py has a plot here
│   ├── run_eda.py           # run all analyses in one go
│   ├── holidays.py          # why 6 specific dates are excluded
│   ├── intraday_shape.py    # shape stability, DOW differences, smoothing effect
│   ├── cct_patterns.py      # CCT by interval, alpha sweep, threshold justification
│   ├── bias_scoring.py      # asymmetric penalty simulation → justifies BIAS=1.044
│   ├── data_quality.py      # null distribution, imputation quality check
│   └── trends.py            # volume trends, DOW heatmap, metric correlations
│
└── plots/
    └── eda/                 # 17 saved charts from run_eda.py
```
You'll need Python 3.10+ and the data file (data.xlsx) in the project root. The data isn't included here for obvious reasons, but the pipeline structure is fully reproducible if you have contact center interval data in the same format.
```bash
# clone and create environment
git clone https://github.com/mavito/NeuralNomads.git
cd NeuralNomads
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

# install dependencies
pip install -r requirements.txt

# drop your data.xlsx in the project root, then run
python main.py
```

The output is `submission.csv`: 1,488 rows (31 days × 48 slots) in wide format with columns for each queue's calls, abandoned calls, abandon rate, and CCT.
```bash
python -m analysis.run_eda
```

This runs all seven analysis modules and saves 17 charts to `plots/eda/`. Each chart maps directly to a parameter in `src/config.py`; the idea is that nothing in the pipeline is a magic number, and everything has a visualisation that explains it.
| Chart | Justifies |
|---|---|
| `holiday_volume_impact.png` | `EXCLUDE_DATES` in `config.py` (Easter is −49%, Memorial Day −50%) |
| `shape_stability_across_months.png` | Pooling Apr/May/Jun for shape calculation |
| `shape_by_dow.png` + `dow_volume_heatmap.png` | Separate shape per day-of-week |
| `smoothing_effect.png` | `SMOOTH_KERNEL` and `SMOOTH_ALPHA` values |
| `cct_variance_vs_volume.png` | `CCT_THRESHOLD = 15` |
| `cct_stability_months.png` | `CCT_BLEND_ALPHA = 0.9` |
| `asymmetric_penalty_bias.png` | `BIAS = 1.044` |
| `null_distribution.png` + `null_by_slot_heatmap.png` | The imputation approach in `utils.py` |
| `abandon_rate_stability.png` | `ABD_ALPHA = 1.0` (no blending needed) |
The shape is computed as:
```
shape[slot] = Σ(calls in slot across all training days) / Σ(total daily calls)
```
rather than as the mean of per-day fractions. High-volume days contribute proportionally more, which is correct: busier days are more representative of the "true" shape than quiet days following a holiday.
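In pandas, per (queue, day-of-week) group, that is just a ratio of pooled sums. A sketch, assuming a long-format frame with `slot` and `Call_Volume` columns:

```python
import pandas as pd

def build_shape(df: pd.DataFrame) -> pd.Series:
    """Ratio-of-sums shape for one (queue, day-of-week) group.

    Equivalent to a volume-weighted mean of per-day slot fractions,
    so busy days dominate quiet post-holiday days.
    """
    slot_totals = df.groupby("slot")["Call_Volume"].sum()
    return slot_totals / slot_totals.sum()
```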
Raw ratio-of-sums shapes have slot-to-slot noise from small sample sizes (only about a dozen Tuesdays across April–June). We smooth with a 5-element circular kernel [0.10, 0.20, 0.40, 0.20, 0.10], treating the 48 slots as a ring so midnight wraps correctly. The smoothed shape is then blended 50/50 with the raw shape (`SMOOTH_ALPHA = 0.5`).
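A sketch of that smoothing step with the kernel and blend weight above (the real version lives in `src/shape.py`):

```python
import numpy as np

KERNEL = np.array([0.10, 0.20, 0.40, 0.20, 0.10])
SMOOTH_ALPHA = 0.5

def smooth_shape(shape: np.ndarray) -> np.ndarray:
    """Circular 5-tap smoothing over 48 slots, blended with the raw shape."""
    # Pad with the opposite ends so the kernel wraps around midnight
    padded = np.concatenate([shape[-2:], shape, shape[:2]])
    smoothed = np.convolve(padded, KERNEL, mode="valid")  # back to 48 values
    blended = SMOOTH_ALPHA * smoothed + (1 - SMOOTH_ALPHA) * shape
    return blended / blended.sum()  # re-normalise so slots still sum to 1
```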
Instead of relying on simple averages, we use a dedicated XGBoost Regressor to predict the daily call volume for each queue.
- Feature Engineering: We utilize cyclic time encoding (sin/cos for DOW/Month) and seasonal lags (1, 7, 14, 28, and 364 days) to capture both short-term momentum and yearly cycles.
- Automated Tuning: The pipeline includes an automated Grid Search step. It uses July 2025 as a validation set to find the optimal `max_depth`, `learning_rate`, and `n_estimators` for each queue before generating the August forecast (sketched below).
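A rough sketch of that per-queue tuning loop. The grid values below are placeholders (the real ones live in `src/config.py`), and the pipeline leans on scikit-learn for the actual search; this spells out the logic:

```python
import itertools
from xgboost import XGBRegressor
from sklearn.metrics import mean_absolute_error

# Hypothetical grid; actual values live in src/config.py
PARAM_GRID = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [200, 400],
}

def tune_queue(X_train, y_train, X_july, y_july):
    """Pick the params that minimise error on the July 2025 hold-out."""
    best_params, best_err = None, float("inf")
    for values in itertools.product(*PARAM_GRID.values()):
        params = dict(zip(PARAM_GRID.keys(), values))
        model = XGBRegressor(**params).fit(X_train, y_train)
        err = mean_absolute_error(y_july, model.predict(X_july))
        if err < best_err:
            best_params, best_err = params, err
    return best_params
```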
For intervals with ≥15 predicted calls:
```
CCT_pred = 0.9 × historical_interval_CCT + 0.1 × august_daily_CCT
```
For quieter intervals, the historical average is too noisy (CV > 25%), so we fall back to the flat daily average. The 0.9 weight was chosen by sweeping alpha from 0 to 1 on daily-level SMAPE — see analysis/cct_patterns.py.
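In code, the rule is a one-liner per interval (constants from `src/config.py`; the helper name is illustrative):

```python
CCT_THRESHOLD = 15     # below this many predicted calls, interval history is too noisy
CCT_BLEND_ALPHA = 0.9

def predict_cct(pred_calls: float, hist_interval_cct: float, daily_cct: float) -> float:
    """Blend interval-level history into the daily average for busy slots;
    fall back to the flat daily average for quiet ones."""
    if pred_calls >= CCT_THRESHOLD:
        return CCT_BLEND_ALPHA * hist_interval_cct + (1 - CCT_BLEND_ALPHA) * daily_cct
    return daily_cct
```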
The competition penalises understaffing more than overstaffing. We factored this into the forecast directly: BIAS=1.044 means every volume prediction is ~4.4% above the expected value. The analysis/bias_scoring.py module simulates the asymmetric cost function and shows the optimal bias sits at 4–5% above 1.0.
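A toy version of that simulation. The penalty weights below are made up for illustration; `analysis/bias_scoring.py` runs the sweep against the competition's real cost function and lands at ~1.044:

```python
import numpy as np

def asymmetric_cost(pred, actual, under_weight=2.0, over_weight=1.0):
    """Toy asymmetric penalty: under-forecasting costs more than over-forecasting."""
    diff = pred - actual
    return np.where(diff < 0, under_weight * -diff, over_weight * diff).mean()

def best_bias(preds, actuals, grid=np.arange(1.00, 1.10, 0.002)):
    """Sweep a multiplicative bias over (forecast, actual) pairs, keep the cheapest."""
    costs = [asymmetric_cost(preds * b, actuals) for b in grid]
    return grid[int(np.argmin(costs))]
```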
The expected Excel file has one sheet per queue per data type:
| Sheet name | Contents |
|---|---|
| `A - Interval`, `B - Interval`, ... | 30-min interval rows for Apr–Jun 2025: Day, Month, Interval, Call_Volume, CCT, Abandoned_Rate |
| `A - Daily`, `B - Daily`, ... | Daily rows Jan 2024–Aug 2025: Date, Call_Volume, CCT, Abandon_Rate |
| `Daily Staffing` | Headcount by date (loaded but not used in final model) |
The model is domain-agnostic — if you have any system where:
- You know the daily total for the forecast period
- You have historical 30-min or hourly data to learn a shape from
- There's a meaningful intraday distribution worth capturing
...then this approach applies. Change the sheet names in src/data_loader.py, update EXCLUDE_DATES in src/config.py to match your relevant holidays, and point the model at your data.
The one assumption that won't generalise without modification is the specific Excel structure. Everything downstream of load_data() is format-agnostic.
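In practice the adaptation usually comes down to a handful of config edits. The values below are placeholders, not this project's actual six dates:

```python
# src/config.py -- illustrative values; adapt to your own domain
EXCLUDE_DATES = [    # hypothetical holidays; use your own calendar
    "2025-12-25",
    "2026-01-01",
]
BIAS = 1.044         # re-derive with analysis/bias_scoring.py for your cost function
SMOOTH_ALPHA = 0.5   # re-check with analysis/intraday_shape.py
```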
| Package | Role |
|---|---|
| `pandas` | Data manipulation throughout |
| `numpy` | Shape arithmetic, smoothing |
| `xgboost` | Core daily volume forecasting model |
| `scikit-learn` | Hyperparameter Grid Search & metrics |
| `matplotlib` + `seaborn` | EDA visualisations |
| `openpyxl` | Reading the Excel data file |
| `python-pptx` | Presentation generation |
Full pinned versions are in `requirements.txt`.
Data quality is the highest-leverage thing you can work on. Fixing the nulls (90 to 282 rows per queue) moved us from rank 25 to rank 7; no model change came close to that impact.
Domain knowledge matters more than model complexity. The decision to anchor predictions on known daily totals came from understanding the problem structure, not from trying more algorithms. It reduced the search space dramatically.
Match your loss function to the actual cost. We were scoring on an asymmetric penalty where understaffing hurt more than overstaffing. Optimising for symmetric SMAPE would have given us a worse real-world result. The 4.4% upward bias was deliberate, not an artifact.
Use your real evaluation metric as ground truth. Several internal changes (CCT ratio scaling, recency weighting) looked good on our cross-check SMAPE. They didn't move the leaderboard. We learned to trust the leaderboard as the definitive signal and discard anything that didn't show up there.
Built for the NeuralNomads team as part of a contact center analytics competition — March 2026.