Ensemble stacking integration in chap_core #394

Draft
behdadnikkhah25 wants to merge 31 commits into dhis2-chap:master from behdadnikkhah25:master


Fix deterministic stacking, NNLS weighting, and sample collapsing

Summary

This PR finishes the ensemble redesign and addresses the main review comments by:

  • making the deterministic ensemble a clean point-only stacking model (no residual bootstrap),
  • fixing NNLS usage so raw coefficients are used for prediction and only normalized for reporting,
  • making probabilistic → deterministic collapsing explicit (median + warnings),
  • documenting and testing the vincentization scheme for probabilistic ensembles, and
  • improving logging and tests around data issues and fallbacks.

It also:

  • removes legacy wrappers in favor of wrappers.py,
  • moves ensemble config YAMLs to tests/fixtures/ensemble_config/,
  • removes the unused use_residual_bootstrap flag from models and CLI, and
  • keeps all tests passing (964 passed, 117 skipped, 4 xfailed, 1 xpassed).

Key design changes

Deterministic ensemble = clean stacking (no bootstrap)

  • Residual bootstrap has been entirely removed from:
    • EnsembleModel / EnsembleEstimator, and
    • the ensemble CLI.
  • Deterministic ensembles now:
    • output a single sample per forecast (sample_0),
    • log a warning that CRPS reduces to MAE and is not directly comparable to multi-sample CRPS.
  • use_residual_bootstrap is removed from:
    • EnsembleModel,
    • EnsembleEstimator, and
    • CLI signatures and tests.
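
Why CRPS reduces to MAE for a single sample can be checked numerically. The sketch below is not the chap_core metric code; `crps_from_samples` is a hypothetical helper that uses the usual sample-based estimator (mean absolute error minus half the mean pairwise spread) and assumes the spread term is zero when only one sample exists.

```python
import numpy as np


def crps_from_samples(samples: np.ndarray, observed: float) -> float:
    """Sample-based CRPS: E|X - y| - 0.5 * E|X - X'| (fair spread term)."""
    samples = np.asarray(samples, dtype=float)
    m = samples.size
    accuracy = np.mean(np.abs(samples - observed))
    if m < 2:
        # A single sample has no spread, so only the absolute error remains.
        return float(accuracy)
    pairwise = np.abs(samples[:, None] - samples[None, :]).sum()
    spread = pairwise / (2 * m * (m - 1))
    return float(accuracy - spread)


# One sample (deterministic ensemble): CRPS equals |12 - 15|.
point = crps_from_samples(np.array([12.0]), observed=15.0)
# Three samples: the spread term lowers the score.
multi = crps_from_samples(np.array([10.0, 12.0, 14.0]), observed=15.0)
print(point, multi)
```

The two numbers are on different footings: the single-sample score carries no credit for spread, which is why the warning says it is not directly comparable to multi-sample CRPS.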

NNLS coefficients: prediction vs reporting

  • NonNegativeMetaModel.fit stores the raw NNLS solution in coef_.
  • NonNegativeMetaModel.predict uses X @ coef_ directly (no renormalization).
  • EnsembleModel.train:
    • reads coef_, clips to non-negative,
    • normalizes only for logging/reporting as percentages in self.weights.
  • This avoids the previous “optimize one objective, predict with another” behavior.
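
The split between prediction and reporting can be sketched as follows. This is an illustration rather than the code in _meta_models.py: `NonNegativeStacker` and `weights_as_percentages` are hypothetical names, and only `scipy.optimize.nnls` is a real library call.

```python
import numpy as np
from scipy.optimize import nnls


class NonNegativeStacker:
    """Hypothetical stand-in for a non-negative stacking meta-model."""

    def fit(self, X, y):
        # nnls returns (solution, residual norm); keep the raw solution.
        self.coef_, _ = nnls(np.asarray(X, dtype=float), np.asarray(y, dtype=float))
        return self

    def predict(self, X):
        # Predict with the same coefficients that were optimized.
        return np.asarray(X, dtype=float) @ self.coef_


def weights_as_percentages(coef):
    """Normalize coefficients for reporting only; uniform if the sum is not positive."""
    clipped = np.clip(np.asarray(coef, dtype=float), 0.0, None)
    total = clipped.sum()
    if total <= 0:
        return np.full(clipped.size, 100.0 / clipped.size)
    return 100.0 * clipped / total


# Columns are base-model point forecasts; the target mixes them with weights 1.5 and 0.5.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = X @ np.array([1.5, 0.5])

model = NonNegativeStacker().fit(X, y)
pct = weights_as_percentages(model.coef_)
print(model.coef_, pct)
```

In this toy case the raw coefficients sum to 2.0, so predicting with weights renormalized to sum to 1 would halve every forecast. The percentages are therefore kept for the report only.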

Probabilistic → deterministic collapse (median + warnings)

  • When probabilistic base models are used in the deterministic ensemble:
    • SampleExtractor.samples_to_flat detects sample_* columns,
    • logs a warning that uncertainty is discarded,
    • collapses to a point forecast via the median across samples per row.
  • This replaces the previous mean-based, silent collapse.
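
A minimal sketch of the collapse, assuming a pandas DataFrame with either a forecast/value column or sample_* columns. The function name `collapse_to_point` and the log message are illustrative, not the actual SampleExtractor API.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def collapse_to_point(df: pd.DataFrame) -> pd.Series:
    """Return one point forecast per row."""
    # Prefer an explicit point-forecast column when one exists.
    for name in ("forecast", "value"):
        if name in df.columns:
            return df[name].astype(float)
    sample_cols = [c for c in df.columns if str(c).startswith("sample_")]
    if not sample_cols:
        raise ValueError("expected a forecast/value column or sample_* columns")
    logger.warning(
        "Collapsing %d samples per row to the median; uncertainty is discarded",
        len(sample_cols),
    )
    return df[sample_cols].median(axis=1)


preds = pd.DataFrame(
    {"sample_0": [1.0, 4.0], "sample_1": [100.0, 5.0], "sample_2": [3.0, 6.0]}
)
point = collapse_to_point(preds)
print(point.tolist())
```

The first row shows the practical difference from a mean-based collapse: the outlying sample 100.0 would pull the mean to about 34.7, while the median stays at 3.0.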

Probabilistic meta-model: vincentization

  • ProbabilisticMetaModel now combines base distributions via vincentization (quantile averaging):

    • per row, each base’s samples approximate $Q_i(p)$,

    • samples are sorted and combined as

      $$Q_{\text{ens}}(p) = \sum_i w_i \, Q_i(p),$$

      with weights $w_i$ on the simplex ($w_i \ge 0$, $\sum_i w_i = 1$).

  • Implementation:

    • _vincentize_samples:
      • validates shapes,
      • stacks samples as (n_models, n_rows, n_samples),
      • sorts along the sample axis and applies weights.
    • fit:
      • optimizes weights on the simplex with SLSQP using crps_score_unbiased_matrix as objective,
      • falls back to uniform weights with a warning if optimization fails.
    • predict:
      • applies _vincentize_samples,
      • clips negative ensemble values at zero.
  • Resulting samples are sorted quantiles per row (monotone), not arbitrary Monte Carlo draws, and CRPS is evaluated with the same unbiased helper used by the metrics.
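
The quantile-averaging step can be sketched in a few lines of NumPy. This is an assumption-laden illustration of the steps listed above, not the body of _vincentize_samples; the function name `vincentize` and the exact validation rules are made up for the example.

```python
import numpy as np


def vincentize(samples_by_model, weights):
    """Weighted average of per-row sorted samples (quantile averaging)."""
    # np.stack raises ValueError if the per-model arrays differ in shape.
    stacked = np.stack([np.asarray(s, dtype=float) for s in samples_by_model])
    if stacked.ndim != 3:
        raise ValueError("each model needs an array of shape (n_rows, n_samples)")
    w = np.asarray(weights, dtype=float)
    if w.shape != (stacked.shape[0],):
        raise ValueError("need exactly one weight per model")
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    # Sorting turns each row into an empirical quantile function Q_i(p).
    sorted_samples = np.sort(stacked, axis=2)
    # Contract the model axis: result has shape (n_rows, n_samples).
    combined = np.tensordot(w, sorted_samples, axes=(0, 0))
    # Mirror the clipping of negative ensemble values at zero.
    return np.clip(combined, 0.0, None)


model_a = np.array([[3.0, 1.0, 2.0]])
model_b = np.array([[10.0, 30.0, 20.0]])
ens = vincentize([model_a, model_b], weights=[0.5, 0.5])
print(ens)
```

Because every input row is sorted before weighting and the weights are non-negative, each output row is non-decreasing, which is the monotonicity property the tests check.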

Observability: NaNs, fallbacks, CRPS semantics

  • During meta-model training:
    • rows with NaNs in targets or features are dropped,
    • total dropped rows and per-base NaN counts are logged.
  • If NNLS yields non-positive weights:
    • a warning is logged,
    • self.weights is set to uniform percentages.
  • When deterministic ensembles output a single sample:
    • a warning explicitly states that CRPS = MAE.
  • When probabilistic samples are collapsed:
    • a warning states that probabilistic information is discarded.
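
The NaN handling during meta-model training might look roughly like this. The helper `drop_nan_rows`, its return values, and the message wording are assumptions for illustration; only the behavior (drop rows, log the total and the per-base counts) comes from the description above.

```python
import logging

import numpy as np

logger = logging.getLogger(__name__)


def drop_nan_rows(X, y, base_names):
    """Drop rows with NaN in any feature or in the target, and log what was dropped."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Count NaNs per base-model column before dropping anything.
    per_base = dict(zip(base_names, np.isnan(X).sum(axis=0).tolist()))
    keep = ~(np.isnan(X).any(axis=1) | np.isnan(y))
    n_dropped = int((~keep).sum())
    if n_dropped:
        logger.warning(
            "Dropped %d of %d rows with NaNs; NaNs per base model: %s",
            n_dropped, len(y), per_base,
        )
    return X[keep], y[keep], n_dropped, per_base


X = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, 5.0], [6.0, np.nan]])
y = np.array([1.0, 2.0, np.nan, 4.0])
X_clean, y_clean, n_dropped, per_base = drop_nan_rows(X, y, ["model_a", "model_b"])
print(n_dropped, per_base)
```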

File overview

chap_core/ensemble/_meta_models.py

  • NonNegativeMetaModel:
    • raw NNLS coef_ used for prediction.
  • ProbabilisticMetaModel:
    • vincentization via _vincentize_samples,
    • SLSQP optimization on the simplex with crps_score_unbiased_matrix,
    • uniform fallback on failure.
  • crps_ensemble:
    • delegates to crps_score_unbiased_matrix.
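
For orientation, the weight fit could be sketched as below. This is not the chap_core implementation: `mean_crps` is a hypothetical stand-in for crps_score_unbiased_matrix (its real signature is not shown in this PR), `fit_simplex_weights` is an invented name, and the synthetic data exists only to exercise the code. The `scipy.optimize.minimize` call with `method="SLSQP"`, bounds, and an equality constraint is standard SciPy usage.

```python
import logging

import numpy as np
from scipy.optimize import minimize

logger = logging.getLogger(__name__)


def mean_crps(samples, observed):
    """Mean sample-based CRPS; samples is (n_rows, n_samples) with n_samples >= 2."""
    m = samples.shape[1]
    accuracy = np.abs(samples - observed[:, None]).mean(axis=1)
    pairwise = np.abs(samples[:, :, None] - samples[:, None, :]).sum(axis=(1, 2))
    return float((accuracy - pairwise / (2 * m * (m - 1))).mean())


def fit_simplex_weights(sorted_samples, observed):
    """sorted_samples: (n_models, n_rows, n_samples), sorted along the last axis."""
    n_models = sorted_samples.shape[0]
    uniform = np.full(n_models, 1.0 / n_models)

    def objective(w):
        # Quantile-average the models with weights w, then score the result.
        return mean_crps(np.tensordot(w, sorted_samples, axes=(0, 0)), observed)

    result = minimize(
        objective,
        uniform,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_models,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
    )
    if not result.success or not np.all(np.isfinite(result.x)):
        logger.warning("Weight optimization failed (%s); using uniform weights", result.message)
        return uniform
    w = np.clip(result.x, 0.0, None)
    return w / w.sum()


# Synthetic check: one well-calibrated model and one with a large bias.
rng = np.random.default_rng(0)
observed = rng.normal(50.0, 5.0, size=60)
good = np.sort(rng.normal(50.0, 5.0, size=(60, 20)), axis=1)
biased = np.sort(rng.normal(80.0, 5.0, size=(60, 20)), axis=1)
stacked = np.stack([good, biased])
weights = fit_simplex_weights(stacked, observed)
print(weights)
```

Starting from uniform weights means the fallback and the initial guess coincide, so a failed optimization never produces a result worse than the starting point.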

chap_core/ensemble/ensemble_model.py

  • Inner split uses train_test_split on TimePeriod.
  • Probabilistic mode:
    • uses SampleExtractor.reshape_samples and ProbabilisticMetaModel.
  • Deterministic mode:
    • uses SampleExtractor.samples_to_flat and NonNegativeMetaModel.
  • Logs:
    • NaN drops and per-base NaNs,
    • normalized meta-weights as percentages.
  • use_residual_bootstrap removed.

chap_core/ensemble/_predictor.py

  • Probabilistic:
    • aligns samples via reshape_samples, calls meta.predict, repacks with _pack_samples.
  • Deterministic:
    • builds meta-features, calls NonNegativeMetaModel.predict,
    • logs CRPS=MAE warning,
    • returns single-sample Samples (sample_0).

chap_core/ensemble/_sample_extractor.py

  • samples_to_flat:
    • uses forecast/value if present,
    • otherwise collapses sample_* with median and logs a warning.
  • reshape_samples:
    • aligns on (location, time_period),
    • tiles point forecasts when samples are missing, with warning,
    • resamples or tiles to target_n samples as needed.
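
The tiling and resampling behavior of reshape_samples can be illustrated with a small NumPy helper. `to_target_n` is hypothetical, and the resampling rule (drawing sample columns with replacement per row) is an assumption, since the PR does not say how resampling is done. Alignment on (location, time_period) is left out of this sketch.

```python
import logging

import numpy as np

logger = logging.getLogger(__name__)


def to_target_n(values, target_n, rng):
    """Return an array of shape (n_rows, target_n)."""
    arr = np.asarray(values, dtype=float)
    if arr.ndim == 1:
        # Point forecasts: treat each value as a single sample.
        arr = arr[:, None]
    n_rows, n = arr.shape
    if n == target_n:
        return arr
    if n == 1:
        logger.warning("Tiling point forecasts to %d identical samples", target_n)
        return np.tile(arr, (1, target_n))
    # Assumed rule: draw target_n samples per row with replacement.
    idx = rng.integers(0, n, size=(n_rows, target_n))
    return np.take_along_axis(arr, idx, axis=1)


rng = np.random.default_rng(20)
tiled = to_target_n([3.0, 7.0], target_n=4, rng=rng)
resampled = to_target_n(np.array([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]), target_n=5, rng=rng)
print(tiled.shape, resampled.shape)
```

Tiled rows carry no spread at all, so a base model handled this way contributes a degenerate distribution to the probabilistic ensemble. That is the reason a warning is worth logging.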

chap_core/ensemble/wrappers.py

  • Replaces legacy wrappers with:
    • BaseModelSpec(template, config),
    • TemplateWithConfig that forwards config to template.get_model(config) and exposes name/repo.

chap_core/cli_endpoints/ensemble.py

  • CLI for evaluate-ensemble:
    • builds TemplateWithConfig list from --base-model-names and optional --model-configuration-yaml,
    • creates an EnsembleModel,
    • runs Evaluation.create and writes NetCDF + CSV reports,
    • writes ensemble_meta_report.csv with base model weights (percentages).
  • use_residual_bootstrap removed from the CLI API.

chap_core/cli_endpoints/_common.py

  • Shared CLI helpers, including a temporary save_results patch to support the new evaluation outputs.
  • Explicitly marked as temporary and to be removed once upstream provides a unified solution.

Tests

All tests pass:

  • Lint & type checking:
    uv run make lint
    # Ruff: all checks passed
    # Mypy: Success: no issues found in 382 source files

  • Test suite:
    uv run make test
    # 964 passed, 117 skipped, 4 xfailed, 1 xpassed, 7 warnings

Key updated/added areas:

  • tests/ensemble/test_ensemble_model.py:
    inner split, input validation, NaN logging.
  • tests/ensemble/test_ensemble_stacking.py:
    deterministic stacking (single sample, weights reporting) and probabilistic sample shapes/monotonicity.
  • tests/ensemble/test_predictor.py:
    deterministic CRPS=MAE warning, probabilistic sample shapes.
  • tests/ensemble/test_meta_models.py:
    NNLS behavior, simplex constraints, vincentization invariance, fallback.
  • tests/ensemble/test_sample_extractor.py:
    median collapse and alignment/resampling in reshape_samples.
  • tests/ensemble/test_legacy_wrappers.py:
    BaseModelSpec and TemplateWithConfig.
  • tests/cli_endpoints/test_ensemble_cli.py:
    CLI wiring for deterministic/probabilistic ensemble evaluation.
  • tests/fixtures/ensemble_config/*.yaml:
    ensemble configs moved into fixtures.

Example usage

1) Deterministic ensemble (point-only, no bootstrap)

chap evaluate-ensemble \
  --base-model-names https://github.com/chap-models/rwanda_sarimax,https://github.com/chap-models/chap_auto_ewars,https://github.com/chap-models/INLA_baseline_model \
  --model-configuration-yaml ../tests/fixtures/ensemble_config/rwanda_sarimax_config.yaml,../tests/fixtures/ensemble_config/chap_ewars_monthly_config.yaml,../tests/fixtures/ensemble_config/inla_baseline_config.yaml \
  --ensemble-method deterministic \
  --random-state 20 \
  --dataset-csv https://raw.githubusercontent.com/dhis2/climate-health-data/main/lao/chap_LAO_admin1_monthly.csv \
  --report-filename ensemble_report.csv \
  --output-file ensemble_report.nc \
  --backtest-params.n-splits 3 \
  --backtest-params.n-periods 3

2) Probabilistic ensemble

chap evaluate-ensemble \
  --base-model-names https://github.com/chap-models/rwanda_sarimax,https://github.com/chap-models/chap_auto_ewars,https://github.com/chap-models/INLA_baseline_model \
  --model-configuration-yaml ../tests/fixtures/ensemble_config/rwanda_sarimax_config.yaml,../tests/fixtures/ensemble_config/chap_ewars_monthly_config.yaml,../tests/fixtures/ensemble_config/inla_baseline_config.yaml \
  --ensemble-method probabilistic \
  --random-state 20 \
  --dataset-csv https://raw.githubusercontent.com/dhis2/climate-health-data/main/lao/chap_LAO_admin1_monthly.csv \
  --report-filename ensemble_report.csv \
  --output-file ensemble.nc \
  --backtest-params.n-splits 3 \
  --backtest-params.n-periods 3

Out of scope / deferred

  • Aligning evaluate-ensemble CLI UX fully with other commands (future PR).
  • Removing the temporary save_results patch in _common.py once upstream has a shared solution for report writing.

behdadnikkhah25 and others added 30 commits March 19, 2026 16:33
… so that other models in chap can work with the ensemble
… for the meta-model. (2) Added --output-file for the ensemble model, as in chap's eval; if it is omitted at run time, a default file is created anyway
…, uses a different meta-model; more detail is in the daily summary of 09.04.26
…switched the meta-model in cli_endpoints/ensemble.py to use NonNegativeMetaModel instead of linear regression as before
…stic; CLI integration for choosing between them is still missing, as is a general check that everything works in terms of both understanding and code
…semble (4 of them) and changes in ensemble_model.py, ensemble.py, and _common.py
…istic and deterministic ensemble (only if residual bootstrap is enabled
…that there is a config.yaml for each model. Added 3 configs for the base models used in the latest run
	• chap_core/ensemble/ensemble_model.py
		○ Enforces that residual bootstrap is only available for the deterministic method.
		○ Typing adjustments (casts) for mypy without changing runtime behavior.

	• chap_core/ensemble/_meta_models.py
		○ Tightened typing around nnls/minimize to avoid Any return types.

	• chap_core/ensemble/_sample_extractor.py
		○ Safer typing around DataFrame conversion.

	• chap_core/ensemble/_predictor.py
		○ (No runtime change here in the latest round, only the earlier RNG fix)

	• chap_core/cli_endpoints/ensemble.py
		○ The CLI flag --use-residual-bootstrap is wired up correctly.
		○ Typing adjustments for mypy (no runtime change).

	• chap_core/time_period/date_util_wrapper.py
		○ pytz import annotated for mypy, with no runtime change.

	• pyproject.toml
		○ Ruff and type-check exclusions for legacy/run artifacts.
		○ Mypy/pyright config adjusted for a clean lint.

Run a lint check to verify that the code is free of bugs and errors:
	• make lint
make test  (594 passed, 101 skipped, 5 xfailed, 1 xpassed) (passed, no bugs in 331 source files)
… ABI; unsure whether this is related to my changes or to chap_core, needs checking
… and results, not only results as in the previous version, which used chap_core's save_results; that version is kept as a comment for history
…babilistic meta-model to vincentization, use train_test_split for the inner time split, and add a small fix in assessment/evaluation.py for a test failure (not part of the ensemble implementation); final comment cleanup and a full test run remain.
… unbiased CRPS computation, vincentization-based quantile combination, and updated meta-model fit) in line with review feedback.
… two new tests: test 1 verifies that EnsembleModel.train uses train_test_split on the correct TimePeriod; test 2 verifies that vincentization yields monotonically non-decreasing samples per row.
… raw NNLS coefficients, uses the median for sample collapse, moves config YAMLs to test fixtures, cleans up wrappers, and adds logging for fallbacks and NaN drops. Updates CLI handling and adapts evaluation/serialization to the new expectations. Tests updated in tests/ensemble/test_ensemble_model.py, tests/ensemble/test_ensemble_stacking.py, tests/ensemble/test_legacy_wrappers.py, tests/ensemble/test_meta_models.py, tests/ensemble/test_predictor.py, tests/ensemble/test_sample_extractor.py, and tests/evaluation/test_evaluation_serialization.py.
…e flag from EnsembleModel, EnsembleEstimator, and CLI endpoints, and update corresponding tests

2 participants