Ensemble stacking integration in chap_core#394
Draft
behdadnikkhah25 wants to merge 31 commits into
…tt so that other models in chap can work with the ensemble
…kter for the meta-model. (2) Added --output-file for the ensemble model, as in chap's eval; if it is not passed at runtime, a default file is created anyway
…ge, uses a different meta-model; more details in the daily summary of 09.04.26
…switched the meta-model in cli_endpoints/ensemble.py to NonNegativeMetaModel instead of linear regression as before
…stic; CLI integration for choosing between these is still missing, as is a general check that everything works, both conceptually and in the code
…or integration into chap master
…semble (4 of them) and changes in ensemble_model.py, ensemble.py and _common.py
…ic and deterministic ensemble (only if residual bootstrap is enabled)
…that there is a config.yaml for each model. Added 3 configs, one per base model used in the latest run
• chap_core/ensemble/ensemble_model.py
  ○ Enforces that residual bootstrap is used only with the deterministic method.
  ○ Typing adjustments (casts) for mypy without changing runtime behavior.
• chap_core/ensemble/_meta_models.py
  ○ Tightened typing around nnls/minimize to avoid returning Any.
• chap_core/ensemble/_sample_extractor.py
  ○ Safer typing around DataFrame conversion.
• chap_core/ensemble/_predictor.py
  ○ (No runtime change here in the latest round, only the earlier RNG fix.)
• chap_core/cli_endpoints/ensemble.py
  ○ The CLI flag --use-residual-bootstrap is wired up correctly.
  ○ Typing adjustments for mypy (no runtime change).
• chap_core/time_period/date_util_wrapper.py
  ○ pytz import marked for mypy without runtime change.
• pyproject.toml
  ○ Ruff and type-check exemptions for legacy/run artifacts.
  ○ Mypy/pyright configuration adjusted for a clean lint.
Ran the lint checks to confirm the code is free of bugs and errors:
• make lint (passed, no issues in 331 source files)
• make test (594 passed, 101 skipped, 5 xfailed, 1 xpassed)
…å ABI; unsure whether this is related to my changes or to chap_core, needs to be checked
…vn and results, not just the results as in the previous version, which used chap_core's save_results; the old version is kept as a comment for history
…babilistic meta-model to vincentization, use train_test_split for the inner time split, and add a small fix in assessment/evaluation.py for a test failure (not part of the ensemble implementation); final comment polish and a full test run remain.
… unbiased CRPS computation, vincentization-based quantile combination and an updated meta-model fit) in line with review feedback.
…t two new tests: test 1 verifies that EnsembleModel.train applies train_test_split on the correct TimePeriod; test 2 verifies that vincentization produces monotonically non-decreasing samples per row.
…r raw NNLS coefficients, uses the median when collapsing samples, moves the config YAML files to test fixtures, cleans up wrappers, and adds/logs fallbacks and NaN drops. Updates CLI handling and adapts evaluation/serialization to the new expectations. Tests updated in tests/ensemble/test_ensemble_model.py, tests/ensemble/test_ensemble_stacking.py, tests/ensemble/test_legacy_wrappers.py, tests/ensemble/test_meta_models.py, tests/ensemble/test_predictor.py, tests/ensemble/test_sample_extractor.py and tests/evaluation/test_evaluation_serialization.py.
Stacking new
…e flag from EnsembleModel, EnsembleEstimator, and CLI endpoints, and update corresponding tests
Fix deterministic stacking, NNLS weighting, and sample collapsing
Summary
This PR finishes the ensemble redesign and addresses the main review comments by:
It also:
wrappers.py, tests/fixtures/ensemble_config/, use_residual_bootstrap flag from models and CLI, and 964 passed, 117 skipped, 4 xfailed, 1 xpassed).
Key design changes
Deterministic ensemble = clean stacking (no bootstrap)
EnsembleModel/EnsembleEstimator, and sample_0), use_residual_bootstrap is removed from: EnsembleModel, EnsembleEstimator, and
NNLS coefficients: prediction vs reporting
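To make the prediction-vs-reporting split concrete, here is a minimal sketch using scipy.optimize.nnls directly. It is not the chap_core code: the function names fit_nnls, predict and reporting_weights are hypothetical, and the uniform fallback when every coefficient is zero is an assumption modelled on the uniform-percentage fallback mentioned under Observability.

```python
import logging

import numpy as np
from scipy.optimize import nnls

logger = logging.getLogger(__name__)


def fit_nnls(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Raw NNLS solution of min ||X @ w - y||_2 subject to w >= 0 (hypothetical helper)."""
    coef, _residual_norm = nnls(X, y)
    return coef


def predict(X: np.ndarray, coef: np.ndarray) -> np.ndarray:
    # Prediction uses the raw coefficients as-is: no renormalization to sum to 1.
    return X @ coef


def reporting_weights(coef: np.ndarray) -> np.ndarray:
    """Base-model weights as percentages, for reporting only (hypothetical helper)."""
    clipped = np.clip(coef, 0.0, None)  # defensive: NNLS output is already >= 0
    total = clipped.sum()
    if not np.isfinite(total) or total <= 0:
        # Assumed fallback: nothing usable survived, so report equal shares.
        logger.warning("All NNLS coefficients are zero; reporting uniform weights.")
        return np.full(len(coef), 100.0 / len(coef))
    return 100.0 * clipped / total
```

Keeping the raw coefficients for prediction matters because they need not sum to one: if the base forecasts systematically under-predict, NNLS can return coefficients summing to more than one, and normalizing them would discard that scale correction. The percentages remain a readable summary of relative contribution.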
NonNegativeMetaModel.fit stores the raw NNLS solution in coef_.
NonNegativeMetaModel.predict uses X @ coef_ directly (no renormalization).
EnsembleModel.train: coef_, clips to non-negative, self.weights.
Probabilistic → deterministic collapse (median + warnings)
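The collapse rule can be sketched with pandas as follows. This is a hypothetical stand-in for SampleExtractor.samples_to_flat, not its actual implementation; the helper name to_point_forecast, the sample_<n> column pattern, and the preference for forecast over value are assumptions.

```python
import logging
import re

import pandas as pd

logger = logging.getLogger(__name__)

# Assumed naming scheme for per-draw columns: sample_0, sample_1, ...
SAMPLE_COL = re.compile(r"^sample_\d+$")


def to_point_forecast(df: pd.DataFrame) -> pd.Series:
    """Collapse a forecast frame to one value per row (hypothetical helper)."""
    # Prefer an explicit point column when the base model provides one.
    for name in ("forecast", "value"):
        if name in df.columns:
            return df[name]
    sample_cols = [c for c in df.columns if SAMPLE_COL.match(str(c))]
    if not sample_cols:
        raise ValueError("No forecast/value column and no sample_* columns found")
    if len(sample_cols) > 1:
        # Collapsing loses distributional information, so make it visible in logs.
        logger.warning("Collapsing %d samples per row to their median", len(sample_cols))
    return df[sample_cols].median(axis=1)
```

The median is a natural choice for this collapse because it is robust to a few extreme draws; the warning keeps the information loss visible when a probabilistic base model feeds the deterministic ensemble.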
SampleExtractor.samples_to_flat detects sample_* columns,
Probabilistic meta-model: vincentization
ProbabilisticMetaModel now combines base distributions via vincentization (quantile averaging): per row, each base’s samples approximate $Q_i(p)$,
samples are sorted and combined as
$$Q(p) = \sum_i w_i \, Q_i(p)$$
with non-negative simplex weights $w_i$ ($w_i \ge 0$, $\sum_i w_i = 1$).
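The weighted quantile average translates almost directly into NumPy. The sketch below is illustrative only: the name vincentize is hypothetical, and it assumes every base model has already been brought to the same number of samples per row (the PR attributes this alignment to reshape_samples).

```python
import numpy as np


def vincentize(samples: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted quantile averaging (hypothetical helper).

    samples: (n_models, n_rows, n_samples); weights: (n_models,) on the simplex.
    Returns (n_rows, n_samples) with each row sorted ascending.
    """
    samples = np.asarray(samples, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if samples.ndim != 3 or weights.shape != (samples.shape[0],):
        raise ValueError("expected samples (n_models, n_rows, n_samples) and one weight per model")
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    # Sorting along the sample axis turns each model's draws into empirical
    # quantiles Q_i(p_k) at the same plotting positions p_k.
    quantiles = np.sort(samples, axis=-1)
    # Contract the model axis: Q(p_k) = sum_i w_i * Q_i(p_k).
    return np.tensordot(weights, quantiles, axes=(0, 0))
```

Because each output row is a non-negative combination of sorted vectors, it is non-decreasing along the sample axis, which is the monotonicity property the PR's tests check.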
Implementation:
_vincentize_samples: (n_models, n_rows, n_samples),
fit: crps_score_unbiased_matrix as objective,
predict: _vincentize_samples,
Resulting samples are sorted quantiles per row (monotone), not arbitrary Monte Carlo draws, and CRPS is evaluated with the same unbiased helper used by the metrics.
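For reference, the standard unbiased ("fair") CRPS estimator for an $m$-member ensemble $x_1, \dots, x_m$ and observation $y$ is $\frac{1}{m}\sum_i |x_i - y| - \frac{1}{2m(m-1)}\sum_{i,j} |x_i - x_j|$. The NumPy sketch below computes it row by row. It is a hypothetical stand-in: whether crps_score_unbiased_matrix uses exactly this form is an assumption. With a single member the spread term is dropped and the score reduces to absolute error, which lines up with the deterministic CRPS = MAE warning mentioned in the tests.

```python
import numpy as np


def crps_fair(samples: np.ndarray, observed: np.ndarray) -> np.ndarray:
    """Per-row fair CRPS (hypothetical helper).

    samples: (n_rows, m); observed: (n_rows,). Returns (n_rows,).
    """
    samples = np.asarray(samples, dtype=float)
    observed = np.asarray(observed, dtype=float)
    m = samples.shape[1]
    # Accuracy term: mean absolute error of the members against the observation.
    accuracy = np.abs(samples - observed[:, None]).mean(axis=1)
    if m == 1:
        # No spread term with one member, so CRPS reduces to absolute error (MAE).
        return accuracy
    # Spread term: sum of |x_i - x_j| over all ordered pairs (diagonal terms are 0).
    # Dividing by 2 m (m - 1) rather than 2 m^2 gives an unbiased estimate of
    # E|X - X'| / 2 for i.i.d. members.
    pairwise = np.abs(samples[:, :, None] - samples[:, None, :]).sum(axis=(1, 2))
    return accuracy - pairwise / (2 * m * (m - 1))
```

Used as a fitting objective, the mean of this score over rows would be minimised over the simplex weights $w_i$.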
Observability: NaNs, fallbacks, CRPS semantics
self.weights is set to uniform percentages.
File overview
chap_core/ensemble/_meta_models.py
NonNegativeMetaModel: coef_ used for prediction.
ProbabilisticMetaModel: _vincentize_samples, crps_score_unbiased_matrix,
crps_ensemble: crps_score_unbiased_matrix.
chap_core/ensemble/ensemble_model.py
train_test_split on TimePeriod.
SampleExtractor.reshape_samples and ProbabilisticMetaModel.
SampleExtractor.samples_to_flat and NonNegativeMetaModel.
use_residual_bootstrap removed.
chap_core/ensemble/_predictor.py
reshape_samples, calls meta.predict, repacks with _pack_samples.
NonNegativeMetaModel.predict, Samples (sample_0).
chap_core/ensemble/_sample_extractor.py
samples_to_flat: forecast/value if present, sample_* with median and logs a warning.
reshape_samples: (location, time_period), target_n samples as needed.
chap_core/ensemble/wrappers.py
BaseModelSpec(template, config), TemplateWithConfig that forwards config to template.get_model(config) and exposes name/repo.
chap_core/cli_endpoints/ensemble.py
evaluate-ensemble: TemplateWithConfig list from --base-model-names and optional --model-configuration-yaml, EnsembleModel, Evaluation.create and writes NetCDF + CSV reports, ensemble_meta_report.csv with base model weights (percentages).
use_residual_bootstrap removed from the CLI API.
chap_core/cli_endpoints/_common.py
save_results patch to support the new evaluation outputs.
Tests
All tests pass:
Key updated/added areas:
inner split, input validation, NaN logging.
deterministic stacking (single sample, weights reporting) and probabilistic sample shapes/monotonicity.
deterministic CRPS=MAE warning, probabilistic sample shapes.
NNLS behavior, simplex constraints, vincentization invariance, fallback.
median collapse and alignment/resampling in reshape_samples.
BaseModelSpec and TemplateWithConfig.
CLI wiring for deterministic/probabilistic ensemble evaluation.
ensemble configs moved into fixtures.
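The inner split covered by these tests is what keeps stacking honest: the meta-model has to be fitted on base-model forecasts for periods the base models did not train on. A rough pandas sketch of such a temporal holdout follows. The helper inner_time_split and the time_period column name are assumptions, and the sketch does not reproduce the signature of the train_test_split helper the PR uses.

```python
import pandas as pd


def inner_time_split(df: pd.DataFrame, n_holdout_periods: int, time_col: str = "time_period"):
    """Split a panel into earlier training rows and the last n periods (hypothetical helper)."""
    # Assumes time_col sorts chronologically (e.g. pandas Periods or ISO "YYYY-MM" strings).
    periods = sorted(df[time_col].unique())
    if not 0 < n_holdout_periods < len(periods):
        raise ValueError("n_holdout_periods must leave at least one training period")
    cutoff = periods[-n_holdout_periods]
    # Base models train on rows strictly before the cutoff; their forecasts for
    # the held-out rows become the meta-model's training features.
    train = df[df[time_col] < cutoff]
    holdout = df[df[time_col] >= cutoff]
    return train, holdout
```

Every location keeps the same cutoff, so the meta-model sees base forecasts for a common set of out-of-sample periods.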
Example usage
1) Deterministic ensemble (point-only, no bootstrap)
2) Probabilistic ensemble
chap evaluate-ensemble \
  --base-model-names https://github.com/chap-models/rwanda_sarimax,https://github.com/chap-models/chap_auto_ewars,https://github.com/chap-models/INLA_baseline_model \
  --model-configuration-yaml ../tests/fixtures/ensemble_config/rwanda_sarimax_config.yaml,../tests/fixtures/ensemble_config/chap_ewars_monthly_config.yaml,../tests/fixtures/ensemble_config/inla_baseline_config.yaml \
  --ensemble-method probabilistic \
  --random-state 20 \
  --dataset-csv https://raw.githubusercontent.com/dhis2/climate-health-data/main/lao/chap_LAO_admin1_monthly.csv \
  --report-filename ensemble_report.csv \
  --output-file ensemble.nc \
  --backtest-params.n-splits 3 \
  --backtest-params.n-periods 3
Out of scope / deferred