feat: add K-trajectory uncertainty propagation to ExtendedPredictor#247
feat: add K-trajectory uncertainty propagation to ExtendedPredictor#247audunlas wants to merge 2 commits into
Conversation
Add n_trajectories parameter to ExtendedPredictor enabling proper uncertainty growth over multi-step prediction horizons. The first trajectory always uses the mean (default, no extra cost). Additional trajectories draw a random sample for history updates, allowing uncertainty to compound across steps. Expose n_trajectories as --n-trajectories CLI argument on chap eval. Also declare prediction lengths in naive_python_model_uv MLproject so the feature tutorial correctly triggers ExtendedPredictor.
There was a problem hiding this comment.
Pull request overview
Adds support for propagating uncertainty across multi-step forecasts by running multiple “trajectories” in ExtendedPredictor and exposing the capability via chap eval --n-trajectories.
Changes:
- Extend
ExtendedPredictorto runn_trajectoriesiterative forecast rollouts and combine their samples into a single output. - Add
--n-trajectoriestochap evalto control multi-trajectory uncertainty propagation (default 1 preserves prior behavior). - Add/adjust tests and documentation describing the new behavior and CLI usage.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
chap_core/external/ExtendedPredictor.py |
Implements multi-trajectory rollout and sampled-vs-mean history updates. |
chap_core/cli_endpoints/evaluate.py |
Adds n_trajectories CLI parameter and passes it when wrapping with ExtendedPredictor. |
tests/external/test_extended_predictor.py |
Adds a test asserting sample column count scales with n_trajectories. |
docs/feature_tutorials/extended_predictor.md |
Documents uncertainty growth behavior and CLI usage example. |
external_models/naive_python_model_uv/MLproject |
Adds min/max prediction length metadata to the uv example model. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| class ExtendedPredictor(ConfiguredModel): | ||
| def __init__(self, configured_model: ConfiguredModel, desired_scope): | ||
| def __init__(self, configured_model: ConfiguredModel, desired_scope, n_trajectories: int = 1): |
There was a problem hiding this comment.
n_trajectories is not validated. If a user passes 0 or a negative value, trajectory_dfs becomes empty and trajectory_dfs[0] will raise IndexError. Consider validating in __init__ (or at the start of predict) that n_trajectories >= 1 and raising a ValueError with a clear message.
| def __init__(self, configured_model: ConfiguredModel, desired_scope, n_trajectories: int = 1): | |
| def __init__(self, configured_model: ConfiguredModel, desired_scope, n_trajectories: int = 1): | |
| if n_trajectories < 1: | |
| raise ValueError(f"n_trajectories must be at least 1, got {n_trajectories}") |
| new_rows["disease_cases"] = new_rows[sample_cols].mean(axis=1) | ||
| else: | ||
| rng = np.random.default_rng() | ||
| chosen_col = rng.choice(sample_cols) | ||
| new_rows["disease_cases"] = new_rows[chosen_col] |
There was a problem hiding this comment.
Creating a new np.random.default_rng() inside update_historic_data makes multi-trajectory runs non-reproducible and also ignores any global seeding done via np.random.seed. Consider instantiating the RNG once on the predictor (optionally with a seed) and reusing it, so results can be reproduced and tests can deterministically control the sampling.
| result = trajectory_dfs[0] | ||
| for traj_df in trajectory_dfs[1:]: | ||
| result = result.merge(traj_df, on=["time_period", "location"]) | ||
|
|
There was a problem hiding this comment.
Building the multi-trajectory result via repeated merge will carry along duplicate non-sample columns (e.g., covariates returned by the model) and can get expensive for larger n_trajectories. Consider merging/joining only the renamed sample_* columns onto the base trajectory (or using an index + pd.concat/join) to avoid extra columns and reduce runtime/memory overhead.
| result = trajectory_dfs[0] | |
| for traj_df in trajectory_dfs[1:]: | |
| result = result.merge(traj_df, on=["time_period", "location"]) | |
| # Use the first trajectory as the base, keeping all of its columns. | |
| # For subsequent trajectories, only join the (renamed) sample_* columns to avoid | |
| # repeatedly carrying along any non-sample columns (e.g., covariates). | |
| base_df = trajectory_dfs[0].set_index(["time_period", "location"]) | |
| for traj_df in trajectory_dfs[1:]: | |
| traj_df_indexed = traj_df.set_index(["time_period", "location"]) | |
| sample_cols = [col for col in traj_df_indexed.columns if col.startswith("sample_")] | |
| if sample_cols: | |
| base_df = base_df.join(traj_df_indexed[sample_cols], how="inner") | |
| result = base_df.reset_index() |
| int, | ||
| Parameter( | ||
| help="Number of trajectories for multi-step uncertainty propagation when ExtendedPredictor is used. " | ||
| "The first trajectory always uses the mean (k=1, default, no extra cost). " |
There was a problem hiding this comment.
The help text says "(k=1, default, ...)" but the flag/parameter is --n-trajectories/n_trajectories. Using the same term in the help (e.g., "(n_trajectories=1, default, ...)" or "(n=1, default, ...)") would avoid confusion for CLI users.
| "The first trajectory always uses the mean (k=1, default, no extra cost). " | |
| "The first trajectory always uses the mean (n_trajectories=1, default, no extra cost). " |
| def test_n_trajectories_doubles_sample_columns(): | ||
| """Test that n_trajectories=2 produces 2x sample columns with the same (time_period, location) pairs.""" | ||
| data = create_multi_location_data(num_locations=2, num_future_periods=6) | ||
|
|
||
| mock_model_1 = MockModel(min_pred_length=2, max_pred_length=3) | ||
| mock_model_2 = MockModel(min_pred_length=2, max_pred_length=3) | ||
|
|
||
| predictor_1 = ExtendedPredictor(mock_model_1, desired_scope=6, n_trajectories=1) | ||
| predictor_2 = ExtendedPredictor(mock_model_2, desired_scope=6, n_trajectories=2) | ||
|
|
||
| result_1 = predictor_1.predict(data["historic_data"], data["future_data"]).to_pandas() | ||
| result_2 = predictor_2.predict(data["historic_data"], data["future_data"]).to_pandas() | ||
|
|
||
| sample_cols_1 = [c for c in result_1.columns if c.startswith("sample_")] | ||
| sample_cols_2 = [c for c in result_2.columns if c.startswith("sample_")] | ||
|
|
||
| assert len(sample_cols_2) == 2 * len(sample_cols_1) | ||
|
|
||
| # Same (time_period, location) pairs in both | ||
| pairs_1 = set(zip(result_1["time_period"], result_1["location"])) | ||
| pairs_2 = set(zip(result_2["time_period"], result_2["location"])) | ||
| assert pairs_1 == pairs_2 |
There was a problem hiding this comment.
The new test only checks that n_trajectories=2 doubles the number of sample_* columns, but it doesn't verify the new behavior that additional trajectories update history using a sampled value (vs the mean). Consider adding a deterministic test (e.g., by stubbing the RNG choice or injecting a seeded RNG) that asserts the second iteration's historic disease_cases uses the selected sample_* column when n_trajectories>1.
- Validate n_trajectories >= 1 in __init__ - Instantiate RNG once on predictor instance - Join only sample columns from subsequent trajectories to avoid duplicate non-sample columns - Fix help text to use n_trajectories=1 instead of k=1 - Add test for invalid n_trajectories
Adds K-trajectory uncertainty propagation to ExtendedPredictor so uncertainty grows over longer horizons instead of collapsing to the mean. Integrated with chap eval via --n-trajectories (default 1, same behavior as before). Feel free to reject if you don't think this is worth including.