[BUG] Update GLM Normal and Gamma distribution parameter calculations to use self.scale_ #718

Merged
fkiraly merged 2 commits into sktime:main from amaydixit11:glm_gamma_dispersion
Feb 17, 2026

@amaydixit11
Contributor

Reference Issues/PRs

Fixes #717

What does this implement/fix? Explain your changes.

This PR fixes the construction of predictive distributions in GLMRegressor for the Normal and Gamma families.

Previously, the implementation used mean_se from get_prediction().summary_frame() to derive sigma (Normal) and alpha/beta (Gamma). However, mean_se represents the standard error of the estimated mean, not the conditional variance of the response.

This PR updates the implementation to use the fitted dispersion parameter (scale_) instead, following standard GLM theory:

  • For Normal: sigma = sqrt(scale_)
  • For Gamma: alpha = 1 / scale_, beta = 1 / (scale_ * mu)

This ensures that predictive distributions model observation noise rather than estimation uncertainty.
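
As a check on the parameterization above, the following sketch works out the first two moments. It assumes the Gamma distribution is parameterized by shape alpha and rate beta (an assumption here; the PR description does not state which convention is used), and writes φ for scale_.

```latex
\begin{aligned}
Y \sim \mathrm{Gamma}(\alpha, \beta)\ \text{(shape } \alpha,\ \text{rate } \beta\text{)}:
\quad & \mathbb{E}[Y] = \frac{\alpha}{\beta}, \qquad
        \operatorname{Var}(Y) = \frac{\alpha}{\beta^{2}} \\
\alpha = \frac{1}{\phi},\ \beta = \frac{1}{\phi\,\mu}:
\quad & \mathbb{E}[Y] = \frac{1/\phi}{1/(\phi\,\mu)} = \mu, \qquad
        \operatorname{Var}(Y) = \frac{1/\phi}{1/(\phi\,\mu)^{2}} = \phi\,\mu^{2}
\end{aligned}
```

Under that assumption the mean is mu and the variance is scale_ * mu^2, which matches the Gamma variance function V(mu) = mu^2. For the Normal family, sigma = sqrt(scale_) gives a variance of scale_ directly.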

Does your contribution introduce a new dependency? If yes, which one?

No, this change does not introduce any new dependencies.

What should a reviewer concentrate their feedback on?

  • Correctness of the updated Normal and Gamma parameterization.
  • Consistency with GLM variance and dispersion theory.
  • Compatibility with existing skpro distribution interfaces.

Did you add any tests for the change?

No new tests were added in this PR.

I would be happy to add tests for probabilistic calibration if requested.

Any other comments?

This change aligns the probabilistic output of GLMRegressor with the theoretical definition of GLMs and the documentation of statsmodels.

Feedback and suggestions are welcome.

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG].
For new estimators
  • I've added the estimator to the API reference.
  • I've added one or more illustrative usage examples to the docstring.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured dependency isolation.

Collaborator

@fkiraly fkiraly left a comment

Can you please explain how you infer that using self.scale_ is the right thing to do?
This seems to be a scalar, rather than a value that gets predicted for each point in the prediction - and, in particular, a scalar that is fitted in fit, rather than coming out of predict.

So, it seems wrong? Can you please explain?

@amaydixit11
Contributor Author

amaydixit11 commented Feb 14, 2026

Hi @fkiraly
In standard GLMs, the dispersion parameter φ (the scale) is estimated globally during fit by design. In statsmodels, GLMResults.scale is documented as a float, not an array, which reflects the assumption of constant dispersion.

Under the GLM formulation, Var(Y|X) = φ * V(μ).

For the families used here:

  • Gamma: Var(Y|X) = φ * μ^2. Here μ varies per observation, so the variance still varies per point.
  • Normal: Var(Y|X) = φ. Here the variance is constant, as expected for Gaussian models.

So even though scale is scalar, the predictive variance is not necessarily constant.
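
A minimal sketch of this point, using only the Python standard library: a single scalar dispersion combined with per-point predicted means yields per-point Gamma parameters and variances. The function name `predictive_params` and the input values are hypothetical and chosen for illustration; the Gamma beta is assumed to be a rate parameter.

```python
import math


def predictive_params(mu_values, scale):
    """Map predicted means and one scalar dispersion to per-point parameters.

    Hypothetical helper for illustration; beta is assumed to be a rate.
    """
    rows = []
    for mu in mu_values:
        rows.append(
            {
                "mu": mu,
                "normal_sigma": math.sqrt(scale),    # same for every point
                "gamma_alpha": 1.0 / scale,          # same for every point
                "gamma_beta": 1.0 / (scale * mu),    # varies with mu
                "gamma_var": scale * mu ** 2,        # phi * V(mu), varies with mu
            }
        )
    return rows


# Illustrative inputs: three predicted means, one fitted dispersion.
rows = predictive_params([1.0, 2.0, 4.0], scale=0.25)
for row in rows:
    print(row)
```

The Normal sigma and the Gamma alpha are shared across points, while the Gamma beta and the implied variance change with each predicted mean.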
The current implementation uses mean_se, which is the standard error of the estimated mean and shrinks with sample size (proportional to 1/√N). It represents estimation uncertainty, not the conditional variance of Y, so using it leads to increasingly narrow predictive distributions as more data is added.

In contrast, self.scale_ estimates the dispersion of the data and is the quantity used by statsmodels to model Var(Y|X). Using it is therefore consistent with both GLM theory and the statsmodels implementation when constructing P(Y|X).
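
To illustrate why the two quantities behave differently, consider the simplest case of an intercept-only model fitted to N independent observations. This is a simplifying assumption for illustration, not the general GLMRegressor setting; here \hat{\mu} is just the sample mean.

```latex
\operatorname{Var}(Y \mid X) = \phi\, V(\mu)
\qquad\text{versus}\qquad
\operatorname{se}(\hat{\mu})^{2} = \operatorname{Var}(\bar{Y})
  = \frac{\phi\, V(\mu)}{N} \xrightarrow{\,N \to \infty\,} 0
```

The first quantity does not depend on N, while the second vanishes as N grows, so its square root shrinks like 1/√N.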

Let me know if I'm wrong somewhere or if any changes are needed. This is something I discovered while working with the Gamma distribution.

@fkiraly
Collaborator

fkiraly commented Feb 15, 2026

Oh ok, can I confirm what you are saying:

  • the interfaced statsmodels estimator, GLM, assumes a constant scale parameter, hence a float (a reference for this would be appreciated, e.g., link/section pointer)
  • this scale parameter is estimated on fit and stored in self.scale_ of the statsmodels estimator
  • mean_se is not a predictive variance but a confidence interval on the predictive mean, hence not appropriate for the conditional distribution prediction that we need to make

Please correct me if you think I am wrong.

@fkiraly fkiraly added bug and removed enhancement labels Feb 15, 2026
@amaydixit11
Contributor Author

amaydixit11 commented Feb 15, 2026

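Yes, that's correct. Here are the relevant references.

For constant scale: the GLM technical documentation describes scale as the dispersion parameter of the exponential dispersion model, and GLMResults.scale is documented as a float:

  • https://www.statsmodels.org/stable/generated/statsmodels.genmod.generalized_linear_model.GLMResults.html#statsmodels.genmod.generalized_linear_model.GLMResults.scale
    This reflects the standard GLM assumption of global (homoskedastic) dispersion.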

@amaydixit11
Contributor Author

Regarding mean_se not being a predictive variance: this follows directly from the statsmodels documentation.

get_prediction

The get_prediction docs state:

'mean' returns the conditional expectation of endog E(y | x)

and

'var_unscaled' variance of endog implied by the likelihood model. This does not include scale

https://www.statsmodels.org/stable/generated/statsmodels.genmod.generalized_linear_model.GLMResults.get_prediction.html

They also say that prediction results are used to construct:

confidence intervals … for the prediction of the mean

which indicates that the associated standard errors (including mean_se in summary_frame) are for inference on the estimated mean, not for modeling Var(Y|X).

Gamma family

Separately, for the Gamma family, the variance function is documented as:

variance is an instance of … mu_squared

www.statsmodels.org/stable/generated/statsmodels.genmod.families.family.Gamma.html

i.e., V(μ) = μ².

This implies that the conditional variance of Y follows Var(Y|X) = scale · V(μ), while mean_se reflects uncertainty in the estimate of E[Y|X] and is used for confidence intervals on the mean.

@fkiraly
Collaborator

fkiraly commented Feb 15, 2026

Yes, that's correct Here are the relevant references: for Constant scale The GLM technical documentation describes scale as the dispersion parameter of the exponential dispersion model, and GLMResults.scale is documented as a float:

* https://www.statsmodels.org/stable/generated/statsmodels.genmod.generalized_linear_model.GLMResults.html#statsmodels.genmod.generalized_linear_model.GLMResults.scale
  This reflects the standard GLM assumption of global (homoskedastic) dispersion.

Hm, this reference only tells me there is a scalar scale parameter coming out of fit. Is there documentation anywhere about the predictive distribution?

Though I think what you are saying is very plausible. It is just very unfortunate that statsmodels does not seem to document their model assumptions clearly, especially when it comes to predictive usage.

@amaydixit11
Contributor Author

Yes, it's true that statsmodels doesn't document this specific part.

However, we can verify the correctness of the theoretical GLM assumption (Var(Y|X) = scale * V(μ)) versus the estimation uncertainty (mean_se) empirically.

I ran a simulation with a known Gamma distribution (True Variance = 50.0) and increasing sample size N.

You can check it out here:
https://colab.research.google.com/drive/17y1O-Rx6YoHePOBzPt3rNZcxyt8nFCak?usp=sharing

If scale is correct, the estimated variance should converge to 50.0.
If mean_se were correct, it should also converge to 50.0.
Results:

N=100:   Var(scale)=37.6   Var(mean_se)=56.5   True=50.0
N=1000:  Var(scale)=49.2   Var(mean_se)=23.5   True=50.0
N=50000: Var(scale)=50.2   Var(mean_se)= 3.2   True=50.0

This should confirm that using mean_se for the predictive distribution is incorrect, as it would lead to arbitrarily narrow predictive intervals for large datasets. Using scale instead recovers the true data dispersion.
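
For readers who want to reproduce the qualitative behaviour without the linked notebook, here is a minimal standard-library sketch. It is not the notebook's code: it assumes an intercept-only model (so the fitted mean is the sample mean), uses a Pearson-type dispersion estimate with V(mu) = mu^2, and treats the standard error of the sample mean as a stand-in for mean_se. The names `simulate`, `shape`, and `theta` are hypothetical; shape 2 and scale 5 give a true variance of 2 * 5^2 = 50.

```python
import math
import random


def simulate(n, shape=2.0, theta=5.0, seed=0):
    """Intercept-only Gamma example: dispersion-based variance vs. SE of the mean."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale); mean = shape * theta.
    y = [rng.gammavariate(shape, theta) for _ in range(n)]
    mu_hat = sum(y) / n
    # Pearson-type dispersion estimate with V(mu) = mu^2 and one fitted parameter.
    scale_hat = sum((yi - mu_hat) ** 2 for yi in y) / (mu_hat ** 2 * (n - 1))
    var_from_scale = scale_hat * mu_hat ** 2   # estimate of Var(Y | X)
    se_mean = math.sqrt(var_from_scale / n)    # standard error of the fitted mean
    return var_from_scale, se_mean


results = {n: simulate(n) for n in (200, 2000, 20000)}
for n, (var_from_scale, se_mean) in results.items():
    print(f"N={n:>6}: scale*mu^2={var_from_scale:7.2f}  se_mean^2={se_mean ** 2:.5f}")
```

In this setup the dispersion-based estimate should settle near the true variance as N grows, while the squared standard error keeps shrinking roughly in proportion to 1/N.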

@fkiraly
Collaborator

fkiraly commented Feb 17, 2026

ok, that is a very strong argument.

You made a mistake in that the predictive variance would be mean_se squared rather than just mean_se, but no matter: it is shrinking roughly like 1/sqrt(N), which confirms it is a measure of confidence rather than an estimate of predictive variance.

This does not defeat the argument though.
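
The shrinkage rate can be checked with simple arithmetic on the numbers reported in the results table earlier in the thread. The sketch below only compares ratios of the reported Var(mean_se) values with the square root of the ratio of sample sizes; how that column was computed in the notebook is not stated here, so this is a rough consistency check rather than a derivation.

```python
import math

# Values copied from the results table in the thread: N -> Var(mean_se) column.
reported = {100: 56.5, 1000: 23.5, 50000: 3.2}

ns = sorted(reported)
ratios = []
for n_small, n_large in zip(ns, ns[1:]):
    observed = reported[n_small] / reported[n_large]   # how much the value shrank
    expected = math.sqrt(n_large / n_small)            # shrinkage under a 1/sqrt(N) law
    ratios.append((observed, expected))
    print(f"N {n_small} -> {n_large}: observed {observed:.2f}, sqrt(N ratio) {expected:.2f}")
```

The observed ratios (about 2.40 and 7.34) are of the same order as the square roots of the sample-size ratios (about 3.16 and 7.07), consistent with shrinkage at roughly a 1/sqrt(N) rate.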

@fkiraly fkiraly merged commit f61a210 into sktime:main Feb 17, 2026
39 checks passed

Labels

bug module:regression probabilistic regression module

Development

Successfully merging this pull request may close these issues.

[BUG] GLMRegressor incorrectly maps dispersion parameters for Gamma and Normal families

2 participants