The MahalanobisDistance class provides several default values for its parameters:
- n_components=10
- centering=True
- alpha=0.001
However, the rationale behind these choices is not documented in the class docstring or in the implementation. Defaults are helpful for quick prototyping, but without any justification (empirical, theoretical, or from cited references) users cannot easily assess whether these values are appropriate for their use cases.
Specifically:
- Why is 10 the default number of components? This may not be optimal or even adequate depending on the functional dataset's complexity.
- Why is centering assumed by default? Some applications (e.g., anomaly detection) may prefer uncentered data.
- The choice of alpha=0.001 seems arbitrary: does it reflect a generally accepted value in the literature for regularization strength?
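
To make the concern concrete, here is a minimal NumPy sketch of how sensitive a regularized Mahalanobis-type distance can be to these three parameters. This is not the code in _mahalanobis.py: the helper `regularized_mahalanobis` is hypothetical, it works on curves discretized on a common grid, and it assumes a simple ridge-style regularization (dividing each squared score by eigenvalue + alpha), which may differ from the formula the class actually uses. The synthetic data is made up for illustration only.

```python
import numpy as np


def regularized_mahalanobis(X, x, y, n_components=10, alpha=0.001, centering=True):
    """Hypothetical helper: ridge-regularized Mahalanobis-type distance.

    X is an (n_samples, n_points) array of discretized curves used to
    estimate the principal directions; x and y are two curves on the same grid.
    """
    X = np.asarray(X, dtype=float)
    # With centering=False the second-moment matrix is used instead of the covariance.
    mean = X.mean(axis=0) if centering else np.zeros(X.shape[1])
    Xc = X - mean
    cov = Xc.T @ Xc / X.shape[0]

    # eigh returns eigenvalues in ascending order; keep the largest n_components.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[order], 0.0, None)  # guard against tiny negative values
    directions = eigvecs[:, order]

    # Project the difference onto the retained directions and rescale.
    scores = (np.asarray(x, dtype=float) - np.asarray(y, dtype=float)) @ directions
    return float(np.sqrt(np.sum(scores**2 / (lam + alpha))))


# Synthetic curves: random combinations of five sinusoids plus small noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
basis = np.array([np.sin((k + 1) * np.pi * t) for k in range(5)])
coefs = rng.normal(size=(200, 5)) * np.array([1.0, 0.5, 0.25, 0.1, 0.05])
X = coefs @ basis + 0.01 * rng.normal(size=(200, 50))

# Sweep the two numeric defaults questioned above.
for n in (2, 5, 10):
    for a in (1e-1, 1e-3, 1e-5):
        d = regularized_mahalanobis(X, X[0], X[1], n_components=n, alpha=a)
        print(f"n_components={n:2d}  alpha={a:g}  distance={d:.4f}")
```

In this sketch the distance can only grow as n_components increases or alpha decreases, so the value a user gets depends directly on both defaults. A short note in the docstring on how 10 and 0.001 were chosen, or a sweep like the one above on a reference dataset, would address the question.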
See _mahalanobis.py