features: Manually setting the validation set for multi-output task #1302
lizhuoq wants to merge 4 commits into microsoft:main from
Conversation
@microsoft-github-policy-service agree
Pull request overview
This pull request adds support for manually setting a validation set for multi-output tasks when using the "holdout" evaluation method. Previously, users could not manually specify a validation set for multi-output regression tasks. The new multioutput_train_size parameter allows users to concatenate training and validation data and specify where to split them.
Changes:
- Added multioutput_train_size parameter to the AutoML class for manual validation set specification
- Implemented _train_val_split method to split the concatenated training/validation data (a sketch of the split logic is shown below)
- Added a test case demonstrating the new functionality with MultiOutputRegressor
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| flaml/automl/automl.py | Added documentation and implementation for the multioutput_train_size parameter, including the split logic in the fit method |
| test/automl/test_regression.py | Added test_multioutput_train_size function to demonstrate usage of the new feature |
```python
def test_multioutput_train_size():
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split
    from sklearn.multioutput import MultiOutputRegressor, RegressorChain

    from flaml import AutoML  # added so the snippet runs standalone

    # create regression data
    X, y = make_regression(n_targets=3)

    # split into train and test data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
    X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.1, random_state=42)

    # train the model; per this PR, the first multioutput_train_size rows of the
    # concatenated data are used for training and the remaining rows for validation
    model = MultiOutputRegressor(
        AutoML(task="regression", time_budget=1, eval_method="holdout", multioutput_train_size=len(X_train))
    )
    model.fit(np.concatenate([X_train, X_val], axis=0), np.concatenate([y_train, y_val], axis=0))

    # predict
    print(model.predict(X_test))
```
The test function lacks assertions to verify the new multioutput_train_size feature works as expected. Consider adding assertions to validate that the model was trained successfully and that the validation split was performed correctly. For example, you could check that the model produces reasonable predictions or verify internal state that confirms the train/validation split occurred.
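As a sketch of what such assertions might look like (appended after the fit and predict calls in the test above; the value 3 follows from n_targets=3, and estimators_ is the attribute scikit-learn's MultiOutputRegressor exposes after fitting):

```python
# possible assertions for test_multioutput_train_size (illustrative only)
y_pred = model.predict(X_test)
assert y_pred.shape == (X_test.shape[0], 3)  # one prediction per test row and per target
assert np.all(np.isfinite(y_pred))           # predictions are finite numbers
assert len(model.estimators_) == 3           # one fitted AutoML estimator per target
```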
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Why are these changes needed?
Previously, for multi-output tasks where the eval_method is holdout, the validation set could not be set manually. This commit introduces a new feature that allows manually setting the validation set for multi-output tasks.
Related issue number
Checks