StreamingGradientBoostedRegression#
- class capymoa.regressor.StreamingGradientBoostedRegression[source]#
Bases: MOAClassifier
Streaming Gradient Boosted Regression.
Streaming Gradient Boosted Regression (SGBR) [1] was developed to adapt gradient boosting to streaming regression using Streaming Gradient Boosted Trees (SGBT). A variant called SGB(Oza), which uses OzaBag bagging regressors as base learners, outperforms existing state-of-the-art methods in both accuracy and efficiency across various drift scenarios.
>>> from capymoa.datasets import Fried
>>> from capymoa.regressor import StreamingGradientBoostedRegression
>>> from capymoa.evaluation import prequential_evaluation
>>> stream = Fried()
>>> schema = stream.get_schema()
>>> learner = StreamingGradientBoostedRegression(schema)
>>> results = prequential_evaluation(stream, learner, max_instances=1000)
>>> round(results["cumulative"].r2(), 2)
0.61
- __init__(
- schema: Schema | None = None,
- random_seed: int = 0,
- base_learner='meta.OzaBag -s 10 -l (trees.FIMTDD -s VarianceReductionSplitCriterion -g 50 -c 0.01 -e)',
- boosting_iterations: int = 10,
- percentage_of_features: int = 75,
- learning_rate=1.0,
- disable_one_hot: bool = False,
- multiply_hessian_by: int = 1,
- skip_training: int = 1,
- use_squared_loss: bool = False,
- )
Streaming Gradient Boosted Regression (SGBR) Regressor
- Parameters:
schema – The schema of the stream.
random_seed – The random seed passed to the MOA learner.
base_learner – The base learner to be trained. Default meta.OzaBag -s 10 -l (trees.FIMTDD -s VarianceReductionSplitCriterion -g 50 -c 0.01 -e).
boosting_iterations – The number of boosting iterations. Default 10.
percentage_of_features – The percentage of features to use. Default 75.
learning_rate – The learning rate. Default 1.0.
disable_one_hot – Whether to disable one-hot encoding for regressors that support nominal attributes.
multiply_hessian_by – Multiply the hessian by this value to generate weights for multiple iterations. Default 1.
skip_training – Train on only one out of every skip_training instances. skip_training=1 means no skipping is performed (train on all instances).
use_squared_loss – Whether to use squared loss for classification.
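The snippet below is a minimal sketch of how these constructor parameters can be adjusted; the values are purely illustrative, not recommended settings, and only the arguments documented above are used.

>>> from capymoa.datasets import Fried
>>> from capymoa.regressor import StreamingGradientBoostedRegression
>>> stream = Fried()
>>> # Illustrative values only: more boosting iterations, a feature subset, smaller step size
>>> learner = StreamingGradientBoostedRegression(
...     schema=stream.get_schema(),
...     boosting_iterations=20,
...     percentage_of_features=60,
...     learning_rate=0.5,
... )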
- predict(instance)[source]#
Predict the label of an instance.
The base implementation calls predict_proba() and returns the label with the highest probability.
- Parameters:
instance – The instance to predict the label for.
- Returns:
The predicted label, or None if the classifier is unable to make a prediction.
- predict_proba(instance)[source]#
Return probability estimates for each label.
- Parameters:
instance – The instance to estimate the probabilities for.
- Returns:
An array of probabilities for each label, or None if the classifier is unable to make a prediction.
- train(instance)[source]#
Train the classifier with a labeled instance.
- Parameters:
instance – The labeled instance to train the classifier with.
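A minimal sketch of a manual test-then-train loop that combines predict() and train(); it assumes the stream exposes next_instance(), the usual CapyMOA stream interface, and processes only the first 100 instances for brevity.

>>> from capymoa.datasets import Fried
>>> from capymoa.regressor import StreamingGradientBoostedRegression
>>> stream = Fried()
>>> learner = StreamingGradientBoostedRegression(stream.get_schema())
>>> for _ in range(100):
...     instance = stream.next_instance()        # assumed stream API
...     prediction = learner.predict(instance)   # test first
...     learner.train(instance)                  # then train on the labeled instance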
- random_seed: int#
The random seed for reproducibility.
When implementing a classifier ensure random number generators are seeded.