StreamingGradientBoostedRegression

class capymoa.regressor.StreamingGradientBoostedRegression

Bases: MOAClassifier

Streaming Gradient Boosted Regression.

Streaming Gradient Boosted Regression (SGBR) [1] adapts gradient boosting to streaming regression by building on Streaming Gradient Boosted Trees (SGBT). According to [1], the SGB(Oza) variant, which uses OzaBag bagging regressors as base learners, outperforms existing state-of-the-art methods in both accuracy and efficiency across various drift scenarios.
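
To make the boosting idea concrete, the following is a minimal, illustrative sketch of gradient boosting on a stream with squared loss. It is not the SGBR algorithm from [1] and does not use CapyMOA: the classes `OnlineLinearRegressor` and `ToyStreamingBooster` are hypothetical, the base learner is a one-feature linear model trained by stochastic gradient descent rather than an OzaBag of FIMTDD trees, and Hessian weighting and feature subsampling are omitted. Each stage is simply trained on the residual left by the stages before it.

```python
import random


class OnlineLinearRegressor:
    """Hypothetical base learner: one-feature linear model updated by SGD."""

    def __init__(self, step_size=0.1):
        self.w = 0.0
        self.b = 0.0
        self.step_size = step_size

    def predict(self, x):
        return self.w * x + self.b

    def train(self, x, target):
        # One SGD step on the squared error between prediction and target.
        error = self.predict(x) - target
        self.w -= self.step_size * error * x
        self.b -= self.step_size * error


class ToyStreamingBooster:
    """Hypothetical additive ensemble trained one instance at a time."""

    def __init__(self, boosting_iterations=3, learning_rate=1.0):
        self.learning_rate = learning_rate
        self.stages = [OnlineLinearRegressor() for _ in range(boosting_iterations)]

    def predict(self, x):
        # The ensemble prediction is the scaled sum of all stage predictions.
        return sum(self.learning_rate * stage.predict(x) for stage in self.stages)

    def train(self, x, y):
        running = 0.0
        for stage in self.stages:
            # For squared loss, the negative gradient is the residual y - F(x).
            residual = y - running
            stage.train(x, residual)
            running += self.learning_rate * stage.predict(x)


# Synthetic, noise-free stream (an assumption for illustration): y = 2x + 1.
rng = random.Random(0)
model = ToyStreamingBooster()
for _ in range(5000):
    x = rng.random()
    model.train(x, 2.0 * x + 1.0)

prediction = model.predict(0.5)  # should approach 2.0 as training proceeds
```

The toy constructor borrows the names `boosting_iterations` and `learning_rate` from the signature documented here only to show where such settings would plausibly act in an additive model.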

>>> from capymoa.datasets import Fried
>>> from capymoa.regressor import StreamingGradientBoostedRegression
>>> from capymoa.evaluation import prequential_evaluation
>>> stream = Fried()
>>> schema = stream.get_schema()
>>> learner = StreamingGradientBoostedRegression(schema)
>>> results = prequential_evaluation(stream, learner, max_instances=1000)
>>> round(results["cumulative"].r2(), 2)
0.61
__init__(
schema: Schema | None = None,
random_seed: int = 0,
base_learner='meta.OzaBag -s 10 -l (trees.FIMTDD -s VarianceReductionSplitCriterion -g 50 -c 0.01 -e)',
boosting_iterations: int = 10,
percentage_of_features: int = 75,
learning_rate=1.0,
disable_one_hot: bool = False,
multiply_hessian_by: int = 1,
skip_training: int = 1,
use_squared_loss: bool = False,
)

Streaming Gradient Boosted Regression (SGBR) Regressor

Parameters:
  • schema – The schema of the stream.

  • random_seed – The random seed passed to the MOA learner.

  • base_learner – The base learner to be trained. Default meta.OzaBag -s 10 -l (trees.FIMTDD -s VarianceReductionSplitCriterion -g 50 -c 0.01 -e).

  • boosting_iterations – The number of boosting iterations. Default 10.

  • percentage_of_features – The percentage of features to use. Default 75.

  • learning_rate – The learning rate. Default 1.0.

  • disable_one_hot – Whether to disable one-hot encoding for regressors that support nominal attributes.

  • multiply_hessian_by – Multiply the Hessian by this value to generate weights for multiple iterations.

  • skip_training – Skip training of 1/skip_training instances. skip_training=1 means no skipping is performed (train on all instances).

  • use_squared_loss – Whether to use squared loss for classification.
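
As an illustration of how the parameters above could be overridden, here is a small sketch that collects non-default options and passes them to the constructor. The names `sgbr_options` and `build_sgbr` are hypothetical helpers, and the values are arbitrary examples rather than tuning recommendations. Only keyword arguments listed in the signature above are used.

```python
# Example overrides for arguments listed in the documented __init__ signature.
# The values are arbitrary and chosen only for illustration.
sgbr_options = {
    "random_seed": 1,
    "boosting_iterations": 20,
    "percentage_of_features": 50,
    "learning_rate": 0.5,
    "skip_training": 1,  # 1 means train on every instance
}


def build_sgbr(schema, options=sgbr_options):
    """Hypothetical helper: construct the regressor with the given options."""
    # Imported lazily so the options can be inspected without CapyMOA installed.
    from capymoa.regressor import StreamingGradientBoostedRegression

    return StreamingGradientBoostedRegression(schema=schema, **options)
```

With a stream such as `Fried()`, calling `build_sgbr(stream.get_schema())` would return a learner configured with these options, assuming CapyMOA and its Java dependency are available.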

cli_help()

predict(instance)

Predict the label of an instance.

The base implementation calls predict_proba() and returns the label with the highest probability.

Parameters:

instance – The instance to predict the label for.

Returns:

The predicted label or None if the classifier is unable to make a prediction.

predict_proba(instance)

Return probability estimates for each label.

Parameters:

instance – The instance to estimate the probabilities for.

Returns:

An array of probabilities for each label or None if the classifier is unable to make a prediction.

train(instance)

Train the classifier with a labeled instance.

Parameters:

instance – The labeled instance to train the classifier with.
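
The `predict()` and `train()` methods can be combined into a manual test-then-train loop, as an alternative to `prequential_evaluation`. The sketch below is duck-typed and makes several assumptions: the stream offers `has_more_instances()` and `next_instance()`, each instance exposes its numeric target as `y_value`, and `predict()` returns a number (or `None`) on a regression stream. The function name `prequential_mae` is hypothetical.

```python
def prequential_mae(stream, learner, max_instances=1000):
    """Predict on each instance first, then train on it.

    Returns the mean absolute error over the instances for which the
    learner produced a prediction, or None if it never produced one.
    """
    absolute_error = 0.0
    scored = 0
    seen = 0
    while stream.has_more_instances() and seen < max_instances:
        instance = stream.next_instance()
        prediction = learner.predict(instance)
        # predict() may return None when no prediction can be made yet.
        if prediction is not None:
            absolute_error += abs(prediction - instance.y_value)
            scored += 1
        learner.train(instance)
        seen += 1
    return absolute_error / scored if scored else None
```

Predicting before training keeps every instance unseen at the moment it is scored, which is the point of prequential evaluation.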

random_seed: int

The random seed for reproducibility.

When implementing a classifier ensure random number generators are seeded.

schema: Schema

The schema representing the instances.