StreamingGradientBoostedTrees

class capymoa.classifier.StreamingGradientBoostedTrees

Bases: MOAClassifier

Streaming Gradient Boosted Trees (SGBT) Classifier

Streaming Gradient Boosted Trees (SGBT) is trained using the weighted squared loss elicited in XGBoost. SGBT uses trees with a replacement strategy to detect and recover from drifts, which lets the ensemble adapt without sacrificing predictive performance.
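
The weighted squared loss mentioned above can be illustrated with a minimal sketch. It assumes a binary logistic loss for simplicity and is not CapyMOA's or MOA's implementation; the function names are hypothetical. XGBoost's second-order approximation of the loss can be rewritten as a squared loss in which each instance has the regression target -gradient/hessian and the weight hessian, which is what a regression base learner is then fitted to.

```python
import math


def sigmoid(score: float) -> float:
    """Map a raw ensemble score to a probability."""
    return 1.0 / (1.0 + math.exp(-score))


def boosting_target_and_weight(raw_score: float, label: int, eps: float = 1e-12):
    """Return (target, weight) for one instance under binary logistic loss.

    Hypothetical helper for illustration only.
    gradient = p - y and hessian = p * (1 - p), where p = sigmoid(raw_score).
    The second-order objective g*f + 0.5*h*f^2 equals
    0.5*h*(f - (-g/h))^2 up to a constant, i.e. a squared loss
    with target -g/h and instance weight h.
    """
    p = sigmoid(raw_score)
    gradient = p - label
    hessian = max(p * (1.0 - p), eps)  # guard against division by zero
    return -gradient / hessian, hessian


# With no prior evidence (raw score 0) and a positive label:
target, weight = boosting_target_and_weight(raw_score=0.0, label=1)
print(target, weight)  # 2.0 0.25
```

In this sketch, a base regressor at each boosting step would be trained on `target` with instance weight `weight`, and its prediction would be scaled by the learning rate before being added to the raw score.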

Reference:

Gradient boosted trees for evolving data streams. Nuwan Gunasekara, Bernhard Pfahringer, Heitor Murilo Gomes, Albert Bifet. Machine Learning, Springer, 2024.

Example usage:

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import StreamingGradientBoostedTrees
>>> from capymoa.evaluation import prequential_evaluation
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = StreamingGradientBoostedTrees(schema)
>>> results = prequential_evaluation(stream, learner, max_instances=1000)
>>> results["cumulative"].accuracy()
86.3
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = StreamingGradientBoostedTrees(schema, base_learner='meta.AdaptiveRandomForestRegressor -s 10', boosting_iterations=10)
>>> results = prequential_evaluation(stream, learner, max_instances=1000)
>>> results["cumulative"].accuracy()
86.8
__init__(
schema: Schema | None = None,
random_seed: int = 0,
base_learner='trees.FIMTDD -s VarianceReductionSplitCriterion -g 25 -c 0.05 -e -p',
boosting_iterations: int = 100,
percentage_of_features: int = 75,
learning_rate=0.0125,
disable_one_hot: bool = False,
multiply_hessian_by: int = 1,
skip_training: int = 1,
use_squared_loss: bool = False,
)

Streaming Gradient Boosted Trees (SGBT) Classifier

Parameters:
  • schema – The schema of the stream.

  • random_seed – The random seed passed to the MOA learner.

  • base_learner – The base learner to be trained. Default: 'trees.FIMTDD -s VarianceReductionSplitCriterion -g 25 -c 0.05 -e -p'.

  • boosting_iterations – The number of boosting iterations.

  • percentage_of_features – The percentage of features to use.

  • learning_rate – The learning rate.

  • disable_one_hot – Whether to disable one-hot encoding for regressors that support nominal attributes.

  • multiply_hessian_by – Multiply the hessian by this value to generate weights for multiple iterations.

  • skip_training – Skip training of 1/skip_training instances. skip_training=1 means no skipping is performed (train on all instances).

  • use_squared_loss – Whether to use squared loss for classification.

CLI_help()
predict(instance)
predict_proba(instance)
train(instance)