SOKNLBT#

class capymoa.regressor.SOKNLBT[source]#

Bases: MOARegressor

The base tree for Self-Optimising K Nearest Leaves as described by Sun et al.

SOKNLBT modifies the FIMT-DD algorithm to store information at the leaves that allows SOKNL to calculate the distance between a given instance and a leaf.

See also capymoa.regressor.SOKNL and capymoa.regressor.FIMTDD. See capymoa.base.MOARegressor for train and predict.

Reference:

Sun, Yibin, Bernhard Pfahringer, Heitor Murilo Gomes, and Albert Bifet. “SOKNL: A novel way of integrating K-nearest neighbours with adaptive random forest regression for data streams.” Data Mining and Knowledge Discovery 36, no. 5 (2022): 2006-2032.

Example usage:

>>> from capymoa.datasets import Fried
>>> from capymoa.regressor import SOKNLBT
>>> from capymoa.evaluation import prequential_evaluation
>>> stream = Fried()
>>> schema = stream.get_schema()
>>> learner = SOKNLBT(schema)
>>> results = prequential_evaluation(stream, learner, max_instances=1000)
>>> results["cumulative"].rmse()
4.950050301515773
__init__(
schema: Schema,
subspace_size_size: int = 2,
split_criterion: SplitCriterion | str = 'VarianceReductionSplitCriterion',
grace_period: int = 200,
split_confidence: float = 1e-07,
tie_threshold: float = 0.05,
page_hinckley_alpha: float = 0.005,
page_hinckley_threshold: int = 50,
alternate_tree_fading_factor: float = 0.995,
alternate_tree_t_min: int = 150,
alternate_tree_time: int = 1500,
learning_ratio: float = 0.02,
learning_ratio_decay_factor: float = 0.001,
learning_ratio_const: bool = False,
random_seed: int | None = None,
) → None[source]#

Construct SelfOptimisingBaseTree.

Parameters:
  • subspace_size_size – Number of features per subset for each node split. A negative value k is interpreted as #features - k.

  • split_criterion – Split criterion to use.

  • grace_period – Number of instances a leaf should observe between split attempts.

  • split_confidence – Allowed error in the split decision; values close to 0 take longer to decide.

  • tie_threshold – Threshold below which a split will be forced to break ties.

  • page_hinckley_alpha – Alpha value to use in the Page Hinckley change detection tests.

  • page_hinckley_threshold – Threshold value used in the Page Hinckley change detection tests.

  • alternate_tree_fading_factor – Fading factor used to decide if an alternate tree should replace an original.

  • alternate_tree_t_min – Tmin value used to decide if an alternate tree should replace an original.

  • alternate_tree_time – The number of instances used to decide if an alternate tree should be discarded.

  • learning_ratio – Learning ratio used for training the Perceptrons in the leaves.

  • learning_ratio_decay_factor – Learning rate decay factor (not used when learning rate is constant).

  • learning_ratio_const – Keep learning rate constant instead of decaying.

CLI_help()[source]#
predict(instance)[source]#
train(instance)[source]#