SOKNLBT#

class capymoa.regressor.SOKNLBT[source]#

Bases: MOARegressor

The base tree for Self-Optimising K Nearest Leaves as described by Sun et al.

SOKNLBT modifies the FIMT-DD algorithm to store information at the leaves that allows SOKNL to calculate the distance between a given instance and a leaf.

See also capymoa.regressor.SOKNL and capymoa.regressor.FIMTDD. See capymoa.base.MOARegressor for train and predict.

Reference:

Sun, Yibin, Bernhard Pfahringer, Heitor Murilo Gomes, and Albert Bifet. “SOKNL: A novel way of integrating K-nearest neighbours with adaptive random forest regression for data streams.” Data Mining and Knowledge Discovery 36, no. 5 (2022): 2006-2032.

Example usage:

>>> from capymoa.datasets import Fried
>>> from capymoa.regressor import SOKNLBT
>>> from capymoa.evaluation import prequential_evaluation
>>> stream = Fried()
>>> schema = stream.get_schema()
>>> learner = SOKNLBT(schema)
>>> results = prequential_evaluation(stream, learner, max_instances=1000)
>>> results["cumulative"].rmse()
4.950050301515773
__init__(
schema: Schema,
subspace_size_size: int = 2,
split_criterion: SplitCriterion | str = 'VarianceReductionSplitCriterion',
grace_period: int = 200,
split_confidence: float = 1e-07,
tie_threshold: float = 0.05,
page_hinckley_alpha: float = 0.005,
page_hinckley_threshold: int = 50,
alternate_tree_fading_factor: float = 0.995,
alternate_tree_t_min: int = 150,
alternate_tree_time: int = 1500,
learning_ratio: float = 0.02,
learning_ratio_decay_factor: float = 0.001,
learning_ratio_const: bool = False,
random_seed: int | None = None,
) → None[source]#

Construct SelfOptimisingBaseTree.

Parameters:
  • subspace_size_size – Number of features per subset for each node split. A negative value k is interpreted as #features - k.

  • split_criterion – Split criterion to use.

  • grace_period – Number of instances a leaf should observe between split attempts.

  • split_confidence – Allowed error in the split decision; values close to 0 take longer to decide.

  • tie_threshold – Threshold below which a split will be forced to break ties.

  • page_hinckley_alpha – Alpha value to use in the Page Hinckley change detection tests.

  • page_hinckley_threshold – Threshold value used in the Page Hinckley change detection tests.

  • alternate_tree_fading_factor – Fading factor used to decide if an alternate tree should replace an original.

  • alternate_tree_t_min – Tmin value used to decide if an alternate tree should replace an original.

  • alternate_tree_time – The number of instances used to decide if an alternate tree should be discarded.

  • learning_ratio – Learning ratio used for training the Perceptrons in the leaves.

  • learning_ratio_decay_factor – Learning rate decay factor (not used when learning rate is constant).

  • learning_ratio_const – Keep learning rate constant instead of decaying.

CLI_help()[source]#
predict(instance)[source]#
train(instance)[source]#