HoeffdingTree
- class capymoa.classifier.HoeffdingTree
Bases: MOAClassifier
Hoeffding Tree classifier.
Parameters
- schema
The schema of the stream.
- random_seed
The random seed passed to the MOA learner.
- grace_period
Number of instances a leaf should observe between split attempts.
- split_criterion
Split criterion to use. Defaults to InfoGainSplitCriterion.
- confidence
Allowable error (delta) used to calculate the Hoeffding bound; a split decision holds with probability 1 - delta. Values closer to zero imply longer split decision delays.
- tie_threshold
Threshold below which a split will be forced to break ties.
- leaf_prediction
Prediction mechanism used at leaves:
  - 0 - Majority Class
  - 1 - Naive Bayes
  - 2 - Naive Bayes Adaptive
- nb_threshold
Number of instances a leaf should observe before allowing Naive Bayes.
- numeric_attribute_observer
The Splitter or Attribute Observer (AO) used to monitor the class statistics of numeric features and perform splits.
- binary_split
If True, only allow binary splits.
- max_byte_size
The max size of the tree, in bytes.
- memory_estimate_period
Interval (number of processed instances) between memory consumption checks.
- stop_mem_management
If True, stop growing as soon as memory limit is hit.
- remove_poor_attrs
If True, disable poor attributes to reduce memory usage.
- disable_prepruning
If True, disable merit-based tree pre-pruning.
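To make the interplay of confidence and tie_threshold concrete, here is a minimal sketch of the standard Hoeffding bound, epsilon = sqrt(R^2 * ln(1/delta) / (2n)). It assumes that confidence plays the role of delta and that the range R is log2(number of classes), the usual choice for information gain. The helper names are hypothetical and are not part of capymoa.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def instances_until_tie_break(value_range: float, delta: float,
                              tie_threshold: float) -> int:
    """Smallest n for which the bound drops below tie_threshold."""
    n = math.ceil(
        value_range ** 2 * math.log(1.0 / delta) / (2.0 * tie_threshold ** 2)
    )
    # Guard against the boundary case where the bound equals the threshold.
    while hoeffding_bound(value_range, delta, n) >= tie_threshold:
        n += 1
    return n


# Assumed setup: two classes, so information gain lies in [0, log2(2)] = [0, 1].
value_range = math.log2(2)
for n in (200, 400, 1400):
    print(n, round(hoeffding_bound(value_range, 0.001, n), 4))

# Number of instances after which a tie would be broken under these assumptions.
print(instances_until_tie_break(value_range, 0.001, 0.05))
```

For a fixed number of instances, a smaller confidence value yields a larger bound, which is why values closer to zero delay split decisions.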
- __init__(
- schema: Schema | None = None,
- random_seed: int = 0,
- grace_period: int = 200,
- split_criterion: str | SplitCriterion = 'InfoGainSplitCriterion',
- confidence: float = 0.001,
- tie_threshold: float = 0.05,
- leaf_prediction: int = 'NaiveBayesAdaptive',
- nb_threshold: int = 0,
- numeric_attribute_observer: str = 'GaussianNumericAttributeClassObserver',
- binary_split: bool = False,
- max_byte_size: float = 33554433,
- memory_estimate_period: int = 1000000,
- stop_mem_management: bool = True,
- remove_poor_attrs: bool = False,
- disable_prepruning: bool = True,
- )
- predict(instance)
Predict the label of an instance.
The base implementation calls predict_proba() and returns the label with the highest probability.
- Parameters:
instance – The instance to predict the label for.
- Returns:
The predicted label or None if the classifier is unable to make a prediction.
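The relationship between predict() and predict_proba() described above can be sketched in plain Python. This illustrates the documented behaviour only and is not capymoa's actual code; label_from_proba is a hypothetical helper, and resolving ties to the lowest index is a choice made for this sketch.

```python
from typing import Optional, Sequence


def label_from_proba(probabilities: Optional[Sequence[float]]) -> Optional[int]:
    """Return the index of the most probable label, or None if there is no estimate."""
    if probabilities is None or len(probabilities) == 0:
        return None
    # max() returns the first maximal element, so ties go to the lowest index.
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


print(label_from_proba([0.2, 0.7, 0.1]))
print(label_from_proba(None))
```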
- predict_proba(instance)
Return probability estimates for each label.
- Parameters:
instance – The instance to estimate the probabilities for.
- Returns:
An array of probabilities for each label or None if the classifier is unable to make a prediction.
- train(instance)
Train the classifier with a labeled instance.
- Parameters:
instance – The labeled instance to train the classifier with.
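A common way to combine predict() and train() on a stream is to predict each instance first and train on it afterwards. The sketch below shows that loop with a hypothetical stand-in learner (MajorityClassStandIn) so that it runs without capymoa. Representing an instance as a (features, label) tuple is an assumption made for the example, not capymoa's instance type.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional, Tuple

Instance = Tuple[tuple, Hashable]  # assumed (features, label) pair for this sketch


class MajorityClassStandIn:
    """Hypothetical stand-in exposing train() and predict(); not a capymoa class."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def train(self, instance: Instance) -> None:
        _, label = instance
        self._counts[label] += 1

    def predict(self, instance: Instance) -> Optional[Hashable]:
        if not self._counts:
            return None  # nothing learned yet, mirroring the documented None return
        return self._counts.most_common(1)[0][0]


def predict_then_train(learner, stream: Iterable[Instance]) -> float:
    """Predict each instance before training on it; return the accuracy."""
    correct = total = 0
    for instance in stream:
        prediction = learner.predict(instance)  # the label is not used yet
        correct += int(prediction == instance[1])
        total += 1
        learner.train(instance)  # now learn from the labeled instance
    return correct / total if total else 0.0


stream = [((0.1,), "a"), ((0.4,), "a"), ((0.9,), "b"), ((0.2,), "a")]
print(predict_then_train(MajorityClassStandIn(), stream))
```

The first prediction is None because the learner has seen no data, so it counts as an error here.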
- random_seed: int
The random seed for reproducibility.
When implementing a classifier, ensure that its random number generators are seeded.
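One way to follow this advice in a pure-Python learner is to keep a private generator seeded from random_seed. The class below is a hypothetical skeleton that shows only the seeding pattern; it does not subclass any capymoa base class.

```python
import random


class SeededStandIn:
    """Hypothetical learner skeleton; only the seeding pattern matters here."""

    def __init__(self, random_seed: int = 0) -> None:
        self.random_seed = random_seed
        # A private generator avoids depending on the global random state.
        self._rng = random.Random(random_seed)

    def draw(self) -> float:
        """Return the next pseudo-random number from the seeded generator."""
        return self._rng.random()


first = SeededStandIn(random_seed=42)
second = SeededStandIn(random_seed=42)
# Two learners built with the same seed produce the same sequence.
print([first.draw() for _ in range(3)] == [second.draw() for _ in range(3)])
```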