SuccessiveHalvingClassifier
- class capymoa.automl.SuccessiveHalvingClassifier
Bases: Classifier
Successive Halving Classifier for model selection in streaming scenarios.
This method applies the principle of Successive Halving [1] to incrementally allocate more computational budget to the most promising models while discarding poorly performing ones. It progressively narrows the pool of candidates as more data is observed, improving efficiency in streaming model selection.
>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import HoeffdingTree
>>> from capymoa.automl import SuccessiveHalvingClassifier
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = SuccessiveHalvingClassifier(
...     schema=schema,
...     base_classifiers=[HoeffdingTree],
...     eta=2.0,
...     evaluation_metric="accuracy"
... )
>>> instance = next(stream)
>>> learner.train(instance)
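The successive halving principle described above can be sketched, independently of CapyMOA, as a plain Python function. This is a simplified, batch-style illustration under stated assumptions, not the library's streaming implementation: `evaluate` is a hypothetical callback that trains and scores one candidate on a given number of instances, each round receives an equal share of `budget`, and the pool shrinks to `floor(n / eta)` models per round but never below `min_models`.

```python
import math
from typing import Callable, Hashable


def successive_halving(
    candidates: list[Hashable],
    evaluate: Callable[[Hashable, int], float],
    budget: int,
    eta: float = 2.0,
    min_models: int = 1,
) -> list[Hashable]:
    """Keep the best ~1/eta of the candidates each round until min_models remain.

    `evaluate(candidate, n_instances)` is a hypothetical callback assumed to
    train and score a candidate on n_instances instances (higher is better).
    """
    if eta <= 1 or min_models < 1:
        raise ValueError("need eta > 1 and min_models >= 1")
    # Pool size at the start of each round, e.g. 8 -> 4 -> 2 -> 1 for eta=2.
    sizes = [len(candidates)]
    while sizes[-1] > min_models:
        sizes.append(max(min_models, math.floor(sizes[-1] / eta)))
    n_rounds = max(1, len(sizes) - 1)

    survivors = list(candidates)
    for keep in sizes[1:]:
        # Each round gets an equal share of the budget, split across survivors,
        # so later rounds train fewer models on more instances each.
        per_model = max(1, budget // (n_rounds * len(survivors)))
        scores = {c: evaluate(c, per_model) for c in survivors}
        # Discard the weaker candidates; only the top `keep` advance.
        survivors = sorted(survivors, key=lambda c: scores[c], reverse=True)[:keep]
    return survivors


# Toy candidates with fixed, made-up quality scores.
true_quality = {"A": 0.60, "B": 0.75, "C": 0.90, "D": 0.55}
print(successive_halving(list(true_quality), lambda c, n: true_quality[c], budget=4000))  # ['C']
```

With four candidates and eta=2, the sketch runs two rounds: four models receive 500 instances each, then the two survivors receive 1000 each, which is the "more budget to fewer, more promising models" behaviour the description refers to.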
See also
- __init__(
    schema: Schema | None = None,
    random_seed: int = 1,
    base_classifiers: list[type[Classifier]] | None = None,
    config_file: str | None = None,
    max_instances: int | None = None,
    budget: int | None = None,
    eta: float = 2.0,
    min_models: int = 1,
    evaluation_metric: str = 'accuracy',
    verbose: bool = False,
  )
Construct a Successive Halving Classifier.
- Parameters:
schema – The schema of the stream.
random_seed – Random seed used for reproducibility.
base_classifiers – List of base classifier classes to consider.
config_file – Path to a JSON configuration file with model hyperparameters.
max_instances – Maximum number of instances to process per model. If provided, the total budget is computed as 2 * n_models * max_instances / eta, where n_models is the number of candidate models.
budget – Total training budget (number of instances across all models). Ignored if max_instances is provided.
eta – Reduction factor controlling how many models advance to the next round.
min_models – Minimum number of models to maintain in successive rounds.
evaluation_metric – Metric used to evaluate models. Defaults to "accuracy".
verbose – Whether to print progress information during training.
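As a quick check of the budget formula above, the helper below (a hypothetical name, not part of CapyMOA) evaluates 2 * n_models * max_instances / eta, assuming n_models is the number of candidate models. Whether the library rounds or truncates the result is not stated here, so the sketch returns the raw float.

```python
def total_budget(n_models: int, max_instances: int, eta: float = 2.0) -> float:
    """Total budget implied by max_instances, per the documented formula."""
    # 2 * n_models * max_instances / eta; any rounding done by the library
    # is not modelled here.
    return 2 * n_models * max_instances / eta


# Four candidate models, each allowed up to 1,000 instances:
print(total_budget(4, 1000))           # 4000.0
print(total_budget(4, 1000, eta=4.0))  # 2000.0
```

Under this formula, a larger eta yields a smaller total budget for the same pool of models.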
- get_model_info() → Dict[str, Any]
Get information about the current state of the classifier.
- Returns:
A dictionary containing information about the classifier's current state.
- predict(instance)
Predict the label of an instance.
The base implementation calls predict_proba() and returns the label with the highest probability.
- Parameters:
instance – The instance to predict the label for.
- Returns:
The predicted label, or None if the classifier is unable to make a prediction.
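The documented default behaviour of predict(), taking the label with the highest probability from predict_proba() and returning None when no estimate is available, amounts to an argmax. A minimal pure-Python sketch, assuming the probabilities arrive as a sequence indexed by label position; predict_from_proba is a hypothetical helper, and it returns the label's index rather than a label value.

```python
from typing import Optional, Sequence


def predict_from_proba(proba: Optional[Sequence[float]]) -> Optional[int]:
    """Return the index of the most probable label, or None if no estimate."""
    # Mirror the documented contract: no probabilities means no prediction.
    if proba is None:
        return None
    # max() keeps the first maximal element, so ties go to the lower index.
    return max(range(len(proba)), key=lambda i: proba[i])


print(predict_from_proba([0.2, 0.7, 0.1]))  # 1
print(predict_from_proba(None))             # None
```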
- predict_proba(instance)
Return probability estimates for each label.
- Parameters:
instance – The instance to estimate the probabilities for.
- Returns:
An array of probabilities for each label, or None if the classifier is unable to make a prediction.
- train(instance)
Train the classifier with a labeled instance.
- Parameters:
instance – The labeled instance to train the classifier with.
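predict() and train() are usually combined in a test-then-train (prequential) loop: each instance is first used for evaluation and only then for training. The sketch below illustrates that loop with a hypothetical MajorityClass stand-in so it runs without CapyMOA; instances are assumed to be (x, y) tuples. To evaluate a SuccessiveHalvingClassifier instead, the label comparison would need to read the true label from CapyMOA's instance objects.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


class MajorityClass:
    """Hypothetical stand-in learner exposing predict(instance)/train(instance)."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def predict(self, instance) -> Optional[int]:
        # No prediction is possible before any labeled instance has been seen.
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

    def train(self, instance) -> None:
        _, y = instance
        self.counts[y] += 1


def prequential_accuracy(learner, stream: Iterable[Tuple[object, int]]) -> float:
    """Test-then-train: predict each instance first, then train on it."""
    correct = total = 0
    for instance in stream:
        y_hat = learner.predict(instance)    # test before the learner sees the label
        correct += int(y_hat == instance[1])  # a None prediction counts as an error
        total += 1
        learner.train(instance)               # then train on the same instance
    return correct / total if total else 0.0


toy_stream = [((0,), 1), ((0,), 1), ((0,), 0), ((0,), 1)]
print(prequential_accuracy(MajorityClass(), toy_stream))  # 0.5
```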
- property best_model
Return the current best model.
- random_seed: int
The random seed for reproducibility.
When implementing a classifier, ensure that all random number generators are seeded.
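One way to follow this guidance in a custom classifier is to give each instance its own seeded generator rather than relying on global random state. The class below is hypothetical and uses only Python's standard random module; it is a sketch of the idea, not how CapyMOA classifiers are implemented internally.

```python
import random


class SeededLearner:
    """Hypothetical learner showing where random_seed should be used."""

    def __init__(self, random_seed: int = 1) -> None:
        self.random_seed = random_seed
        # A private, seeded generator: runs with the same seed are reproducible
        # and neither depend on nor disturb the global `random` state.
        self._rng = random.Random(random_seed)

    def sample_subset(self, items: list, k: int) -> list:
        return self._rng.sample(items, k)


a = SeededLearner(random_seed=42)
b = SeededLearner(random_seed=42)
print(a.sample_subset(list(range(10)), 3) == b.sample_subset(list(range(10)), 3))  # True
```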