SGDClassifier#

class capymoa.classifier.SGDClassifier[source]#

Bases: SKClassifier

Streaming stochastic gradient descent classifier.

This wraps scikit-learn's SGDClassifier for ease of use in the streaming context, updating the model incrementally, one instance at a time. Some options are omitted because they are not relevant in the streaming context, and the learning rate schedule is fixed at construction time via learning_rate and eta0.

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import SGDClassifier
>>> from capymoa.evaluation import prequential_evaluation
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = SGDClassifier(schema)
>>> results = prequential_evaluation(stream, learner, max_instances=1000)
>>> results["cumulative"].accuracy()
84.2
__init__(
schema: Schema,
loss: Literal['hinge', 'log_loss', 'modified_huber', 'squared_hinge', 'perceptron', 'squared_error', 'huber', 'epsilon_insensitive', 'squared_epsilon_insensitive'] = 'hinge',
penalty: Literal['l2', 'l1', 'elasticnet'] = 'l2',
alpha: float = 0.0001,
l1_ratio: float = 0.15,
fit_intercept: bool = True,
epsilon: float = 0.1,
n_jobs: int | None = None,
learning_rate: Literal['constant', 'optimal', 'invscaling'] = 'optimal',
eta0: float = 0.0,
random_seed: int | None = None,
)[source]#

Construct stochastic gradient descent classifier.

Parameters:
  • schema – Describes the datastream’s structure.

  • loss – The loss function to be used.

  • penalty – The penalty (aka regularization term) to be used.

  • alpha – Constant that multiplies the regularization term.

  • l1_ratio – The Elastic Net mixing parameter, with 0 <= l1_ratio <= 1: l1_ratio=0 corresponds to the L2 penalty, l1_ratio=1 to the L1 penalty. Only used if penalty is ‘elasticnet’.

  • fit_intercept – Whether the intercept (bias) should be estimated or not. If False, the data is assumed to be already centered.

  • epsilon – Epsilon in the epsilon-insensitive loss functions; only used if loss is ‘huber’, ‘epsilon_insensitive’, or ‘squared_epsilon_insensitive’. For ‘huber’, it determines the threshold at which it becomes less important to get the prediction exactly right. For the epsilon-insensitive losses, any difference between the current prediction and the correct label is ignored if it is smaller than this threshold.

  • n_jobs – The number of CPUs to use for the OVA (One Versus All, for multi-class problems) computation. None means 1 unless in a joblib.parallel_backend context.

  • learning_rate – The learning rate schedule: ‘constant’ keeps the step size fixed at eta0, ‘optimal’ uses a heuristic chosen by scikit-learn, and ‘invscaling’ starts at eta0 and decays over time.

  • eta0 – The initial learning rate for the ‘constant’ and ‘invscaling’ schedules. The default value is 0.0 because eta0 is not used by the default schedule, ‘optimal’.

  • class_weight – Weights associated with classes. If not given, all classes are assumed to have weight one. The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data, as n_samples / (n_classes * np.bincount(y)).

  • random_seed – Seed for reproducibility.
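
For example, a non-default configuration can be chosen at construction time. The following sketch uses only the parameters documented above, selecting the ‘log_loss’ objective and a fixed step size via the ‘constant’ schedule:

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import SGDClassifier
>>> stream = ElectricityTiny()
>>> learner = SGDClassifier(
...     stream.get_schema(),
...     loss="log_loss",
...     learning_rate="constant",
...     eta0=0.01,
... )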

predict(instance: Instance)[source]#

Predict the label of an instance.

The base implementation calls predict_proba() and returns the label with the highest probability.

Parameters:
  • instance – The instance to predict the label for.

Returns:
  The predicted label or None if the classifier is unable to make a prediction.
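
As a sketch of that behavior, assuming the documented contract that an untrained classifier returns None (stream API as in the example above):

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import SGDClassifier
>>> stream = ElectricityTiny()
>>> learner = SGDClassifier(stream.get_schema())
>>> instance = stream.next_instance()
>>> prediction = learner.predict(instance)  # None: nothing learned yet
>>> learner.train(instance)
>>> prediction = learner.predict(stream.next_instance())  # now a label index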

predict_proba(instance: Instance)[source]#

Return probability estimates for each label.

Parameters:
  • instance – The instance to estimate the probabilities for.

Returns:
  An array of probabilities for each label or None if the classifier is unable to make a prediction.
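
Note that probability estimates depend on the loss: with the default ‘hinge’ loss, the underlying scikit-learn estimator does not define predict_proba, so this sketch assumes a probabilistic loss such as ‘log_loss’:

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import SGDClassifier
>>> stream = ElectricityTiny()
>>> learner = SGDClassifier(stream.get_schema(), loss="log_loss")
>>> learner.train(stream.next_instance())
>>> proba = learner.predict_proba(stream.next_instance())  # one probability per class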

train(instance: LabeledInstance)[source]#

Train the classifier with a labeled instance.

Parameters:
  • instance – The labeled instance to train the classifier with.
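
Together, predict() and train() form the test-then-train loop that prequential_evaluation automates. A minimal hand-rolled version might look like the following sketch, which assumes the stream API used in the example above and that labeled instances expose their class index as y_index:

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import SGDClassifier
>>> stream = ElectricityTiny()
>>> learner = SGDClassifier(stream.get_schema())
>>> correct, seen = 0, 0
>>> while stream.has_more_instances() and seen < 1000:
...     instance = stream.next_instance()
...     if learner.predict(instance) == instance.y_index:  # test first ...
...         correct += 1
...     seen += 1
...     learner.train(instance)  # ... then train
>>> accuracy = 100.0 * correct / seen  # comparable to the cumulative accuracy above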

random_seed: int#

The random seed for reproducibility.

When implementing a classifier, ensure random number generators are seeded.

schema: Schema#

The schema representing the instances.

sklearner: SGDClassifier#

The underlying scikit-learn estimator (sklearn.linear_model.SGDClassifier).
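
Because sklearner is a regular scikit-learn estimator, its standard fitted attributes remain accessible once the model has been trained at least once; for example, the linear model’s weights:

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import SGDClassifier
>>> stream = ElectricityTiny()
>>> learner = SGDClassifier(stream.get_schema())
>>> learner.train(stream.next_instance())
>>> weights = learner.sklearner.coef_  # per-class weight vectors
>>> bias = learner.sklearner.intercept_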