SGDClassifier#

class capymoa.classifier.SGDClassifier[source]#

Bases: SKClassifier

Streaming stochastic gradient descent classifier.

This wraps sklearn.linear_model.SGDClassifier for ease of use in the streaming context. Some options are missing because they are not relevant in the streaming context, and the learning-rate schedules are limited to ‘constant’, ‘optimal’, and ‘invscaling’.

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import SGDClassifier
>>> from capymoa.evaluation import prequential_evaluation
>>> stream = ElectricityTiny()
>>> schema = stream.get_schema()
>>> learner = SGDClassifier(schema)
>>> results = prequential_evaluation(stream, learner, max_instances=1000)
>>> results["cumulative"].accuracy()
84.2
sklearner: SGDClassifier#

The underlying scikit-learn object
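
Because the wrapped estimator is exposed, you can reach into it to inspect its learned state after training. A minimal sketch (next_instance() is CapyMOA's standard stream accessor, and coef_ follows the scikit-learn convention):

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import SGDClassifier
>>> stream = ElectricityTiny()
>>> learner = SGDClassifier(stream.get_schema())
>>> learner.train(stream.next_instance())
>>> type(learner.sklearner).__name__
'SGDClassifier'
>>> learner.sklearner.coef_ is not None
True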

__init__(
schema: Schema,
loss: Literal['hinge', 'log_loss', 'modified_huber', 'squared_hinge', 'perceptron', 'squared_error', 'huber', 'epsilon_insensitive', 'squared_epsilon_insensitive'] = 'hinge',
penalty: Literal['l2', 'l1', 'elasticnet'] = 'l2',
alpha: float = 0.0001,
l1_ratio: float = 0.15,
fit_intercept: bool = True,
epsilon: float = 0.1,
n_jobs: int | None = None,
learning_rate: Literal['constant', 'optimal', 'invscaling'] = 'optimal',
eta0: float = 0.0,
class_weight: dict | str | None = None,
random_seed: int | None = None,
)[source]#

Construct stochastic gradient descent classifier.

Parameters:
  • schema – Describes the datastream’s structure.

  • loss – The loss function to be used.

  • penalty – The penalty (aka regularization term) to be used.

  • alpha – Constant that multiplies the regularization term.

  • l1_ratio – The Elastic Net mixing parameter, with 0 <= l1_ratio <= 1: l1_ratio=0 corresponds to the L2 penalty, l1_ratio=1 to L1. Only used if penalty is ‘elasticnet’.

  • fit_intercept – Whether the intercept (bias) should be estimated or not. If False, the data is assumed to be already centered.

  • epsilon – Epsilon in the epsilon-insensitive loss functions; only used if loss is ‘huber’, ‘epsilon_insensitive’, or ‘squared_epsilon_insensitive’. For ‘huber’, it determines the threshold at which it becomes less important to get the prediction exactly right. For the epsilon-insensitive losses, any difference between the current prediction and the correct label is ignored if it is smaller than this threshold.

  • n_jobs – The number of CPUs to use to do the OVA (One Versus All, for multi-class problems) computation. Defaults to 1.

  • learning_rate – The learning-rate schedule: ‘constant’ uses eta0 throughout, ‘optimal’ uses a heuristic based on alpha, and ‘invscaling’ gradually decays from eta0.

  • eta0 – The initial learning rate for the ‘constant’ and ‘invscaling’ schedules. The default value is 0.0 because eta0 is not used by the default schedule, ‘optimal’ (see the sketch after this list).

  • class_weight – Weights associated with classes. If not given, all classes are assumed to have weight one. The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data, as n_samples / (n_classes * np.bincount(y)).

  • random_seed – Seed for reproducibility.
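
For example, to optimize logistic loss with a fixed step size, combine the ‘constant’ schedule with a nonzero eta0. A minimal sketch, reusing the schema from the example above:

>>> learner = SGDClassifier(schema, loss='log_loss', learning_rate='constant', eta0=0.01)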

predict(instance: Instance)[source]#

Predict the class-label index for an instance.

predict_proba(instance: Instance)[source]#

Estimate the probability of each class for an instance.

train(instance: LabeledInstance)[source]#

Train the classifier on a single labeled instance.
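
Together these methods support the test-then-train pattern that prequential_evaluation automates. A minimal sketch of the manual loop (assuming predict returns the class index, matched against instance.y_index):

>>> from capymoa.datasets import ElectricityTiny
>>> from capymoa.classifier import SGDClassifier
>>> stream = ElectricityTiny()
>>> learner = SGDClassifier(stream.get_schema())
>>> correct = 0
>>> for _ in range(100):
...     instance = stream.next_instance()
...     if learner.predict(instance) == instance.y_index:  # test first
...         correct += 1
...     learner.train(instance)  # then train on the labeled instance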