2. Using sklearn with CapyMOA
In this tutorial, we demonstrate how to use scikit-learn learners directly in CapyMOA. The primary requirement for a scikit-learn learner to be usable is that it implements partial_fit(), so that it can be updated incrementally as instances arrive.
More information about CapyMOA can be found at https://www.capymoa.org
Last updated on 03/05/2024
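A quick way to see whether a given scikit-learn estimator meets this requirement is to check for a callable partial_fit attribute. The sketch below uses only scikit-learn. supports_incremental_learning is a hypothetical helper name, and the estimator list is just an illustrative sample.

```python
# Check which scikit-learn estimators expose partial_fit(), the method
# this tutorial names as the main requirement for use in CapyMOA.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Perceptron, SGDClassifier
from sklearn.naive_bayes import GaussianNB


def supports_incremental_learning(estimator) -> bool:
    """Hypothetical helper: True if the estimator can be updated incrementally."""
    return callable(getattr(estimator, "partial_fit", None))


candidates = {
    "SGDClassifier": SGDClassifier(),
    "Perceptron": Perceptron(),
    "GaussianNB": GaussianNB(),
    "RandomForestClassifier": RandomForestClassifier(),  # batch-only learner
}

support = {name: supports_incremental_learning(est) for name, est in candidates.items()}
for name, ok in support.items():
    print(f"{name}: {'has' if ok else 'lacks'} partial_fit")
```

Using getattr() on the instance, rather than reading the class definition, also handles estimators whose partial_fit() is only available in some configurations. For example, in recent scikit-learn versions MLPClassifier only offers it with the stochastic solvers.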
1. Using raw sklearn objects
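This example shows how a scikit-learn model can be used with our Instance representation in a simple test-then-train loop. In this case, we need to adapt the data to what sklearn expects. Each feature vector is wrapped in a list so that predict() and partial_fit() receive a 2D input. The full set of class labels is passed to partial_fit(), and predict() is only called after the model has been fit at least once.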
from capymoa.evaluation import ClassificationEvaluator
from capymoa.datasets import ElectricityTiny
from sklearn import linear_model
# Toy dataset with only 1000 instances
elec_stream = ElectricityTiny()
# Creates a sklearn classifier
sklearn_SGD = linear_model.SGDClassifier()
ob_evaluator = ClassificationEvaluator(schema=elec_stream.get_schema())
# Counter for partial fits
partial_fit_count = 0
while elec_stream.has_more_instances():
    instance = elec_stream.next_instance()
    prediction = -1
    if partial_fit_count > 0:  # scikit-learn does not allow calling predict() on a model that has not been fit yet
        # sklearn expects a 2D input, so the feature vector is wrapped in a list
        prediction = sklearn_SGD.predict([instance.x])[0]
    ob_evaluator.update(instance.y_index, prediction)
    # All class labels are passed to partial_fit(), as sklearn requires them on the first call
    sklearn_SGD.partial_fit([instance.x], [instance.y_index], classes=elec_stream.get_schema().get_label_indexes())
    partial_fit_count += 1
ob_evaluator.accuracy()
Output: 84.7
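The following is a scikit-learn-only sketch of the same loop, for readers who want to see these adaptations without CapyMOA's data loading. It assumes a synthetic two-feature, two-class stream in place of ElectricityTiny, so its accuracy is not comparable to the 84.7 above. It highlights the two adaptations: reshaping each instance to a 2D array of shape (1, n_features), and passing the full set of class labels to partial_fit().

```python
# Test-then-train loop with a raw scikit-learn model on a synthetic stream
# (assumption: 2 numeric features, 2 classes, linearly separable labels).
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(42)
n_instances, n_features = 1000, 2
X = rng.normal(size=(n_instances, n_features))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
classes = np.array([0, 1])  # partial_fit() needs every class label up front

model = SGDClassifier(random_state=0)
correct = 0
predictions = 0
for i in range(n_instances):
    x_row = X[i].reshape(1, -1)  # sklearn expects shape (n_samples, n_features)
    if i > 0:  # predict() raises an error before the first partial_fit()
        correct += int(model.predict(x_row)[0] == y[i])
        predictions += 1
    model.partial_fit(x_row, [y[i]], classes=classes)  # then train on it

accuracy = 100.0 * correct / predictions
print(f"accuracy: {accuracy:.1f}% over {predictions} predictions")
```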
2. Using a generic SKClassifier wrapper
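Instead of using sklearn's SGDClassifier directly, here we use the CapyMOA wrapper SKClassifier in a test-then-train loop. The wrapper accepts CapyMOA instances in predict() and train(), so no manual data conversion is needed. There is also an SKRegressor wrapper available in CapyMOA for regression.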
from sklearn import linear_model
from capymoa.base import SKClassifier
from capymoa.datasets import ElectricityTiny
from capymoa.evaluation import ClassificationEvaluator
# Opening a file as a stream
elec_stream = ElectricityTiny()
# Creating a learner
sklearn_SGD = SKClassifier(schema=elec_stream.get_schema(), sklearner=linear_model.SGDClassifier())
# Creating the evaluator
sklearn_SGD_evaluator = ClassificationEvaluator(schema=elec_stream.get_schema())
# Test-then-train: predict on each instance first, then train on it
while elec_stream.has_more_instances():
    instance = elec_stream.next_instance()
    prediction = sklearn_SGD.predict(instance)
    sklearn_SGD_evaluator.update(instance.y_index, prediction)
    sklearn_SGD.train(instance)
sklearn_SGD_evaluator.accuracy()
Output: 84.7
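To illustrate what such a wrapper does, here is a hypothetical, simplified sketch. It is not CapyMOA's actual SKClassifier implementation. MiniSKWrapper and Instance are illustrative names. Returning None before the first training step is an assumption about how a wrapper might handle an untrained model.

```python
# Hypothetical sketch of a wrapper in the spirit of SKClassifier: it turns a
# per-instance predict()/train() API into scikit-learn partial_fit() calls.
# MiniSKWrapper and Instance are illustrative names, not CapyMOA classes.
from typing import NamedTuple, Optional, Sequence

import numpy as np
from sklearn.linear_model import SGDClassifier


class Instance(NamedTuple):
    x: np.ndarray  # feature vector
    y_index: int   # class label as an index


class MiniSKWrapper:
    def __init__(self, sklearner, label_indexes: Sequence[int]):
        self.sklearner = sklearner
        self.classes = np.asarray(label_indexes)
        self._trained = False

    def train(self, instance: Instance) -> None:
        # Reshape to (1, n_features) and pass all labels, as sklearn requires
        self.sklearner.partial_fit(
            instance.x.reshape(1, -1), [instance.y_index], classes=self.classes
        )
        self._trained = True

    def predict(self, instance: Instance) -> Optional[int]:
        if not self._trained:  # no prediction before the first update
            return None
        return int(self.sklearner.predict(instance.x.reshape(1, -1))[0])


rng = np.random.default_rng(0)
learner = MiniSKWrapper(SGDClassifier(random_state=0), label_indexes=[0, 1])
outcomes = []
for _ in range(500):
    x = rng.normal(size=3)
    inst = Instance(x=x, y_index=int(x.sum() > 0))
    pred = learner.predict(inst)  # test first...
    if pred is not None:
        outcomes.append(pred == inst.y_index)
    learner.train(inst)           # ...then train

print(f"accuracy: {100.0 * sum(outcomes) / len(outcomes):.1f}%")
```

Keeping the conversion inside the wrapper is what allows the evaluation loop to stay identical for any learner that offers predict() and train().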
3. Using prequential evaluation and SKClassifier
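Instead of writing the instance loop ourselves, we may use the prequential_evaluation() function. It runs the same test-then-train procedure and returns both cumulative and windowed results.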
from sklearn import linear_model
from capymoa.base import SKClassifier
from capymoa.datasets import ElectricityTiny
from capymoa.evaluation import prequential_evaluation
elec_stream = ElectricityTiny()
sklearn_SGD = SKClassifier(schema=elec_stream.get_schema(), sklearner=linear_model.SGDClassifier())
results_sklearn_SGD = prequential_evaluation(stream=elec_stream, learner=sklearn_SGD, window_size=4500)
results_sklearn_SGD.cumulative.accuracy()
Output: 84.7
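To clarify what cumulative and windowed accuracy mean, here is a scikit-learn-only sketch of a prequential (test-then-train) evaluation. prequential_accuracy is a hypothetical helper, not CapyMOA's prequential_evaluation(), and it makes its own choices. The very first instance, seen before any training, counts as a miss, and a final incomplete window is reported on its own. When window_size exceeds the number of instances, as with window_size=4500 above, this sketch yields a single partial window whose value equals the cumulative accuracy.

```python
# Sketch of a prequential evaluation: each instance is first used to test
# the model, then to train it. prequential_accuracy is a hypothetical helper.
import numpy as np
from sklearn.linear_model import SGDClassifier


def prequential_accuracy(model, X, y, classes, window_size):
    """Return (cumulative accuracy, list of windowed accuracies), in percent."""
    correct_total = 0
    window_correct, window_seen = 0, 0
    windowed = []
    trained = False
    for i in range(len(y)):
        x_row = X[i].reshape(1, -1)
        # Test: an untrained model makes no prediction, counted here as a miss
        hit = trained and bool(model.predict(x_row)[0] == y[i])
        correct_total += int(hit)
        window_correct += int(hit)
        window_seen += 1
        # Train on the same instance afterwards
        model.partial_fit(x_row, [y[i]], classes=classes)
        trained = True
        if window_seen == window_size:  # close a full window
            windowed.append(100.0 * window_correct / window_seen)
            window_correct, window_seen = 0, 0
    if window_seen:  # report a final, incomplete window on its own
        windowed.append(100.0 * window_correct / window_seen)
    return 100.0 * correct_total / len(y), windowed


rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] > 0).astype(int)
cumulative, windowed = prequential_accuracy(
    SGDClassifier(random_state=0), X, y, classes=np.array([0, 1]), window_size=300
)
print(f"cumulative: {cumulative:.1f}%")
print("windowed:", [round(w, 1) for w in windowed])
```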
4. Further abstractions
We can wrap popular algorithms to make them even easier to use.
So far, one can use the following wrappers:
PassiveAggressiveClassifier
SGDClassifier
PassiveAggressiveRegressor
SGDRegressor
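Assuming these wrappers build on the scikit-learn estimators of the same names, the two regressors follow the same incremental pattern as the classifiers, except that partial_fit() takes no classes argument. The following scikit-learn-only sketch is not CapyMOA's wrapper. It runs a test-then-train loop with SGDRegressor on a synthetic linear stream. The data-generating function, the constant learning rate, and the mean absolute error (MAE) metric are assumptions chosen for illustration. PassiveAggressiveRegressor also provides partial_fit() and could be swapped in the same way.

```python
# Test-then-train regression with scikit-learn's SGDRegressor on a synthetic
# stream (assumption: y = 2*x0 - x1 + small Gaussian noise).
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(7)
n_instances = 2000
X = rng.uniform(-1, 1, size=(n_instances, 2))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=n_instances)

# A constant learning rate is an illustrative choice to speed up convergence
model = SGDRegressor(learning_rate="constant", eta0=0.05, random_state=0)
abs_errors = []
for i in range(n_instances):
    x_row = X[i].reshape(1, -1)
    if i > 0:  # evaluate first (the model is unfitted before the first update)
        abs_errors.append(abs(model.predict(x_row)[0] - y[i]))
    model.partial_fit(x_row, [y[i]])  # regressors need no classes argument

mae_first = float(np.mean(abs_errors[:200]))
mae_last = float(np.mean(abs_errors[-200:]))
print(f"MAE first 200: {mae_first:.3f}, last 200: {mae_last:.3f}")
```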
The following example shows how to use SGDClassifier and PassiveAggressiveClassifier.
Observation: this code takes up to 3 minutes if ``max_instances`` is not set, as it processes all 100k instances from ``RTG_2abrupt`` with both SGD and PA. We set the ``max_instances`` parameter so that fewer instances are processed and the example runs faster.
%%time
from capymoa.classifier import SGDClassifier, PassiveAggressiveClassifier
from capymoa.evaluation import prequential_evaluation_multiple_learners
from capymoa.evaluation.visualization import plot_windowed_results
from capymoa.datasets import RTG_2abrupt
RTG_2abrupt_stream = RTG_2abrupt()
sklearn_SGD = SGDClassifier(schema=RTG_2abrupt_stream.get_schema())
sklearn_PA = PassiveAggressiveClassifier(schema=RTG_2abrupt_stream.get_schema())
results = prequential_evaluation_multiple_learners(stream=RTG_2abrupt_stream,
                                                   learners={'SGD': sklearn_SGD, 'PA': sklearn_PA},
                                                   max_instances=10000,
                                                   window_size=1000)
plot_windowed_results(results['SGD'], results['PA'], metric='accuracy')
CPU times: user 16.8 s, sys: 166 ms, total: 17 s
Wall time: 16.8 s
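Here is a scikit-learn-only sketch of this kind of comparison that needs neither CapyMOA nor the RTG_2abrupt dataset. It tracks windowed accuracy for SGDClassifier and PassiveAggressiveClassifier on a synthetic stream with one abrupt drift. The stream (the labelling rule flips halfway through), the window size, and the seeds are assumptions, so the numbers will differ from the CapyMOA results.

```python
# Windowed accuracy of SGD and PA on a synthetic stream with one abrupt drift.
# The stream is an illustrative stand-in for RTG_2abrupt, not the same data.
import numpy as np
from sklearn.linear_model import PassiveAggressiveClassifier, SGDClassifier

rng = np.random.default_rng(3)
n_instances, window_size = 4000, 500
X = rng.normal(size=(n_instances, 2))
y = (X[:, 0] > 0).astype(int)
# Abrupt drift (assumption): the labelling rule flips halfway through the stream
y[n_instances // 2:] = 1 - y[n_instances // 2:]
classes = np.array([0, 1])

learners = {
    "SGD": SGDClassifier(random_state=0),
    "PA": PassiveAggressiveClassifier(random_state=0),
}
windowed = {name: [] for name in learners}
hits = {name: 0 for name in learners}
window_predictions = 0

for i in range(n_instances):
    x_row = X[i].reshape(1, -1)
    if i > 0:  # test: only after every model has been trained at least once
        window_predictions += 1
        for name, model in learners.items():
            hits[name] += int(model.predict(x_row)[0] == y[i])
    for model in learners.values():  # then train
        model.partial_fit(x_row, [y[i]], classes=classes)
    if (i + 1) % window_size == 0:  # close the current window
        for name in learners:
            windowed[name].append(100.0 * hits[name] / window_predictions)
            hits[name] = 0
        window_predictions = 0

for name, accs in windowed.items():
    print(name, [round(a, 1) for a in accs])
```

Both learners are updated on every instance. One would expect accuracy to dip in the window where the drift occurs and then recover as the models adapt to the new concept.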