Parallel Ensembles#

This notebook is aimed at showing how to use the parallel and mini-batch variants of the ensembles.

  • Examples of all the parallel ensembles available.

  • Examples on how to use the parallel versions.

  • Example of the trade-off between runtime and accuracy when using the parallel ensembles.


More information about CapyMOA can be found in https://www.capymoa.org

last update on 30/05/2024

1. Using the most basic bagging ensemble#

  • CapyMOA chooses between the standard and the mini-batch version behind the curtains based on the parameter configuration used

  • For more information about the parallel ensembles please refer to the following references:

G. Cassales, H. M. Gomes, A. Bifet, B. Pfahringer and H. Senger, Improving the performance of bagging ensembles for data streams through mini-batching, Information Sciences,Volume 580, 2021, Pages 260-282, ISSN 0020-0255, https://doi.org/10.1016/j.ins.2021.08.085.

G. Cassales, H. M. Gomes, A. Bifet, B. Pfahringer and H. Senger, "Balancing Performance and Energy Consumption of Bagging Ensembles for the Classification of Data Streams in Edge Computing," in IEEE Transactions on Network and Service Management, vol. 20, no. 3, pp. 3038-3054, Sept. 2023, doi: 10.1109/TNSM.2022.3226505.

[1]:
from capymoa.datasets import Covtype
from capymoa.classifier import OnlineBagging
from capymoa.evaluation import prequential_evaluation
from capymoa.evaluation.visualization import plot_windowed_results

cover = Covtype()

ozabag_sequential = OnlineBagging(schema=cover.schema, ensemble_size=10)
ozabag_mb_parallel = OnlineBagging(schema=cover.schema, ensemble_size=10, minibatch_size=25, number_of_jobs=5)

1.1 Understanding the parameters#

  • Since we used different parameters in the constructor call, each instace references a different object

    • sequential: note the CLI help only mention baseLearner and ensembleSize

    • parallel: note the extra parameters numCores and batchSize

  • Setting the parallel parameters on the constructor call will make it a minibatch parallel ensemble.

[2]:
print(ozabag_sequential.CLI_help())
-l baseLearner (default: trees.HoeffdingTree)
Classifier to train.
-s ensembleSize (default: 10)
The number of models in the bag.

[3]:
print(ozabag_mb_parallel.CLI_help())
-l baseLearner (default: trees.HoeffdingTree)
Classifier to train.
-s ensembleSize (default: 10)
The number of models in the bag.
-c numCores (default: 1)
The amount of CPU Cores used for multi-threading
-b batchSize (default: 1)
The amount of instances the classifier should buffer before training.

1.2 Example comparing sequential and parallel#

[4]:
result_seq = prequential_evaluation(stream=cover, learner=ozabag_sequential, window_size=5000, max_instances=100000)
result_par = prequential_evaluation(stream=cover, learner=ozabag_mb_parallel, window_size=5000, max_instances=100000)

result_seq['learner'] = 'OnlineBagging(Seq)'
result_par['learner'] = 'OnlineBagging(Par)'
  • Note that the minibatch approach creates a divergence in the results for two reasons:

    1. it uses local random generators instead of a global one;

    2. the mini-batch defers the model update by a few instances which causes differences in the predictions;

  • On the bright side, it decreases runtime

[5]:
# decoupling the running and plotting to allow more flexibility
print(f"Cumulative accuracy = {result_seq['cumulative'].accuracy()}")
print(f"wallclock = {result_seq['wallclock']} seconds\n")
print(f"Cumulative accuracy = {result_par['cumulative'].accuracy()}")
print(f"wallclock = {result_par['wallclock']} seconds\n")
plot_windowed_results(result_seq, result_par, ylabel="accuracy")
Cumulative accuracy = 84.59299999999999
wallclock = 9.430704116821289 seconds

Cumulative accuracy = 83.722
wallclock = 5.712066888809204 seconds

../_images/notebooks_parallel_ensembles_9_1.png

1.3 Increasing the ensemble size#

  • Ensembles with more learners will get a bigger decrease in processing time.

  • Let’s see the difference when using 100 classifiers

[6]:
ozabag_sequential = OnlineBagging(schema=cover.schema, ensemble_size=100)
ozabag_mb_parallel = OnlineBagging(schema=cover.schema, ensemble_size=100, minibatch_size=25, number_of_jobs=5)
result_seq100 = prequential_evaluation(stream=cover, learner=ozabag_sequential, window_size=5000, max_instances=100000)
result_par100 = prequential_evaluation(stream=cover, learner=ozabag_mb_parallel, window_size=5000, max_instances=100000)

result_seq100['learner'] = 'OnlineBagging100(Seq)'
result_par100['learner'] = 'OnlineBagging100(Par)'
[7]:
# decoupling the running and plotting to allow more flexibility
print(f"Cumulative accuracy = {result_seq100['cumulative'].accuracy()}")
print(f"wallclock = {result_seq100['wallclock']} seconds\n")
print(f"Cumulative accuracy = {result_par100['cumulative'].accuracy()}")
print(f"wallclock = {result_par100['wallclock']} seconds\n")
plot_windowed_results(result_seq100, result_par100, ylabel="accuracy")
Cumulative accuracy = 84.99600000000001
wallclock = 89.48850011825562 seconds

Cumulative accuracy = 84.20299999999999
wallclock = 46.829185009002686 seconds

../_images/notebooks_parallel_ensembles_12_1.png

2. Other variations#

  • Sequential execution with the minibatch

  • Parallel execution of the incremental approach

[8]:
mbSeq = OnlineBagging(schema=cover.get_schema(), ensemble_size=10, minibatch_size=25)
incPar = OnlineBagging(schema=cover.get_schema(), ensemble_size=10, minibatch_size=1, number_of_jobs=5)
result_mbSeq = prequential_evaluation(stream=cover, learner=mbSeq, window_size=5000, max_instances=100000)
result_incPar = prequential_evaluation(stream=cover, learner=incPar, window_size=5000, max_instances=100000)

result_mbSeq['learner'] = 'OnlineBagging(mb_seq)'
result_incPar['learner'] = 'OnlineBagging(inc_par)'
  • Incremental Sequential differs from Incremental Parallel because of the random sequences;

  • When compared among themselves, mini-batch parallel and mini-batch sequential (single-core) have the same accuracy, as their random sequences are initialized in the same way;

  • Incremental Parallel has the same random sequences as the Mini-batch versions, its improvement comes from making all the predictions with the most up-to-date model;

[9]:
# decoupling the running and plotting to allow more flexibility
print("Incremental Sequential ")
print(f"Cumulative accuracy = {result_seq['cumulative'].accuracy()}")
print(f"wallclock = {result_seq['wallclock']} seconds\n")
print("Mini-batch Parallel")
print(f"Cumulative accuracy = {result_par['cumulative'].accuracy()}")
print(f"wallclock = {result_par['wallclock']} seconds\n")
print("Mini-batch Sequential")
print(f"Cumulative accuracy = {result_mbSeq['cumulative'].accuracy()}")
print(f"wallclock = {result_mbSeq['wallclock']} seconds\n")
print("Incremental Parallel ")
print(f"Cumulative accuracy = {result_incPar['cumulative'].accuracy()}")
print(f"wallclock = {result_incPar['wallclock']} seconds\n")
plot_windowed_results(result_seq, result_par, result_mbSeq, result_incPar, ylabel="accuracy")
Incremental Sequential
Cumulative accuracy = 84.59299999999999
wallclock = 9.430704116821289 seconds

Mini-batch Parallel
Cumulative accuracy = 83.722
wallclock = 5.712066888809204 seconds

Mini-batch Sequential
Cumulative accuracy = 83.722
wallclock = 8.536521911621094 seconds

Incremental Parallel
Cumulative accuracy = 84.391
wallclock = 9.514532089233398 seconds

../_images/notebooks_parallel_ensembles_16_1.png

3. More ensembles#

There are more mini-batch ensembles with parallelism implemented

  • AdaptiveRandomForest

  • LeveragingBagging

  • OzaBagAdwin

[10]:
from capymoa.classifier import OnlineAdwinBagging, LeveragingBagging, AdaptiveRandomForestClassifier

ob_adwin = OnlineAdwinBagging(schema=cover.get_schema(), ensemble_size=30)
lb = LeveragingBagging(schema=cover.get_schema(), ensemble_size=30)
arf = AdaptiveRandomForestClassifier(schema=cover.get_schema(), ensemble_size=30)

results_ob_adwin = prequential_evaluation(stream=cover, learner=ob_adwin, window_size=5000, max_instances=100000)
results_lb = prequential_evaluation(stream=cover, learner=lb, window_size=5000, max_instances=100000)
results_arf = prequential_evaluation(stream=cover, learner=arf, window_size=5000, max_instances=100000)
[11]:

print(f"Accuracy {results_ob_adwin['learner']}: {results_ob_adwin['cumulative'].accuracy()}") print(f"Accuracy {results_lb['learner']}: {results_lb['cumulative'].accuracy()}") print(f"Accuracy {results_arf['learner']}: {results_arf['cumulative'].accuracy()}") plot_windowed_results(results_ob_adwin, results_lb, results_arf, ylabel="accuracy")
Accuracy OnlineBagging with ADWIN: 87.63300000000001
Accuracy Leveraging OnlineBagging: 92.55499999999999
Accuracy AdaptiveRandomForest: 93.822
../_images/notebooks_parallel_ensembles_19_1.png

4. Mini-batch size impact#

  • The greater the mini-batch size the greater the impact on performance (both good and bad)

  • Although this behavior can fluctuate during the stream, at the end it is expected that larger mini-batches have slightly smaller runtime and slightly worse accuracy

    • It becomes slightly faster due to the reuse of data structures in the higher memory hierarchy

    • Predictions are slightly worse due to the larger number of instances predicted with the “outdated” model

      • Especially in the case of concept drifts, which suffer from the delay in detection and reset of the classifier

[12]:
ob_adwin_mb50 = OnlineAdwinBagging(schema=cover.get_schema(), ensemble_size=30, minibatch_size=50, number_of_jobs=5)
ob_adwin_mb100 = OnlineAdwinBagging(schema=cover.get_schema(), ensemble_size=30, minibatch_size=100, number_of_jobs=5)
ob_adwin_mb500 = OnlineAdwinBagging(schema=cover.get_schema(), ensemble_size=30, minibatch_size=500, number_of_jobs=5)

results_ob_adwin_mb50 = prequential_evaluation(stream=cover, learner=ob_adwin_mb50, window_size=5000, max_instances=100000)
results_ob_adwin_mb100 = prequential_evaluation(stream=cover, learner=ob_adwin_mb100, window_size=5000, max_instances=100000)
results_ob_adwin_mb500 = prequential_evaluation(stream=cover, learner=ob_adwin_mb500, window_size=5000, max_instances=100000)

results_ob_adwin_mb50['experiment_id'] = "Mini-batch size 50"
results_ob_adwin_mb100['experiment_id'] = "Mini-batch size 100"
results_ob_adwin_mb500['experiment_id'] = "Mini-batch size 500"
[13]:
print(f"Accuracy {results_ob_adwin_mb50['experiment_id']}: {results_ob_adwin_mb50['cumulative'].accuracy()}")
print(f"wallclock = {results_ob_adwin_mb50['wallclock']} seconds\n")
print(f"Accuracy {results_ob_adwin_mb100['experiment_id']}: {results_ob_adwin_mb100['cumulative'].accuracy()}")
print(f"wallclock = {results_ob_adwin_mb100['wallclock']} seconds\n")
print(f"Accuracy {results_ob_adwin_mb500['experiment_id']}: {results_ob_adwin_mb500['cumulative'].accuracy()}")
print(f"wallclock = {results_ob_adwin_mb500['wallclock']} seconds\n")

plot_windowed_results(results_ob_adwin_mb50, results_ob_adwin_mb100, results_ob_adwin_mb500, ylabel="accuracy")
Accuracy Mini-batch size 50: 85.038
wallclock = 14.217211961746216 seconds

Accuracy Mini-batch size 100: 85.955
wallclock = 13.562835931777954 seconds

Accuracy Mini-batch size 500: 84.969
wallclock = 14.451393127441406 seconds

../_images/notebooks_parallel_ensembles_22_1.png