8. Prediction Intervals for Data Streams#
This notebook covers how to utilise prediction intervals for regression tasks in CapyMOA.
Two methods for obtaining prediction intervals are currently available in CapyMOA: MVE and AdaPI.
More details about prediction intervals for streaming data can be found in the AdaPI paper.
More information about CapyMOA can be found at https://www.capymoa.org.
last updated on 28/11/2025
[1]:
from capymoa.datasets import Fried
# load data
fried_stream = Fried()
8.1 Basic prediction interval usage#
Here is a basic example of using prediction intervals in CapyMOA.
The currently available prediction interval learners require a regressor as their base model.
[2]:
from capymoa.regressor import SOKNL
from capymoa.prediction_interval import MVE
# build the prediction interval learner in the regular manner
soknl = SOKNL(schema=fried_stream.get_schema(), ensemble_size=10)
mve = MVE(schema=fried_stream.get_schema(), base_learner=soknl)
# build the prediction interval learner with the base learner constructed inline
mve_inline = MVE(
schema=fried_stream.get_schema(),
base_learner=SOKNL(schema=fried_stream.get_schema(), ensemble_size=10),
)
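The other available learner, AdaPI, is built in the same way and additionally takes a ``limit`` parameter; a quick sketch using the same value that appears later in section 8.5:

from capymoa.prediction_interval import AdaPI

# AdaPI also wraps a regressor; limit matches the value used later in this notebook
adapi = AdaPI(
    schema=fried_stream.get_schema(),
    base_learner=SOKNL(schema=fried_stream.get_schema(), ensemble_size=10),
    limit=0.001,
)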
8.2 Creating evaluators#
There are currently two types of prediction interval evaluators implemented: basic (cumulative) and windowed.
[3]:
from capymoa.evaluation.evaluation import (
PredictionIntervalEvaluator,
PredictionIntervalWindowedEvaluator,
)
# build prediction interval (basic and windowed) evaluators
mve_evaluator = PredictionIntervalEvaluator(schema=fried_stream.get_schema())
mve_windowed_evaluator = PredictionIntervalWindowedEvaluator(
schema=fried_stream.get_schema(), window_size=1000
)
8.3 Running test-then-train/prequential tasks manually#
Don’t forget to train the model (call the .train() method) at the end of each iteration!
[4]:
# run test-then-train/prequential tasks
while fried_stream.has_more_instances():
instance = fried_stream.next_instance()
prediction = mve.predict(instance)
mve_evaluator.update(instance.y_value, prediction)
mve_windowed_evaluator.update(instance.y_value, prediction)
mve.train(instance)
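Each prediction produced by a prediction interval learner bundles the interval bounds with the point prediction, which is what the evaluators above consume. A minimal sketch for inspecting a single prediction, assuming the returned value is a three-element sequence ordered as [lower bound, point prediction, upper bound] (the exact structure may differ between CapyMOA versions, so check the printed object):

# peek at one prediction interval (assumed ordering: [lower, prediction, upper])
fried_stream.restart()
sample_instance = fried_stream.next_instance()
print(mve.predict(sample_instance))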
8.4 Results from both evaluators#
[5]:
# show results
print(
f"MVE basic evaluation:\ncoverage: {mve_evaluator.coverage()}, NMPIW: {mve_evaluator.nmpiw()}"
)
print(
f"MVE windowed evaluation in last window:\ncoverage: {mve_windowed_evaluator.coverage()}, \nNMPIW: {mve_windowed_evaluator.nmpiw()}"
)
MVE basic evaluation:
coverage: 97.4, NMPIW: 29.55
MVE windowed evaluation in last window:
coverage: [99.3, 98.4, 97.0, 98.1, 97.7, 97.3, 98.2, 97.9, 97.7, 96.8, 97.2, 97.5, 97.5, 97.1, 98.2, 97.4, 96.2, 97.2, 97.0, 97.3, 99.0, 96.7, 97.3, 96.9, 97.9, 96.4, 97.9, 97.3, 97.1, 97.1, 97.3, 96.9, 97.1, 97.1, 97.4, 97.3, 97.5, 96.6, 97.3, 97.2],
NMPIW: [61.88, 45.51, 41.34, 42.44, 39.27, 38.68, 34.59, 37.0, 34.42, 37.89, 38.35, 34.83, 33.72, 33.68, 34.03, 33.52, 33.01, 32.23, 34.36, 34.27, 34.36, 33.67, 29.79, 32.57, 31.51, 32.19, 30.62, 30.02, 32.44, 30.74, 28.28, 31.85, 32.11, 30.58, 31.8, 28.04, 29.16, 28.12, 31.57, 30.75]
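For context: coverage is the percentage of ground-truth values that fall inside the predicted intervals, and NMPIW (normalised mean prediction interval width) is the mean interval width normalised by the observed target range, also expressed as a percentage. As a sketch of the commonly used definitions (CapyMOA's exact implementation may differ in details), with interval bounds $l_i$, $u_i$ and targets $y_i$:

$$\text{coverage} = \frac{100}{n}\sum_{i=1}^{n}\mathbf{1}\{l_i \le y_i \le u_i\}, \qquad \text{NMPIW} = \frac{100}{n\,(y_{\max} - y_{\min})}\sum_{i=1}^{n}(u_i - l_i)$$

A good interval learner keeps coverage close to the nominal confidence level while keeping NMPIW as small as possible.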
8.5 Wrap things up with prequential evaluation#
Prediction interval tasks can also be wrapped into prequential evaluation in CapyMOA.
[6]:
from capymoa.evaluation import prequential_evaluation
from capymoa.prediction_interval import AdaPI
# restart stream
fried_stream.restart()
# specify the base regression model
regressive_learner = SOKNL(schema=fried_stream.get_schema(), ensemble_size=10)
# build prediction interval models
mve_learner = MVE(schema=fried_stream.get_schema(), base_learner=regressive_learner)
adapi_learner = AdaPI(
schema=fried_stream.get_schema(), base_learner=regressive_learner, limit=0.001
)
# gather results
mve_results = prequential_evaluation(
stream=fried_stream, learner=mve_learner, window_size=1000
)
adapi_results = prequential_evaluation(
stream=fried_stream, learner=adapi_learner, window_size=1000
)
# show overall results
print(
f"MVE coverage: {mve_results.cumulative.coverage()}, NMPIW: {mve_results.cumulative.nmpiw()}"
)
print(
f"AdaPI coverage: {adapi_results.cumulative.coverage()}, NMPIW: {adapi_results.cumulative.nmpiw()}"
)
MVE coverage: 97.4, NMPIW: 29.55
AdaPI coverage: 96.25, NMPIW: 27.4
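The per-window values behind these results can also be inspected directly. A minimal sketch, assuming the windowed results expose a ``metrics_per_window()`` method returning a DataFrame, as in other CapyMOA evaluation tutorials:

# look at the per-window metrics (column names depend on the CapyMOA version)
print(mve_results.windowed.metrics_per_window().head())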
8.6 Plots are also supported#
[7]:
from capymoa.evaluation.visualization import plot_windowed_results
# plot over time comparison
plot_windowed_results(mve_results, adapi_results, metric="coverage")
plot_windowed_results(mve_results, adapi_results, metric="nmpiw")
8.6.1 Plotting prediction intervals over time#
We also provide a visualisation tool for plotting prediction intervals over time.
The function ``plot_prediction_interval`` can be used for this; it takes either one or two prediction interval results as input.
8.6.2 Single result plotting example#
In order to plot the prediction interval over time, we need to have stored the predictions and the ground truth values in the prediction interval results.
The shaded area represents the prediction interval, while the solid line represents the regressor’s predictions.
The star markers represent the ground truth values that are covered by the intervals.
On the other hand, the cross markers represent the ground truth values that are outside the intervals.
The colors can be adjusted via the ``colors`` parameter, given as a list. The ``start`` and ``end`` parameters can be used to specify the range of the plot. The ground truth and predictions can be omitted by setting the ``plot_truth`` and ``plot_predictions`` parameters to ``False``.
We have to set ``optimise`` to ``False`` to avoid subscripting problems, and enable ``store_predictions`` and ``store_y`` so that the values needed for plotting are kept.
[8]:
new_mve_learner = MVE(
schema=fried_stream.get_schema(),
base_learner=SOKNL(schema=fried_stream.get_schema(), ensemble_size=10),
)
new_mve_results = prequential_evaluation(
stream=fried_stream,
learner=new_mve_learner,
window_size=1000,
optimise=False,
store_predictions=True,
store_y=True,
)
[9]:
from capymoa.evaluation.visualization import plot_prediction_interval
plot_prediction_interval(new_mve_results, start=300, end=500, colors=["coral"])
8.6.3 Two results plotting example#
For comparison purposes, we can also plot two prediction interval results over time.
We don’t offer the ability to take more than two since it makes the plot too messy to read.
[10]:
# Let's add another result for comparison
new_adapi_learner = AdaPI(
schema=fried_stream.get_schema(),
base_learner=SOKNL(schema=fried_stream.get_schema(), ensemble_size=10),
limit=0.001,
)
new_adapi_results = prequential_evaluation(
stream=fried_stream,
learner=new_adapi_learner,
window_size=1000,
optimise=False,
store_predictions=True,
store_y=True,
)
[11]:
plot_prediction_interval(
new_mve_results,
new_adapi_results,
start=300,
end=500,
colors=["coral", "teal"],
plot_predictions=False,
)
The new plus markers represent ground truth values that are covered by the narrower but not the wider intervals.
The function automatically draws the wider intervals behind the narrower ones to keep the narrower intervals visible.