OCLMetrics

class capymoa.ocl.evaluation.OCLMetrics

Bases: object

A collection of metrics evaluating an online continual learner.

We define some metrics in terms of a matrix \(R\in\mathbb{R}^{T \times T}\) (accuracy_matrix) where each element \(R_{i,j}\) contains the test accuracy on task \(j\) after sequentially training on tasks \(1\) through \(i\).

Online learners make predictions continuously during training, so we also provide “anytime” versions of the metrics. These metrics are collected periodically during training, specifically \(H\) times per task. The results of this evaluation are stored in a tensor \(A\in\mathbb{R}^{T \times H \times T}\) (anytime_accuracy_matrix) where each element \(A_{i,h,j}\) contains the test accuracy on task \(j\) after sequentially training on tasks \(1\) through \(i-1\) and up to step \(h\) of task \(i\).
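
As a rough illustration of the two layouts, the NumPy sketch below builds placeholder arrays with the shapes described above. The sizes `T = 3` and `H = 2` and the accuracy values are made up for the example. Indices are 0-based in NumPy, so \(R_{i,j}\) corresponds to `R[i - 1, j - 1]`.

```python
import numpy as np

T, H = 3, 2  # hypothetical sizes: 3 tasks, 2 evaluations per task

# R[i, j]: test accuracy on task j after training on tasks 0..i (0-based).
R = np.array(
    [
        [0.9, 0.1, 0.1],
        [0.7, 0.8, 0.2],
        [0.6, 0.7, 0.9],
    ],
    dtype=np.float32,
)

# A[i, h, j]: test accuracy on task j at evaluation step h of task i.
# Filled with zeros here because only the shape matters for this sketch.
A = np.zeros((T, H, T), dtype=np.float32)

# Accuracy on the first task after training on the first two tasks.
print(R[1, 0])
# Accuracy on every task at the last evaluation step of the second task.
print(A[1, H - 1])
```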

__init__(
anytime_accuracy_all: ndarray,
anytime_accuracy_all_avg: float,
anytime_accuracy_seen: ndarray,
anytime_accuracy_seen_avg: float,
anytime_task_index: ndarray,
accuracy_all: ndarray,
accuracy_all_avg: float,
accuracy_seen: ndarray,
accuracy_seen_avg: float,
accuracy_final: float,
task_index: ndarray,
forward_transfer: float,
backward_transfer: float,
accuracy_matrix: ndarray,
class_cm: ndarray,
anytime_accuracy_matrix: ndarray,
n_classes: int,
n_tasks: int,
n_continual_evaluations: int,
ttt: PrequentialResults,
boundaries: ndarray,
ttt_windowed_task_index: ndarray,
) -> None
accuracy_all: ndarray

The accuracy on all tasks after training on each task.

Is a ndarray of shape (n_tasks,), dtype=np.float32.

\[a_\text{all}(t) = \frac{1}{T} \sum_{i=1}^{T} R_{t,i}\]

Use task_index to get the corresponding task index for plotting.
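
A minimal sketch of the formula above, assuming `R` is laid out like accuracy_matrix. The helper name `accuracy_all_from_matrix` and the example values are hypothetical and not part of the library.

```python
import numpy as np


def accuracy_all_from_matrix(R: np.ndarray) -> np.ndarray:
    """Average each row of R over all T tasks, seen or not."""
    return R.mean(axis=1)


# Made-up accuracy matrix for three tasks.
R = np.array(
    [
        [0.9, 0.1, 0.1],
        [0.7, 0.8, 0.2],
        [0.6, 0.7, 0.9],
    ]
)

a_all = accuracy_all_from_matrix(R)
a_all_avg = a_all.mean()  # mirrors the definition of accuracy_all_avg
a_final = a_all[-1]       # mirrors the definition of accuracy_final
```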

accuracy_all_avg: float

The average of accuracy_all over all tasks.

\[\bar{a}_\text{all} = \frac{1}{T}\sum_{t=1}^T a_\text{all}(t)\]
accuracy_final: float

The accuracy on all tasks after training on the final task.

\[a_\text{final} = a_\text{all}(T)\]
accuracy_matrix: ndarray

A matrix measuring the accuracy on each task after training on each task.

Is a ndarray of shape (n_tasks, n_tasks), dtype=np.float32.

R[i, j] is the accuracy on task \(j\) after training on tasks \(1\) through \(i\).

accuracy_seen: ndarray

The accuracy on seen tasks after training on each task.

Is a ndarray of shape (n_tasks,), dtype=np.float32.

\[a_\text{seen}(t) = \frac{1}{t}\sum^t_{i=1} R_{t,i}\]

Use task_index to get the corresponding task index for plotting.
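
The seen-task average only uses the lower triangle of \(R\), including the diagonal. The sketch below shows one way to compute it with NumPy. The helper name `accuracy_seen_from_matrix` and the example values are hypothetical.

```python
import numpy as np


def accuracy_seen_from_matrix(R: np.ndarray) -> np.ndarray:
    """Average row t of R over the tasks seen so far (columns 0..t)."""
    T = R.shape[0]
    # np.tril zeroes the entries above the diagonal (the unseen tasks).
    seen_sum = np.tril(R).sum(axis=1)
    # Row t (0-based) has t + 1 seen tasks.
    n_seen = np.arange(1, T + 1)
    return seen_sum / n_seen


# Made-up accuracy matrix for three tasks.
R = np.array(
    [
        [0.9, 0.1, 0.1],
        [0.7, 0.8, 0.2],
        [0.6, 0.7, 0.9],
    ]
)

a_seen = accuracy_seen_from_matrix(R)
```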

accuracy_seen_avg: float

The average of accuracy_seen over all tasks.

\[\bar{a}_\text{seen} = \frac{1}{T}\sum_{t=1}^T a_\text{seen}(t)\]
anytime_accuracy_all: ndarray

The accuracy on all tasks after training on each step in each task.

Is a ndarray of shape (n_tasks * n_continual_evaluations,), dtype=np.float32.

\[a_\text{any all}(t, h) = \frac{1}{T}\sum^T_{i=1} A_{t,h,i}\]

We flatten the \(t, h\) dimensions to a 1D array. Use anytime_task_index to get the corresponding task index for plotting.
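
The following sketch computes the same quantity from a stand-in tensor `A` of shape `(T, H, T)`. It assumes row-major flattening, so that position `t * H + h` of the result corresponds to step `h` of task `t` (0-based). That ordering is an assumption of this example. The random values are placeholders.

```python
import numpy as np

T, H = 3, 2  # hypothetical sizes

# Random stand-in for the anytime accuracy tensor A[t, h, j].
rng = np.random.default_rng(0)
A = rng.random((T, H, T))

# Average over the evaluated tasks (last axis), then flatten (t, h).
a_any_all = A.mean(axis=2).reshape(T * H)

# Averaging over every (t, h) pair gives the scalar summary.
a_any_all_avg = a_any_all.mean()
```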

anytime_accuracy_all_avg: float

The average of anytime_accuracy_all over all tasks and evaluation steps.

\[\bar{a}_\text{any all} = \frac{1}{T}\sum_{t=1}^T \frac{1}{H}\sum_{h=1}^H a_\text{any all}(t, h)\]
anytime_accuracy_matrix: ndarray

A matrix measuring the accuracy on each task after training on each task and step.

Is a ndarray of shape (n_tasks * n_continual_evaluations, n_tasks), dtype=np.float32.

This matrix is \(A\) with the first two dimensions flattened to a 2D array.
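
If the three-dimensional view is more convenient, the flattening can be undone with a reshape. The sketch below uses a placeholder array in place of anytime_accuracy_matrix and assumes row-major order with the task as the slowest-varying index. That ordering is an assumption of this example.

```python
import numpy as np

T, H = 3, 2  # hypothetical n_tasks and n_continual_evaluations

# Placeholder with the documented 2D shape (T * H, T).
flat = np.arange(T * H * T, dtype=np.float32).reshape(T * H, T)

# Recover A[t, h, j] under the assumed row-major layout.
A = flat.reshape(T, H, T)

# Row t * H + h of the 2D matrix is the slice A[t, h].
t, h = 2, 1
row = flat[t * H + h]
```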

anytime_accuracy_seen: ndarray

The accuracy on seen tasks after training on each step in each task.

\[a_\text{any seen}(t, h) = \frac{1}{t}\sum^t_{i=1} A_{t,h,i}\]

We flatten the \(t, h\) dimensions to a 1D array. Use anytime_task_index to get the corresponding task index for plotting.
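
A possible NumPy sketch of this formula is shown below. The helper name `anytime_accuracy_seen_from_tensor` is hypothetical. The input is assumed to be the unflattened tensor `A` of shape `(T, H, T)`, and the output is flattened in row-major order.

```python
import numpy as np


def anytime_accuracy_seen_from_tensor(A: np.ndarray) -> np.ndarray:
    """Average A[t, h, :] over the seen tasks 0..t, then flatten (t, h)."""
    T, H, _ = A.shape
    # seen[t, j] is True when task j has been seen while training task t.
    seen = np.tril(np.ones((T, T), dtype=bool))
    # Broadcast the mask over the step axis and zero out unseen tasks.
    totals = np.where(seen[:, None, :], A, 0.0).sum(axis=2)  # shape (T, H)
    # Task t (0-based) has t + 1 seen tasks.
    counts = np.arange(1, T + 1)[:, None]
    return (totals / counts).reshape(T * H)


# Random stand-in for the anytime accuracy tensor.
A = np.random.default_rng(0).random((3, 2, 3))
a_any_seen = anytime_accuracy_seen_from_tensor(A)
```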

anytime_accuracy_seen_avg: float

The average of anytime_accuracy_seen over all tasks and evaluation steps.

\[\bar{a}_\text{any seen} = \frac{1}{T}\sum_{t=1}^T \frac{1}{H}\sum_{h=1}^H a_\text{any seen}(t, h)\]
anytime_task_index: ndarray

The position in each task where the anytime accuracy was measured.

Is a ndarray of shape (n_tasks * n_continual_evaluations,), dtype=np.integer.

backward_transfer: float

A scalar measuring the impact learning had on past tasks.

\[r_\text{BWT} = \frac{2}{T(T-1)} \sum_{i=2}^{T} \sum_{j=1}^{i-1} (R_{i,j} - R_{j,j})\]
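
The double sum runs over the entries strictly below the diagonal of \(R\), each compared with the accuracy recorded just after that task was learned. The sketch below is one way to evaluate it. The helper name `backward_transfer_from_matrix` and the example matrix are hypothetical.

```python
import numpy as np


def backward_transfer_from_matrix(R: np.ndarray) -> float:
    """Evaluate the backward-transfer formula for an accuracy matrix R."""
    T = R.shape[0]
    if T < 2:
        raise ValueError("backward transfer needs at least two tasks")
    # diff[i, j] = R[i, j] - R[j, j]
    diff = R - np.diag(R)[None, :]
    # Keep only i > j, i.e. tasks that were learned before task i.
    return float(2.0 * np.tril(diff, k=-1).sum() / (T * (T - 1)))


# Made-up accuracy matrix for three tasks.
R = np.array(
    [
        [0.9, 0.1, 0.1],
        [0.7, 0.8, 0.2],
        [0.6, 0.7, 0.9],
    ]
)

bwt = backward_transfer_from_matrix(R)
```

In this made-up example every earlier task loses accuracy after later training, so the result is negative.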
boundaries: ndarray

The instance indices of the task boundaries.

Used to map online evaluation to specific tasks.

Is a ndarray of shape (n_tasks + 1,), dtype=np.integer.
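
As a sketch of how such boundaries could map an instance index to a task, the example below uses `np.searchsorted`. It assumes that task `k` covers instances `boundaries[k]` up to, but excluding, `boundaries[k + 1]`. That convention, the helper name `task_of_instance`, and the boundary values are assumptions of this example and are not confirmed by the documentation above.

```python
import numpy as np

# Hypothetical boundaries for 3 tasks, shape (n_tasks + 1,).
boundaries = np.array([0, 100, 250, 400])


def task_of_instance(boundaries: np.ndarray, instance_index) -> np.ndarray:
    """Return the 0-based task that each instance index falls into.

    Assumes task k spans [boundaries[k], boundaries[k + 1]).
    """
    return np.searchsorted(boundaries, instance_index, side="right") - 1


instance_index = np.array([0, 99, 100, 249, 250, 399])
tasks = task_of_instance(boundaries, instance_index)
```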

class_cm: ndarray

A confusion matrix of shape (task, true_class, predicted_class).
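
Given an array with this layout, a per-slice accuracy can be read off as the diagonal sum divided by the total count. The sketch below assumes each slice along the first axis is an ordinary count-based confusion matrix with at least one instance. The counts are made up.

```python
import numpy as np

# Hypothetical confusion matrices for 2 tasks and 2 classes,
# indexed as cm[task, true_class, predicted_class].
cm = np.array(
    [
        [[8, 2], [1, 9]],
        [[5, 5], [5, 5]],
    ]
)

# Correct predictions sit on the diagonal of each (true, predicted) slice.
correct = np.trace(cm, axis1=1, axis2=2)
total = cm.sum(axis=(1, 2))
accuracy_per_slice = correct / total
```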

forward_transfer: float

A scalar measuring the impact learning had on future tasks.

\[r_\text{FWT} = \frac{2}{T(T-1)}\sum_{i=1}^{T} \sum_{j=i+1}^{T} R_{i,j}\]
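
The formula above averages the entries strictly above the diagonal of \(R\), that is, the accuracy on tasks that have not been trained on yet. A minimal sketch follows. The helper name `forward_transfer_from_matrix` and the example matrix are hypothetical.

```python
import numpy as np


def forward_transfer_from_matrix(R: np.ndarray) -> float:
    """Evaluate the forward-transfer formula for an accuracy matrix R."""
    T = R.shape[0]
    if T < 2:
        raise ValueError("forward transfer needs at least two tasks")
    # np.triu with k=1 keeps only the entries with j > i (future tasks).
    return float(2.0 * np.triu(R, k=1).sum() / (T * (T - 1)))


# Made-up accuracy matrix for three tasks.
R = np.array(
    [
        [0.9, 0.1, 0.1],
        [0.7, 0.8, 0.2],
        [0.6, 0.7, 0.9],
    ]
)

fwt = forward_transfer_from_matrix(R)
```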
n_classes: int

The number of classes \(C\).

n_continual_evaluations: int

The number of continual evaluations per task \(H\).

n_tasks: int

The number of tasks \(T\).

task_index: ndarray

The position of each task in the metrics.

ttt: PrequentialResults

Test-then-train/prequential results.

ttt_windowed_task_index: ndarray

The position of each window within each task.

Useful as the x axis for capymoa.evaluation.results.PrequentialResults.windowed.