TorchClassifyStream#

class capymoa.stream.TorchClassifyStream[source]#

Bases: Stream[LabeledInstance]

TorchClassifyStream turns a PyTorch dataset into a classification stream.

>>> from capymoa.datasets import get_download_dir
>>> from capymoa.stream import TorchClassifyStream
>>> from torchvision import datasets
>>> from torchvision.transforms import ToTensor
>>> print("Using PyTorch Dataset"); pytorchDataset = datasets.FashionMNIST( 
...     root=get_download_dir(),
...     train=True,
...     download=True,
...     transform=ToTensor()
... )
Using PyTorch Dataset...
>>> pytorch_stream = TorchClassifyStream(pytorchDataset, 10, class_names=pytorchDataset.classes)
>>> pytorch_stream.get_schema()
@relation PytorchDataset

@attribute attrib_0 numeric
@attribute attrib_1 numeric
...
@attribute attrib_783 numeric
@attribute class {T-shirt/top,Trouser,Pullover,Dress,Coat,Sandal,Shirt,Sneaker,Bag,'Ankle boot'}

@data
>>> pytorch_stream.next_instance()
LabeledInstance(
    Schema(PytorchDataset),
    x=[0. 0. 0. ... 0. 0. 0.],
    y_index=9,
    y_label='Ankle boot'
)
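
The schema above lists 784 numeric attributes because each FashionMNIST image is 28 x 28 pixels and the instance vector `x` is one-dimensional. The following is a minimal sketch of that correspondence, assuming row-major flattening and the `attrib_<i>` naming seen in the schema output; it uses plain Python lists rather than tensors and is not CapyMOA's own code.

```python
# Illustrative only: a 28 x 28 image of zeros stored as nested lists.
height, width = 28, 28
image = [[0.0] * width for _ in range(height)]

# Flatten row by row into a single feature vector.
flat = [pixel for row in image for pixel in row]

# Attribute names following the pattern shown in the schema output.
names = [f"attrib_{i}" for i in range(len(flat))]

print(len(flat), names[0], names[-1])
```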

You can shuffle the stream by passing shuffle=True to the constructor:

>>> import torch
>>> from torch.utils.data import TensorDataset
>>> dataset = TensorDataset(
...     torch.tensor([[1], [2], [3]]), torch.tensor([0, 1, 2])
... )
>>> pytorch_stream = TorchClassifyStream(dataset=dataset, num_classes=3, shuffle=True)
>>> for instance in pytorch_stream:
...     print(instance.x)
[3]
[1]
[2]

Importantly, you can restart the stream to iterate over the dataset in the same order again:

>>> pytorch_stream.restart()
>>> for instance in pytorch_stream:
...     print(instance.x)
[3]
[1]
[2]
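
The repeatable ordering can be illustrated without PyTorch. The following is a minimal sketch, assuming the order comes from a random generator that is re-seeded on every restart. `SeededOrder` and `read_all` are hypothetical stand-ins written for this illustration, not CapyMOA's implementation.

```python
import random


class SeededOrder:
    """Hypothetical stand-in: serves items in a seeded, repeatable shuffled order."""

    def __init__(self, items, seed=0):
        self._items = list(items)
        self._seed = seed
        self.restart()

    def restart(self):
        # Re-seeding the generator reproduces the same permutation every time.
        rng = random.Random(self._seed)
        self._order = list(range(len(self._items)))
        rng.shuffle(self._order)
        self._pos = 0

    def has_more_instances(self):
        return self._pos < len(self._order)

    def next_instance(self):
        item = self._items[self._order[self._pos]]
        self._pos += 1
        return item


def read_all(stream):
    """Drain the stream with the has_more_instances/next_instance pattern."""
    out = []
    while stream.has_more_instances():
        out.append(stream.next_instance())
    return out


stream = SeededOrder([1, 2, 3], seed=0)
first_pass = read_all(stream)
stream.restart()
second_pass = read_all(stream)
print(first_pass == second_pass)  # True: the same seed gives the same order
```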

__init__(
dataset: Dataset[Tuple[Tensor, LongTensor]],
num_classes: int,
shuffle: bool = False,
shuffle_seed: int = 0,
class_names: Sequence[str] | None = None,
dataset_name: str = 'PytorchDataset',
)[source]#

Create a stream from a PyTorch dataset.

Parameters:
  • dataset – A PyTorch dataset

  • num_classes – The number of classes in the dataset

  • shuffle – Randomly sample with replacement, defaults to False

  • shuffle_seed – Seed for shuffling, defaults to 0

  • class_names – The names of the classes, defaults to None

  • dataset_name – The name of the dataset, defaults to “PytorchDataset”

has_more_instances()[source]#

Return True if the stream has more instances to read.

next_instance()[source]#

Return the next instance in the stream.

Raises:

ValueError – If the machine learning task is neither a regression nor a classification task.

Returns:

A labeled instance or a regression instance, depending on the schema.

get_schema()[source]#

Return the schema of the stream.

get_moa_stream()[source]#

Get the MOA stream object if it exists.

restart()[source]#

Restart the stream to read instances from the beginning.

CLI_help() → str[source]#

Return a help message.

__iter__() → Iterator[_AnyInstance][source]#

Get an iterator over the stream.

This will NOT restart the stream if it has already been iterated over. Please use the restart() method to restart the stream.

Yield:

An iterator over the stream.
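
The iteration behaviour described above can be sketched with a small stand-in class. This assumes that `__iter__` returns the stream itself without resetting its read position; `OnePassStream` is hypothetical and only mirrors the documented behaviour, not CapyMOA's code.

```python
class OnePassStream:
    """Hypothetical stand-in for a stream whose iterator does not rewind."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def restart(self):
        # Move the read position back to the beginning.
        self._pos = 0

    def __iter__(self):
        # Return self without resetting the position, so iteration resumes.
        return self

    def __next__(self):
        if self._pos >= len(self._items):
            raise StopIteration
        item = self._items[self._pos]
        self._pos += 1
        return item


stream = OnePassStream(["a", "b", "c"])
first = next(stream)          # consume one instance manually
rest = list(stream)           # iteration continues from the current position
again = list(stream)          # the stream is exhausted, so nothing is yielded
stream.restart()
after_restart = list(stream)  # a full pass is available again
```

A second `for` loop over an exhausted stream therefore does nothing until restart() is called.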

__next__() → _AnyInstance[source]#

Get the next instance in the stream.

Returns:

The next instance in the stream.