TorchStream
- class capymoa.stream.TorchStream
Bases: Stream
A stream adapter for PyTorch datasets.
This class converts PyTorch datasets into CapyMOA streams for both classification and regression tasks.
Creating a classification stream from a PyTorch dataset:
>>> from capymoa.datasets import get_download_dir
>>> from capymoa.stream import TorchStream
>>> from torchvision import datasets, transforms
>>>
>>> dataset = datasets.FashionMNIST(
...     root=get_download_dir(),
...     train=True,
...     download=True,
...     transform=transforms.ToTensor()
... )
>>> stream = TorchStream.from_classification(
...     dataset, num_classes=10, class_names=dataset.classes
... )
>>> stream.next_instance()
LabeledInstance(...)
Creating a shuffled classification stream:
>>> import torch
>>> from torch.utils.data import TensorDataset
>>>
>>> dataset = TensorDataset(
...     torch.tensor([[1.0], [2.0], [3.0]]),
...     torch.tensor([0, 1, 2])
... )
>>> stream = TorchStream.from_classification(
...     dataset, num_classes=3, shuffle=True, shuffle_seed=0
... )
>>> [float(inst.x[0]) for inst in stream]
[3.0, 1.0, 2.0]
Streams can be restarted to iterate again:
>>> stream.restart()
>>> [float(inst.x[0]) for inst in stream]
[3.0, 1.0, 2.0]
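The shuffled example replays the same order after restart() because the seed is fixed. The sketch below illustrates the general idea of a seeded, reproducible permutation using Python's random module. It is an assumption for illustration only: CapyMOA may use a different random number generator, so this sketch will generally not reproduce the [3.0, 1.0, 2.0] order shown above, and shuffled_order and iterate are hypothetical helpers, not CapyMOA functions.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def shuffled_order(n: int, seed: Optional[int]) -> List[int]:
    """Hypothetical helper: a reproducible permutation of range(n).

    Uses Python's random module, NOT CapyMOA's actual shuffling code.
    """
    indices = list(range(n))
    rng = random.Random(seed)  # seed=None draws fresh entropy on every call
    rng.shuffle(indices)
    return indices


def iterate(dataset: Sequence[T], shuffle: bool, seed: Optional[int]) -> List[T]:
    """Hypothetical helper: visit the dataset in (optionally shuffled) order."""
    if shuffle:
        order = shuffled_order(len(dataset), seed)
    else:
        order = list(range(len(dataset)))  # plain dataset order
    return [dataset[i] for i in order]


data = [(1.0, 0), (2.0, 1), (3.0, 2)]  # (feature, class index) pairs
first = iterate(data, shuffle=True, seed=0)
second = iterate(data, shuffle=True, seed=0)  # same seed, so same order as first
```

Fixing the seed is what makes each pass, including one after a restart, visit the items in the same order; with seed=None every pass in this sketch would be reshuffled.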
Creating a regression stream:
>>> dataset = TensorDataset(
...     torch.tensor([[1.0], [2.0], [3.0]]),
...     torch.tensor([0.5, 1.5, 2.5])
... )
>>> stream = TorchStream.from_regression(
...     dataset, shuffle=True, shuffle_seed=0
... )
>>> [(float(inst.x[0]), float(inst.y_value)) for inst in stream]
[(3.0, 2.5), (1.0, 0.5), (2.0, 1.5)]
- __init__()
Construct a TorchStream from a PyTorch Dataset and a Schema.
In most cases, use from_classification() or from_regression() instead.
- Parameters:
dataset – A PyTorch Dataset that yields tuples of (features, target).
schema – A Schema object that describes the structure of the data, including feature names and target information.
- __iter__() → Iterator[_AnyInstance]
Get an iterator over the stream.
This does NOT restart the stream if it has already been iterated over. Call the restart() method to iterate from the beginning again.
- Yield:
An iterator over the stream.
- __next__() → _AnyInstance
Get the next instance in the stream.
- Returns:
The next instance in the stream.
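The contract described for __iter__(), __next__() and restart() can be illustrated with a toy class. This is a hypothetical pure-Python sketch: ToyStream is not part of CapyMOA and omits schemas and instance types. In this sketch, __iter__() returns the stream itself without rewinding, so a second loop yields nothing until restart() is called.

```python
from typing import Iterator, List, Sequence, Tuple

Pair = Tuple[float, int]  # (feature, class index)


class ToyStream:
    """Hypothetical stand-in illustrating the iteration contract, not CapyMOA code."""

    def __init__(self, data: Sequence[Pair]) -> None:
        self._data = data
        self._pos = 0  # index of the next instance to return

    def __iter__(self) -> Iterator[Pair]:
        # Mirrors the documented behaviour: iterating does NOT rewind the position.
        return self

    def __next__(self) -> Pair:
        if self._pos >= len(self._data):
            raise StopIteration
        item = self._data[self._pos]
        self._pos += 1
        return item

    def restart(self) -> None:
        # Rewind so the stream can be iterated again from the beginning.
        self._pos = 0


stream = ToyStream([(1.0, 0), (2.0, 1), (3.0, 2)])
first_pass: List[Pair] = list(stream)
exhausted: List[Pair] = list(stream)  # empty: the first pass consumed the stream
stream.restart()
second_pass: List[Pair] = list(stream)  # same items again after restart()
```

Returning self from __iter__() is what makes a stream single-pass: all loops share one position, which is the behaviour the restart() method exists to reset.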
- static from_classification(
    dataset: Dataset[Tuple[Tensor, Tensor | int]],
    num_classes: int,
    class_names: Sequence[str] | None = None,
    dataset_name: str = 'TorchStream',
    shape: Sequence[int] | None = None,
    shuffle: bool = False,
    shuffle_seed: int | None = None,
  )
Construct a stream for classification from a PyTorch Dataset.
- Parameters:
dataset – A PyTorch Dataset that yields tuples of (features, target).
num_classes – The number of classes in the classification task.
class_names – An optional sequence of class names corresponding to the class indices.
dataset_name – An optional name for the stream.
shape – An optional shape for the features. If not provided, features will be treated as flat vectors.
shuffle – Whether to shuffle the dataset.
shuffle_seed – An optional seed for shuffling the dataset.
- Returns:
A TorchStream instance.
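The shape parameter relates a flat feature vector back to the original tensor layout. A flattened vector has as many entries as the product of the shape's dimensions; for example, FashionMNIST images converted with transforms.ToTensor() have shape (1, 28, 28), which gives 784 features. The sketch below only demonstrates that arithmetic and row-major flattening on nested lists; flat_length and flatten are hypothetical helpers, not CapyMOA functions, and it does not show how CapyMOA uses shape internally.

```python
import math
from typing import List, Sequence, Union

Nested = Union[float, Sequence["Nested"]]


def flat_length(shape: Sequence[int]) -> int:
    """Hypothetical helper: number of features after flattening a tensor of this shape."""
    return math.prod(shape)


def flatten(values: Nested) -> List[float]:
    """Hypothetical helper: row-major (C-order) flattening of nested lists."""
    if isinstance(values, (int, float)):
        return [float(values)]
    out: List[float] = []
    for v in values:
        out.extend(flatten(v))  # recurse into each sub-list in order
    return out


image_shape = (1, 28, 28)  # FashionMNIST after transforms.ToTensor()
n_features = flat_length(image_shape)

tiny = [[[0.0, 0.1], [0.2, 0.3]]]  # a nested list with shape (1, 2, 2)
flat = flatten(tiny)
```

Row-major order here matches the element order produced by torch.flatten, so the last dimension varies fastest.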
- static from_regression(
    dataset: Dataset[Tuple[Tensor, Tensor | float]],
    dataset_name: str = 'TorchStream',
    shuffle: bool = False,
    shuffle_seed: int | None = None,
  )
Construct a stream for regression from a PyTorch Dataset.
- Parameters:
dataset – A PyTorch Dataset that yields tuples of (features, target) for regression tasks.
dataset_name – An optional name for the stream.
shape – An optional shape for the features. If not provided, features will be treated as flat vectors.
shuffle – Whether to shuffle the dataset.
shuffle_seed – An optional seed for shuffling the dataset.
- Returns:
A TorchStream instance for regression.