DownloadableDataset
- class capymoa.datasets.downloader.DownloadableDataset
Bases: MOAStream, ABC
- __init__(directory: str = PosixPath('data'), auto_download: bool = True, CLI: str | None = None, schema: str | None = None)
Construct a Stream from a MOA stream object.
Usually, you will want to construct a Stream using the capymoa.stream.stream_from_file() function.
- Parameters:
moa_stream – The MOA stream object to read instances from. It is None if the stream is created from a NumPy array.
schema – The schema of the stream. If None, the schema is inferred from the moa_stream.
CLI – Additional command line arguments to pass to the MOA stream.
- Raises:
ValueError – If no schema is provided and no moa_stream is provided.
ValueError – If command line arguments are provided without a moa_stream.
- abstract download(working_directory: Path) -> Path
Download the dataset and return the path to the downloaded dataset within the working directory.
- Parameters:
working_directory – The directory to download the dataset to.
- Returns:
The path to the downloaded dataset within the working directory.
- abstract extract(stream_archive: Path) -> Path
Extract the dataset from the archive and return the path to the extracted dataset.
- Parameters:
stream_archive – The path to the archive containing the dataset.
- Returns:
The path to the extracted dataset.
- abstract to_stream(stream: Path)
Convert the dataset to a MOA stream.
- Parameters:
stream – The path to the dataset.
- Returns:
A MOA stream.
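The three abstract methods above suggest a download, extract, convert pipeline. The following is a minimal, self-contained sketch of that pattern and is not capymoa code: `DatasetPipeline` and `TinyCsvDataset` are hypothetical stand-ins, the "download" writes a local gzip file instead of fetching anything, and running the hooks in this order from the constructor is an assumption rather than documented behaviour.

```python
import gzip
import shutil
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List


class DatasetPipeline(ABC):
    """Hypothetical stand-in mirroring the three abstract hooks (not capymoa)."""

    def __init__(self, directory: Path) -> None:
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        # Assumed order: download -> extract -> to_stream.
        archive = self.download(self.directory)
        extracted = self.extract(archive)
        self.rows = self.to_stream(extracted)

    @abstractmethod
    def download(self, working_directory: Path) -> Path:
        """Return the path of the downloaded archive."""

    @abstractmethod
    def extract(self, stream_archive: Path) -> Path:
        """Return the path of the extracted dataset."""

    @abstractmethod
    def to_stream(self, stream: Path):
        """Turn the extracted dataset into something iterable."""


class TinyCsvDataset(DatasetPipeline):
    def download(self, working_directory: Path) -> Path:
        # A real subclass would fetch a remote file; this writes a local archive.
        archive = working_directory / "tiny.csv.gz"
        with gzip.open(archive, "wt", encoding="utf-8") as f:
            f.write("x,y\n1,0\n2,1\n")
        return archive

    def extract(self, stream_archive: Path) -> Path:
        # tiny.csv.gz -> tiny.csv, next to the archive.
        target = stream_archive.with_suffix("")
        with gzip.open(stream_archive, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        return target

    def to_stream(self, stream: Path) -> List[List[str]]:
        # Skip the header row and split each remaining line into fields.
        lines = stream.read_text(encoding="utf-8").splitlines()
        return [line.split(",") for line in lines[1:]]


with tempfile.TemporaryDirectory() as tmp:
    dataset = TinyCsvDataset(Path(tmp))
    rows = dataset.rows
```

In this sketch each hook receives the path returned by the previous one, which matches the parameter and return descriptions above; a real subclass would return a MOA stream from `to_stream` rather than a list.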
- __iter__() -> Iterator[_AnyInstance]
Get an iterator over the stream.
This will NOT restart the stream if it has already been iterated over. Please use the restart() method to restart the stream.
- Yield:
An iterator over the stream.
- __next__() -> _AnyInstance
Get the next instance in the stream.
- Returns:
The next instance in the stream.
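The non-restarting behaviour of `__iter__()` can be illustrated with a small stand-in class. This is a sketch of the general Python iterator pattern, not the capymoa implementation: `OneShotStream` is hypothetical, and raising `StopIteration` once the stream is exhausted is assumed here.

```python
from typing import Iterator, List


class OneShotStream:
    """Hypothetical stand-in for a stream (not capymoa)."""

    def __init__(self, items: List[int]) -> None:
        self._items = items
        self._pos = 0

    def __iter__(self) -> Iterator[int]:
        # Return self without resetting the position.
        return self

    def __next__(self) -> int:
        if self._pos >= len(self._items):
            raise StopIteration
        item = self._items[self._pos]
        self._pos += 1
        return item

    def restart(self) -> None:
        # Rewind so the next iteration starts from the first item.
        self._pos = 0


stream = OneShotStream([1, 2, 3])
first_pass = list(stream)   # consumes every item
second_pass = list(stream)  # already exhausted, so nothing is left
stream.restart()
third_pass = list(stream)   # items are available again after restart()
```

Because `__iter__()` returns the same object without rewinding, a second `for` loop over an exhausted stream does nothing until `restart()` is called.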