CovtypeTiny

class capymoa.datasets.CovtypeTiny

Bases: DownloadARFFGzip

A truncated version of the classic Covtype classification problem.

This should only be used for quick tests, not for benchmarking algorithms.

  • Number of instances: 1001 (the first 1001 instances of Covtype; each instance is a 30 x 30 meter cell)

  • Number of attributes: 54 (10 continuous, 44 categorical)

  • Number of classes: 7 (forest cover types)

Forest Covertype (or simply covtype) contains the forest cover type for 30 x 30 meter cells obtained from US Forest Service (USFS) Region 2 Resource Information System (RIS) data.

References:

  1. Blackard, Jock. (1998). Covertype. UCI Machine Learning Repository. https://doi.org/10.24432/C50K5N.

  2. https://archive.ics.uci.edu/ml/datasets/Covertype

See Also:

  • CovtFD - Covtype with simulated feature drifts

  • Covtype - The classic covertype dataset

  • CovtypeNorm - A normalized version of the classic covertype dataset

CLI_help() → str

Return the CLI help string for the stream.

__init__(
directory: str = PosixPath('data'),
auto_download: bool = True,
CLI: str | None = None,
schema: str | None = None,
)

download(working_directory: Path) → Path

Download the dataset and return the path to the downloaded dataset within the working directory.

Parameters:

working_directory – The directory to download the dataset to.

Returns:

The path to the downloaded dataset within the working directory.
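The download contract described above (take a working directory, return the path of the file saved inside it) can be illustrated with a minimal standard-library sketch. This is not CapyMOA's implementation: the helper name download_to and its url parameter are hypothetical, and no real dataset URL is assumed.

```python
import shutil
import urllib.request
from pathlib import Path


def download_to(url: str, working_directory: Path) -> Path:
    """Fetch `url` into `working_directory` and return the saved file's path.

    Hypothetical helper for illustration only; not part of CapyMOA.
    """
    working_directory.mkdir(parents=True, exist_ok=True)
    # Name the local file after the last path component of the URL.
    target = working_directory / url.rsplit("/", 1)[-1]
    # Stream the response to disk instead of loading it all into memory.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Returning the path rather than the file contents lets a later step, such as extraction, work on the file without knowing where it came from.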

extract(stream_archive: Path) → Path

Extract the dataset from the archive and return the path to the extracted dataset.

Parameters:

stream_archive – The path to the archive containing the dataset.

Returns:

The path to the extracted dataset.
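Since the class derives from DownloadARFFGzip, extraction presumably amounts to decompressing a gzip-compressed ARFF file. The sketch below shows one way that step could look using only the standard library. The function name extract_gzip is hypothetical, and the actual CapyMOA code may differ.

```python
import gzip
import shutil
from pathlib import Path


def extract_gzip(stream_archive: Path) -> Path:
    """Decompress `name.arff.gz` next to itself as `name.arff`.

    Hypothetical helper for illustration only; not part of CapyMOA.
    """
    if stream_archive.suffix != ".gz":
        raise ValueError(f"expected a .gz archive, got {stream_archive.name}")
    # with_suffix("") drops only the trailing ".gz", keeping ".arff".
    extracted = stream_archive.with_suffix("")
    with gzip.open(stream_archive, "rb") as src, open(extracted, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return extracted
```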

get_moa_stream() → InstanceStream | None

Get the MOA stream object if it exists.

get_path()

get_schema() → Schema

Return the schema of the stream.

has_more_instances() → bool

Return True if the stream has more instances to read.

next_instance() → LabeledInstance | RegressionInstance

Return the next instance in the stream.

Raises:

ValueError – If the machine learning task is neither a regression nor a classification task.

Returns:

A labeled instance or a regression instance, depending on the schema.
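The usual way to consume a stream is to call next_instance() while has_more_instances() returns True. The sketch below counts class labels with that loop. To stay runnable without CapyMOA installed, it uses a small in-memory stand-in (FakeStream, FakeInstance), both hypothetical. The attribute name y_index for the class index is an assumption about labeled instances.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FakeInstance:
    """Stand-in for a labeled instance; `y_index` is an assumed attribute name."""
    y_index: int


class FakeStream:
    """In-memory stand-in exposing has_more_instances() and next_instance()."""

    def __init__(self, labels):
        self._labels = list(labels)
        self._pos = 0

    def has_more_instances(self) -> bool:
        return self._pos < len(self._labels)

    def next_instance(self) -> FakeInstance:
        instance = FakeInstance(self._labels[self._pos])
        self._pos += 1
        return instance


def class_counts(stream) -> Counter:
    """Drain the stream and count how often each class index occurs."""
    counts = Counter()
    while stream.has_more_instances():
        counts[stream.next_instance().y_index] += 1
    return counts


counts = class_counts(FakeStream([0, 1, 1, 6]))
```

With CapyMOA installed, passing a CovtypeTiny() object in place of the stand-in should work the same way, provided its instances expose the class index under the assumed name.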

restart()

Restart the stream to read instances from the beginning.
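A short sketch of what restarting means in practice: after restart(), reading yields the same instances as the first pass. The ListStream class and first_n helper below are hypothetical stand-ins that mimic only the three methods documented here, so the example runs without CapyMOA.

```python
class ListStream:
    """Hypothetical in-memory stream mimicking the documented methods."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def has_more_instances(self) -> bool:
        return self._pos < len(self._items)

    def next_instance(self):
        item = self._items[self._pos]
        self._pos += 1
        return item

    def restart(self) -> None:
        # Rewind the read position to the first instance.
        self._pos = 0


def first_n(stream, n: int) -> list:
    """Read up to `n` instances from the stream's current position."""
    out = []
    while len(out) < n and stream.has_more_instances():
        out.append(stream.next_instance())
    return out


stream = ListStream(["a", "b", "c"])
first_pass = first_n(stream, 2)
stream.restart()
second_pass = first_n(stream, 2)
```

Restarting is useful when several learners are evaluated on the same stream object, since each evaluation should begin from the first instance.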

to_stream(stream: Path) → Any

Convert the dataset to a MOA stream.

Parameters:

stream – The path to the dataset.

Returns:

A MOA stream.