CovtFD

class capymoa.datasets.CovtFD

Bases: _DownloadableARFF

CovtFD is an adaptation of the classic Covtype classification problem with added feature drifts.

  • Number of instances: 581,011 (30 m x 30 m cells)

  • Number of attributes: 104 (10 continuous, 44 categorical, 50 dummy)

  • Number of classes: 7 (forest cover types)

Each instance describes a 30x30-meter cell obtained from the US Resource Information System (RIS). The dataset includes 10 continuous and 44 categorical features, which we augmented by adding 50 dummy continuous features drawn from a Normal probability distribution with μ = 0 and σ = 1. To simulate drifts, only the continuous features were randomly swapped with 10 of the 50 dummy features. We added this synthetic drift twice, once at instance 193,669 and again at instance 387,338.
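The construction described above can be illustrated on toy data. The following is a minimal sketch, not the procedure used to build CovtFD: the helper names (`add_dummy_features`, `swap_columns_from`), the toy sizes, and the choice to swap column positions in place are all assumptions made for illustration.

```python
import random


def add_dummy_features(rows, n_dummy, rng):
    """Append n_dummy features drawn from N(0, 1) to every row."""
    return [row + [rng.gauss(0.0, 1.0) for _ in range(n_dummy)] for row in rows]


def swap_columns_from(rows, start, pairs):
    """Swap each column pair (a, b) in every row from index `start` onward."""
    for row in rows[start:]:
        for a, b in pairs:
            row[a], row[b] = row[b], row[a]


rng = random.Random(0)
n_rows, n_real, n_dummy = 12, 3, 5  # toy sizes, not the real 10 and 50

# Toy "continuous" features with marker values so that swaps are easy to spot.
rows = [[100.0, 200.0, 300.0] for _ in range(n_rows)]
rows = add_dummy_features(rows, n_dummy, rng)
original = [row[:] for row in rows]  # kept only for comparison

# Two drift points, standing in for instances 193,669 and 387,338.
for drift_at in (4, 8):
    # Pick as many dummy columns as there are continuous ones (assumption).
    dummy_cols = rng.sample(range(n_real, n_real + n_dummy), n_real)
    swap_columns_from(rows, drift_at, list(zip(range(n_real), dummy_cols)))
```

Rows before the first drift point are unchanged, while later rows hold the same values in different column positions, which is what makes the informative features move without altering the labels.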

References:

  1. Gomes, Heitor Murilo, Rodrigo Fernandes de Mello, Bernhard Pfahringer, and Albert Bifet. “Feature scoring using tree-based ensembles for evolving data streams.” In 2019 IEEE International Conference on Big Data (Big Data), pp. 761-769. IEEE, 2019.

  2. Blackard, Jock. (1998). Covertype. UCI Machine Learning Repository. https://doi.org/10.24432/C50K5N.

  3. https://archive.ics.uci.edu/ml/datasets/Covertype

See Also:

  • Covtype - The classic covertype dataset

  • CovtypeNorm - A normalized version of the classic covertype dataset

  • CovtypeTiny - A truncated version of the classic covertype dataset

__init__(
directory: str | Path = get_download_dir(),
auto_download: bool = True,
)

Set up a stream from an ARFF file and optionally download it if missing.

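Parameters:

  • directory – Directory where the dataset file is stored or downloaded to. Defaults to get_download_dir().

  • auto_download – Whether to download the dataset if it is missing.
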
__iter__() → Iterator[_AnyInstance]

Get an iterator over the stream.

This will NOT restart the stream if it has already been iterated over. Please use the restart() method to restart the stream.

Yield:

An iterator over the stream.
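The behaviour described above (iteration resumes where the stream left off, and only restart() rewinds it) can be sketched with a stand-in class. `ToyStream` below is hypothetical and is not capymoa code; it only mirrors the method names documented on this page so that the example runs without downloading anything.

```python
class ToyStream:
    """Hypothetical stand-in mirroring the stream methods documented here."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def has_more_instances(self):
        return self._pos < len(self._items)

    def next_instance(self):
        item = self._items[self._pos]
        self._pos += 1
        return item

    def restart(self):
        self._pos = 0

    def __iter__(self):
        # Mirrors the documented behaviour: the position is NOT reset here.
        return self

    def __next__(self):
        if not self.has_more_instances():
            raise StopIteration
        return self.next_instance()


stream = ToyStream(["a", "b", "c"])
first_pass = list(stream)   # consumes the whole stream
second_pass = list(stream)  # empty, because iterating did not restart it
stream.restart()
third_pass = list(stream)   # the full stream again
```

In practice this means a second `for instance in stream:` loop over an exhausted stream does nothing until restart() has been called.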

__next__() → _AnyInstance

Get the next instance in the stream.

Returns:

The next instance in the stream.

cli_help() → str

Return the CLI help string for the stream.

get_moa_stream() → InstanceStream | None

Get the MOA stream object if it exists.

get_schema() → Schema

Return the schema of the stream.

has_more_instances() → bool

Return True if the stream has more instances to read.

next_instance() → _AnyInstance

Return the next instance in the stream.

Raises:

ValueError – If the machine learning task is neither a regression nor a classification task.

Returns:

A labeled instance or a regression instance, depending on the schema.
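The usual way to combine has_more_instances() and next_instance() is an explicit read loop. Here is a minimal sketch, assuming only the two method names documented here; the helper `take` and the stand-in `ListStream` are hypothetical and exist so that the example runs without capymoa installed.

```python
def take(stream, limit):
    """Read up to `limit` instances from any object exposing
    has_more_instances() and next_instance()."""
    taken = []
    while len(taken) < limit and stream.has_more_instances():
        taken.append(stream.next_instance())
    return taken


class ListStream:
    """Hypothetical stand-in for a stream, backed by a plain list."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def has_more_instances(self):
        return self._pos < len(self._items)

    def next_instance(self):
        item = self._items[self._pos]
        self._pos += 1
        return item


head = take(ListStream(range(10)), 3)  # stops at the limit
rest = take(ListStream(range(2)), 5)   # stops when the stream is exhausted
```

With capymoa installed, the same helper should accept `CovtFD()` in place of `ListStream(...)`; note that constructing `CovtFD` with the default `auto_download=True` may download the dataset first.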

restart()

Restart the stream to read instances from the beginning.

classmethod to_stream(path: Path) → InstanceStream

Convert the downloaded and unpacked dataset into a data stream.

schema: Schema