RTG_2abrupt

class capymoa.datasets.RTG_2abrupt

Bases: DownloadARFFGzip

RTG_2abrupt is a synthetic classification problem based on the Random Tree generator with 2 abrupt drifts.

Number of instances: 100,000
Number of attributes: 30
Number of classes: 5
generators.RandomTreeGenerator -o 0 -u 30 -d 20

This is a snapshot (100k instances with 2 simulated abrupt drifts) of the synthetic generator based on the one proposed by Domingos and Hulten [1], producing concepts that in theory should favour decision tree learners. It constructs a decision tree by choosing attributes at random to split on and assigning a random class label to each leaf. Once the tree is built, new examples are generated by assigning uniformly distributed random values to the attributes, which then determine the class label via the tree.
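The generation procedure described above can be sketched in plain Python. This is a minimal illustration, not the MOA implementation: it assumes all attributes are numeric in [0, 1), uses a fixed-depth binary tree, and the names `build_tree`, `classify`, and `generate`, the drift positions, and the shallow `depth=5` are all hypothetical choices made for the example. An abrupt drift is modelled by swapping in a freshly built random tree at a given instance index.

```python
import random


def build_tree(rng, n_attributes, n_classes, depth):
    """Build a random binary decision tree over numeric attributes in [0, 1)."""
    if depth == 0:
        # Leaf: assign a random class label.
        return {"label": rng.randrange(n_classes)}
    return {
        "attribute": rng.randrange(n_attributes),  # attribute chosen at random
        "threshold": rng.random(),  # random split point (an assumption)
        "left": build_tree(rng, n_attributes, n_classes, depth - 1),
        "right": build_tree(rng, n_attributes, n_classes, depth - 1),
    }


def classify(tree, x):
    """Walk the tree from the root to a leaf and return the leaf's label."""
    while "label" not in tree:
        branch = "left" if x[tree["attribute"]] < tree["threshold"] else "right"
        tree = tree[branch]
    return tree["label"]


def generate(n_instances, drift_points, n_attributes=30, n_classes=5, depth=5, seed=1):
    """Yield (attributes, label) pairs, replacing the tree at each drift point."""
    rng = random.Random(seed)
    tree = build_tree(rng, n_attributes, n_classes, depth)
    for i in range(n_instances):
        if i in drift_points:
            # Abrupt drift: the old concept is replaced all at once.
            tree = build_tree(rng, n_attributes, n_classes, depth)
        # Uniformly distributed random attribute values.
        x = [rng.random() for _ in range(n_attributes)]
        yield x, classify(tree, x)


# Hypothetical drift positions, chosen only for illustration.
stream = list(generate(300, drift_points={100, 200}))
```

Because the labels come from a tree, a decision tree learner can in principle recover each concept exactly, which is why such streams tend to favour that family of learners.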

References:

[1] Domingos, Pedro, and Geoff Hulten. “Mining high-speed data streams.” In Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 71-80. 2000.

CLI_help() → str: Return the CLI help string for the stream.

__init__(directory: str = get_download_dir(), auto_download: bool = True, CLI: str | None = None, schema: str | None = None)

Construct a Stream from a MOA stream object.

Usually, you will want to construct a Stream using the capymoa.stream.stream_from_file() function.

Parameters:

moa_stream – The MOA stream object to read instances from. This is None if the stream is created from a NumPy array.
schema – The schema of the stream. If None, the schema is inferred from the moa_stream.
CLI – Additional command line arguments to pass to the MOA stream.

Raises:

ValueError – If no schema is provided and no moa_stream is provided.
ValueError – If command line arguments are provided without a moa_stream.

__iter__() → Iterator[_AnyInstance]

Get an iterator over the stream.

This will NOT restart the stream if it has already been iterated over. Please use the restart() method to restart the stream.

Yields: An iterator over the stream.
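The non-restarting behaviour described above can be illustrated with a toy class. `ToyStream` is a hypothetical stand-in written for this example, not capymoa's Stream; it only mimics the documented contract that `__iter__()` keeps the current position and `restart()` rewinds it.

```python
class ToyStream:
    """Hypothetical stand-in that mimics the documented iteration contract."""

    def __init__(self, items):
        self._items = list(items)
        self._position = 0

    def __iter__(self):
        # Return self without resetting the position, so iteration resumes
        # wherever the stream currently is.
        return self

    def __next__(self):
        if self._position >= len(self._items):
            raise StopIteration
        item = self._items[self._position]
        self._position += 1
        return item

    def restart(self):
        # Rewind to the first instance.
        self._position = 0


stream = ToyStream(["a", "b", "c"])
first_pass = list(stream)   # consumes every instance
second_pass = list(stream)  # stream is exhausted, so nothing is produced
stream.restart()
third_pass = list(stream)   # full pass again after restart()
```

With a stream that follows this contract, a second evaluation loop over the same object sees no data unless `restart()` is called first.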

__next__() → _AnyInstance

Get the next instance in the stream.

Returns: The next instance in the stream.

download(working_directory: Path) → Path

Download the dataset and return the path to the downloaded dataset within the working directory.

Parameters: working_directory – The directory to download the dataset to.
Returns: The path to the downloaded dataset within the working directory.

extract(stream_archive: Path) → Path

Extract the dataset from the archive and return the path to the extracted dataset.

Parameters: stream_archive – The path to the archive containing the dataset.
Returns: The path to the extracted dataset.
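As a rough sketch of what an extraction step with this signature might do for a gzip-compressed ARFF file, the following uses only the standard library. The function name `extract_gzip` and the convention of writing the output next to the archive are assumptions made for illustration; the actual capymoa implementation may differ.

```python
import gzip
import shutil
from pathlib import Path


def extract_gzip(stream_archive: Path) -> Path:
    """Decompress e.g. `name.arff.gz` next to itself and return `name.arff`."""
    if stream_archive.suffix != ".gz":
        raise ValueError(f"Expected a .gz archive, got: {stream_archive.name}")
    # Dropping the final suffix turns "name.arff.gz" into "name.arff".
    extracted = stream_archive.with_suffix("")
    with gzip.open(stream_archive, "rb") as src, open(extracted, "wb") as dst:
        # Stream the bytes across so large files are not held in memory.
        shutil.copyfileobj(src, dst)
    return extracted
```

The returned path would then be the natural input to a conversion step such as to_stream().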

get_moa_stream() → InstanceStream | None: Get the MOA stream object if it exists.

get_path()

get_schema() → Schema: Return the schema of the stream.

has_more_instances() → bool: Return True if the stream has more instances to read.

next_instance() → _AnyInstance

Return the next instance in the stream.

Raises: ValueError – If the machine learning task is neither a regression nor a classification task.
Returns: A labeled instance or a regression instance, depending on the schema.

restart(): Restart the stream to read instances from the beginning.

to_stream(stream: Path) → Any

Convert the dataset to a MOA stream.

Parameters: stream – The path to the dataset.
Returns: A MOA stream.
