RBFm_100k
- class capymoa.datasets.RBFm_100k
Bases: DownloadARFFGzip
RBFm_100k is a synthetic classification problem based on the Radial Basis Function generator.
Number of instances: 100,000
Number of attributes: 10
Generator command: generators.RandomRBFGeneratorDrift -s 1.0E-4 -c 5
This is a snapshot (100k instances) of the synthetic RBF (Radial Basis Function) generator, which works as follows. A fixed number of random centroids is generated. Each centroid has a random position, a single standard deviation, a class label, and a weight. New examples are generated by selecting a centroid at random, taking weights into consideration so that centroids with higher weight are more likely to be chosen. A random direction is chosen to offset the attribute values from the centroid. The length of the displacement is drawn from a Gaussian distribution whose standard deviation is determined by the chosen centroid. The chosen centroid also determines the class label of the example. This effectively creates a normally distributed hypersphere of examples around each centroid, with varying densities. Only numeric attributes are generated.
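The generation procedure described above can be sketched in plain Python. This is an illustrative sketch only, not the MOA or CapyMOA implementation: the names `Centroid`, `make_centroids`, and `next_example` are hypothetical, the centroid and class counts are arbitrary, the random direction is obtained by normalising a Gaussian vector (one possible choice), and centroid drift is not modelled.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Centroid:
    position: list[float]  # random position, one coordinate per attribute
    std_dev: float         # single standard deviation shared by all directions
    label: int             # class label given to every example drawn from here
    weight: float          # relative likelihood of this centroid being selected


def make_centroids(rng, n_centroids, n_attributes, n_classes):
    """Create a fixed set of random centroids (hypothetical helper)."""
    return [
        Centroid(
            position=[rng.random() for _ in range(n_attributes)],
            std_dev=rng.random(),
            label=rng.randrange(n_classes),
            weight=rng.random(),
        )
        for _ in range(n_centroids)
    ]


def next_example(rng, centroids):
    """Draw one (attributes, label) pair following the description above."""
    # Weighted selection: heavier centroids are chosen more often.
    chosen = rng.choices(centroids, weights=[c.weight for c in centroids], k=1)[0]
    # Random direction: a Gaussian vector scaled to unit length (sketch's choice).
    direction = [rng.gauss(0.0, 1.0) for _ in chosen.position]
    norm = math.sqrt(sum(d * d for d in direction)) or 1.0
    # Displacement length drawn from a Gaussian with the centroid's std deviation.
    length = rng.gauss(0.0, chosen.std_dev)
    attributes = [p + length * d / norm for p, d in zip(chosen.position, direction)]
    return attributes, chosen.label


rng = random.Random(1)
centroids = make_centroids(rng, n_centroids=50, n_attributes=10, n_classes=5)
attributes, label = next_example(rng, centroids)
```

Each call to `next_example` returns ten numeric attribute values and the label of the centroid that produced them, matching the attribute count listed for this dataset.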
- __init__(
- directory: str = PosixPath('data'),
- auto_download: bool = True,
- CLI: str | None = None,
- schema: str | None = None,
- )
- download(working_directory: Path) → Path
Download the dataset and return the path to the downloaded dataset within the working directory.
- Parameters:
working_directory – The directory to download the dataset to.
- Returns:
The path to the downloaded dataset within the working directory.
- extract(stream_archive: Path) → Path
Extract the dataset from the archive and return the path to the extracted dataset.
- Parameters:
stream_archive – The path to the archive containing the dataset.
- Returns:
The path to the extracted dataset.
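As a rough illustration of what extracting a gzip-compressed ARFF archive can look like, the following sketch uses only the Python standard library. It is not CapyMOA's implementation; the function name `extract_gzip` and the choice to write the output next to the archive are assumptions made for this example.

```python
import gzip
import shutil
from pathlib import Path


def extract_gzip(stream_archive: Path) -> Path:
    """Decompress e.g. `name.arff.gz` beside itself and return `name.arff`."""
    # with_suffix("") removes only the final suffix, here ".gz".
    extracted = stream_archive.with_suffix("")
    with gzip.open(stream_archive, "rb") as src, open(extracted, "wb") as dst:
        # Stream the bytes across so large datasets are not held in memory.
        shutil.copyfileobj(src, dst)
    return extracted
```

The returned path plays the same role as the documented return value: it points at the extracted dataset.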
- next_instance() → LabeledInstance | RegressionInstance
Return the next instance in the stream.
- Raises:
ValueError – If the machine learning task is neither a regression nor a classification task.
- Returns:
A labeled instance or a regression instance, depending on the schema.
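A minimal sketch of consuming a stream through `next_instance()` follows. Only `next_instance()` is documented on this page; the sketch assumes the stream also exposes a `has_more_instances()` method. The helper `iter_instances` and the stand-in class `FakeStream` are hypothetical and exist only so the example runs without downloading the dataset.

```python
from itertools import islice
from typing import Any, Iterator


def iter_instances(stream: Any) -> Iterator[Any]:
    """Yield instances until the stream reports that it is exhausted."""
    # Assumption: has_more_instances() exists on the stream object.
    while stream.has_more_instances():
        yield stream.next_instance()


class FakeStream:
    """Hypothetical stand-in that mimics the two methods used above."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def has_more_instances(self) -> bool:
        return self._pos < len(self._items)

    def next_instance(self):
        item = self._items[self._pos]
        self._pos += 1
        return item


# Take the first three instances without reading the rest of the stream.
first_three = list(islice(iter_instances(FakeStream(range(5))), 3))
```

With the real dataset, `FakeStream(range(5))` would be replaced by an `RBFm_100k()` object, which according to the signature above downloads to the `data` directory by default when `auto_download` is true.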