datasets

CapyMOA ships with several datasets out of the box. Import a dataset and start using it; the data is downloaded automatically if it is not already present in the download directory. To change where datasets are downloaded, set the CAPYMOA_DATASETS_DIR environment variable (see capymoa.env).

>>> from capymoa.datasets import ElectricityTiny
>>> stream = ElectricityTiny()
>>> stream.next_instance().x
array([0.      , 0.056443, 0.439155, 0.003467, 0.422915, 0.414912])

Alternatively, you may download the datasets all at once with the command line interface provided by capymoa.datasets:

python -m capymoa.datasets --help

Classes

Sensor

Sensor stream is a classification problem based on indoor sensor data.

RTG_2abrupt

RTG_2abrupt is a synthetic classification problem based on the Random Tree generator with 2 abrupt drifts.

RBFm_100k

RBFm_100k is a synthetic classification problem based on the Radial Basis Function generator.

Hyper100k

Hyper100k is a classification problem based on the moving hyperplane generator.

Fried

Fried is a regression problem based on the Friedman dataset.

ElectricityTiny

A truncated version of the Electricity dataset with 1000 instances.

Electricity

Electricity is a classification problem based on the Australian New South Wales Electricity Market.

CovtypeTiny

A truncated version of the classic Covtype classification problem.

CovtypeNorm

A normalized version of the classic Covtype classification problem.

Covtype

The classic Covertype (Covtype) classification problem.

CovtFD

CovtFD is an adaptation from the classic Covtype classification problem with added feature drifts.

Bike

Bike is a regression dataset based on bike-share usage information.

Functions

capymoa.datasets.get_download_dir(download_dir: str | None = None) -> Path

Get a directory where datasets should be downloaded to.

The download directory is determined by the following steps:

  1. If the download_dir parameter is provided, use that.

  2. If the CAPYMOA_DATASETS_DIR environment variable is set, use that.

  3. Otherwise, use the default download directory: ./data.

Parameters:

download_dir – Override the download directory.

Returns:

The download directory.
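
The lookup order described above can be illustrated with a small standalone sketch. This is not CapyMOA's implementation: the function name resolve_download_dir is hypothetical, and details such as how an empty environment variable is treated, or whether the directory is created on disk, are assumptions made only for illustration.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_download_dir(download_dir: Optional[str] = None) -> Path:
    """Hypothetical sketch of the three-step lookup described above.

    This mirrors the documented order only; it is not the library's code.
    """
    # Step 1: an explicit argument takes priority over everything else.
    if download_dir is not None:
        return Path(download_dir)

    # Step 2: fall back to the CAPYMOA_DATASETS_DIR environment variable.
    # Assumption: an empty value is treated the same as an unset variable.
    env_dir = os.environ.get("CAPYMOA_DATASETS_DIR")
    if env_dir:
        return Path(env_dir)

    # Step 3: otherwise use the documented default, ./data.
    return Path("./data")
```

With this sketch, calling resolve_download_dir("my_data") returns the explicit path regardless of the environment, while calling it with no argument returns the environment variable's value if set, and ./data otherwise.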