datasets

CapyMOA comes with some datasets ‘out of the box’. Simply import a dataset and start using it; the data is downloaded automatically if it is not already present in the download directory. You can configure where datasets are downloaded by setting an environment variable (see capymoa.env).

>>> from capymoa.datasets import ElectricityTiny
>>> stream = ElectricityTiny()
>>> stream.next_instance().x
array([0.      , 0.056443, 0.439155, 0.003467, 0.422915, 0.414912])

Alternatively, you may download the datasets all at once with the command line interface provided by capymoa.datasets:

python -m capymoa.datasets --help

Modules

Classes

Bike

Bike is a regression dataset built from bike-sharing data.

CovtFD

CovtFD is an adaptation from the classic Covtype classification problem with added feature drifts.

Covtype

The classic covertype (covtype) classification problem.

CovtypeNorm

A normalized version of the classic Covtype classification problem.

CovtypeTiny

A truncated version of the classic Covtype classification problem.

Electricity

Electricity is a classification problem based on the Australian New South Wales Electricity Market.

ElectricityTiny

A truncated version of the Electricity dataset with 1000 instances.

Fried

Fried is a regression problem based on the Friedman dataset.

FriedTiny

A truncated version of the Friedman regression problem with 1000 instances.

Hyper100k

Hyper100k is a classification problem based on the moving hyperplane generator.

RBFm_100k

RBFm_100k is a synthetic classification problem based on the Radial Basis Function generator.

RTG_2abrupt

RTG_2abrupt is a synthetic classification problem based on the Random Tree generator with 2 abrupt drifts.

Sensor

Sensor stream is a classification problem based on indoor sensor data.

Functions

capymoa.datasets.get_download_dir(download_dir: str | None = None) -> Path

Get a directory where datasets should be downloaded to.

The download directory is determined by the following steps:

  1. If the download_dir parameter is provided, use that.

  2. If the CAPYMOA_DATASETS_DIR environment variable is set, use that.

  3. Otherwise, use the default download directory: ./data.

Parameters:

download_dir – Override the download directory.

Returns:

The download directory.
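
The lookup order above can be sketched as a small standalone function. This is an illustrative re-implementation, not CapyMOA's actual source: the name resolve_download_dir is hypothetical, and treating an empty CAPYMOA_DATASETS_DIR as unset is an assumption.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_download_dir(download_dir: Optional[str] = None) -> Path:
    """Hypothetical sketch of the three-step lookup described above."""
    # Step 1: an explicit argument always wins.
    if download_dir is not None:
        return Path(download_dir)

    # Step 2: fall back to the environment variable.
    # Assumption: an empty value is treated the same as an unset variable.
    env_dir = os.environ.get("CAPYMOA_DATASETS_DIR")
    if env_dir:
        return Path(env_dir)

    # Step 3: default to ./data, relative to the current working directory.
    return Path("./data")
```

Because the default is a relative path, the resolved location depends on the working directory of the process, so setting the environment variable or passing download_dir explicitly is likely the more predictable choice for scripts run from different locations.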