datasets
CapyMOA comes with some datasets ‘out of the box’. Simply import a dataset
and start using it; the data is downloaded automatically if it is not
already present in the download directory. You can configure where the datasets
are downloaded to by setting the CAPYMOA_DATASETS_DIR environment variable
(see capymoa.env).
>>> from capymoa.datasets import ElectricityTiny
>>> stream = ElectricityTiny()
>>> stream.next_instance().x
array([0. , 0.056443, 0.439155, 0.003467, 0.422915, 0.414912])
Alternatively, you may download all of the datasets at once with the command-line
interface provided by capymoa.datasets:
python -m capymoa.datasets --help
Modules
Classes
- Sensor stream is a classification problem based on indoor sensor data.
- RTG_2abrupt is a synthetic classification problem based on the Random Tree generator with 2 abrupt drifts.
- RBFm_100k is a synthetic classification problem based on the Radial Basis Function generator.
- Hyper100k is a classification problem based on the moving hyperplane generator.
- Fried is a regression problem based on the Friedman dataset.
- A truncated version of the Electricity dataset with 1000 instances.
- Electricity is a classification problem based on the Australian New South Wales Electricity Market.
- A truncated version of the classic
- A normalized version of the classic
- The classic covertype (/covtype) classification problem
- CovtFD is an adaptation from the classic
- Bike is a regression dataset for the amount of bike share information.
Functions
- capymoa.datasets.get_download_dir(download_dir: str | None = None) -> Path
Get a directory where datasets should be downloaded to.
The download directory is determined by the following steps:
1. If the download_dir parameter is provided, use that.
2. If the CAPYMOA_DATASETS_DIR environment variable is set, use that.
3. Otherwise, use the default download directory: ./data.
- Parameters:
download_dir – Override the download directory.
- Returns:
The download directory.
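The lookup order described for get_download_dir can be illustrated with a small stand-alone sketch. The function resolve_download_dir below is a hypothetical stand-in written only for illustration; it is not CapyMOA's implementation and does not import CapyMOA. It assumes that an empty CAPYMOA_DATASETS_DIR counts as unset, which the documentation does not specify, and the paths used are arbitrary examples.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_download_dir(download_dir: Optional[str] = None) -> Path:
    """Hypothetical stand-in mirroring the documented lookup order."""
    # Step 1: an explicit argument takes priority.
    if download_dir is not None:
        return Path(download_dir)
    # Step 2: fall back to the CAPYMOA_DATASETS_DIR environment variable.
    # Assumption: an empty value is treated the same as an unset variable.
    env_dir = os.environ.get("CAPYMOA_DATASETS_DIR")
    if env_dir:
        return Path(env_dir)
    # Step 3: use the documented default, ./data.
    return Path("./data")


# With no argument and no environment variable, the default is used.
os.environ.pop("CAPYMOA_DATASETS_DIR", None)
default_dir = resolve_download_dir()

# Once the environment variable is set, it is used instead of the default.
os.environ["CAPYMOA_DATASETS_DIR"] = "/tmp/capymoa_datasets"  # example path
from_env = resolve_download_dir()

# An explicit argument overrides the environment variable.
explicit = resolve_download_dir("my_datasets")
```

In this sketch the explicit argument is checked first so that a single call can override a machine-wide setting without touching the environment.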