stream

Modules

drift

Simulate concept drift in datastreams.

generator

Generate artificial data streams.

preprocessing

Classes

Stream

A datastream that can be learnt instance by instance.
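Learning "instance by instance" means a model sees one example at a time, usually predicting on it before training on it (test-then-train). The sketch below shows that loop in plain Python. `ToyStream`, `ToyInstance`, `MajorityClass` and `prequential_accuracy` are all hypothetical illustrations, not CapyMOA classes. `next_instance()` mirrors the call used in the example further down, while `has_more_instances()` is an assumed name for an end-of-stream check.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class ToyInstance:
    """Hypothetical stand-in for a labeled instance: features x and label y."""
    x: Tuple[float, ...]
    y: str


class ToyStream:
    """Hypothetical in-memory stream that hands out one instance at a time."""

    def __init__(self, rows: Sequence[ToyInstance]):
        self._rows = list(rows)
        self._pos = 0

    def has_more_instances(self) -> bool:  # assumed method name
        return self._pos < len(self._rows)

    def next_instance(self) -> ToyInstance:
        instance = self._rows[self._pos]
        self._pos += 1
        return instance


class MajorityClass:
    """Hypothetical incremental learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts: Counter = Counter()

    def predict(self, x) -> Optional[str]:
        if not self.counts:
            return None  # nothing learnt yet
        return self.counts.most_common(1)[0][0]

    def train(self, instance: ToyInstance) -> None:
        self.counts[instance.y] += 1


def prequential_accuracy(stream: ToyStream, learner: MajorityClass) -> float:
    """Test-then-train: predict each instance before learning from it."""
    correct = seen = 0
    while stream.has_more_instances():
        instance = stream.next_instance()
        if learner.predict(instance.x) == instance.y:
            correct += 1
        learner.train(instance)
        seen += 1
    return correct / seen if seen else 0.0


rows = [ToyInstance((0.1,), "a"), ToyInstance((0.2,), "a"),
        ToyInstance((0.3,), "b"), ToyInstance((0.4,), "a")]
print(prequential_accuracy(ToyStream(rows), MajorityClass()))  # 0.5
```

Because each prediction is made before the model trains on that instance, the accuracy is measured on data the learner has not yet seen. No separate test set is needed.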

Schema

Schema describes the structure of a stream.

ARFFStream

A datastream originating from an ARFF file.

PytorchStream

PytorchStream turns a PyTorch dataset into a datastream.

CSVStream

A datastream originating from a CSV file.

NumpyStream

A datastream originating from a numpy array.
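Turning arrays into a stream amounts to walking a feature matrix and a label vector row by row, in order. The sketch below shows that idea with a hypothetical generator, `iterate_rows`. It does not reproduce the `NumpyStream` constructor or its parameters, which are not documented here.

```python
from typing import Iterator, Tuple

import numpy as np


def iterate_rows(X: np.ndarray, y: np.ndarray) -> Iterator[Tuple[np.ndarray, int]]:
    """Hypothetical helper: yield (features, label) pairs one row at a time."""
    if X.ndim != 2:
        raise ValueError("X must be a 2-D array of shape (n_instances, n_features)")
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    for features, label in zip(X, y):
        yield features, int(label)


X = np.array([[0.0, 1.0], [0.5, 0.2], [0.9, 0.4]])
y = np.array([1, 0, 1])

for features, label in iterate_rows(X, y):
    print(features, label)
```

A generator keeps only the current row in play, which matches how a learner consumes a datastream.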

Functions

capymoa.stream.stream_from_file(
    path_to_csv_or_arff: str,
    dataset_name: str = 'NoName',
    class_index: int = -1,
    target_type: str = None,
) → Stream

Create a datastream from a CSV or ARFF file.

>>> from capymoa.stream import stream_from_file
>>> stream = stream_from_file("data/electricity_tiny.csv", dataset_name="Electricity")
>>> stream.next_instance()
LabeledInstance(
    Schema(Electricity),
    x=ndarray(..., 6),
    y_index=1,
    y_label='1'
)
>>> stream.next_instance().x
array([0.021277, 0.051699, 0.415055, 0.003467, 0.422915, 0.414912])

Parameters:
  • path_to_csv_or_arff – A file path to a CSV or ARFF file.

  • dataset_name – A descriptive name for the dataset. Defaults to “NoName”.

  • class_index – The index of the column that contains the class label. Defaults to -1, i.e. the last column. If the label is stored in a different column, pass that column's index here.

  • target_type – CSV files only: whether the target values should be interpreted as categorical or numeric. Defaults to None, in which case the type is detected automatically.
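The `class_index` parameter picks the label column out of each row. The sketch below separates the label from the features of parsed CSV rows. It uses only Python's standard `csv` module and a hypothetical helper, `split_row`, and assumes the usual convention that a negative index counts from the end, so -1 selects the last column. It is not CapyMOA's parsing code, and it does not attempt the automatic target-type detection.

```python
import csv
import io
from typing import List, Tuple


def split_row(row: List[str], class_index: int = -1) -> Tuple[List[float], str]:
    """Hypothetical helper: separate the label column from the feature columns.

    A negative class_index counts from the end, as in Python indexing,
    so the default -1 selects the last column.
    """
    if not -len(row) <= class_index < len(row):
        raise IndexError(f"class_index {class_index} out of range for {len(row)} columns")
    idx = class_index % len(row)  # normalise, e.g. -1 -> len(row) - 1
    label = row[idx]
    features = [float(v) for i, v in enumerate(row) if i != idx]
    return features, label


data = io.StringIO("a,b,class\n0.1,0.2,up\n0.3,0.4,down\n")
reader = csv.reader(data)
next(reader)  # skip the header line
rows = [split_row(r) for r in reader]
print(rows)  # [([0.1, 0.2], 'up'), ([0.3, 0.4], 'down')]

# Label stored in the first column instead of the last:
print(split_row(["up", "0.1", "0.2"], class_index=0))  # ([0.1, 0.2], 'up')
```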