stream

Modules

drift

Simulate concept drift in datastreams.

generator

Generate artificial data streams.

preprocessing

Classes

Stream

A datastream that can be learnt instance by instance.

Schema

Schema describes the structure of a stream.

ARFFStream

A datastream originating from an ARFF file.

TorchClassifyStream

TorchClassifyStream turns a PyTorch dataset into a classification stream.

CSVStream

Create a CapyMOA datastream from a CSV file.

NumpyStream

A datastream originating from a numpy array.

MOAStream

A datastream that can be learnt instance by instance.
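The phrase "learnt instance by instance" describes a test-then-train (prequential) loop: each instance is first used to test the model and then to train it. The sketch below is a self-contained toy, not CapyMOA code. ToyStream, ToyInstance, MajorityClassLearner and prequential_accuracy are hypothetical names. The method names has_more_instances, next_instance and restart, and the learner methods train and predict, are assumed to mirror CapyMOA's interfaces; check the Stream and learner references for the real signatures.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class ToyInstance:
    x: List[float]  # feature values
    y_index: int    # position of the class label


class ToyStream:
    """Hypothetical in-memory stream exposing a Stream-like interface."""

    def __init__(self, rows: Sequence[Sequence[float]], labels: Sequence[int]) -> None:
        self._rows = [list(r) for r in rows]
        self._labels = list(labels)
        self._pos = 0  # index of the next instance to hand out

    def has_more_instances(self) -> bool:
        return self._pos < len(self._rows)

    def next_instance(self) -> ToyInstance:
        inst = ToyInstance(self._rows[self._pos], self._labels[self._pos])
        self._pos += 1
        return inst

    def restart(self) -> None:
        self._pos = 0  # rewind to the first instance


class MajorityClassLearner:
    """Toy learner that predicts the most frequent label seen so far."""

    def __init__(self) -> None:
        self.counts: Dict[int, int] = {}

    def predict(self, inst: ToyInstance) -> Optional[int]:
        if not self.counts:
            return None  # nothing learnt yet
        return max(self.counts, key=self.counts.get)

    def train(self, inst: ToyInstance) -> None:
        self.counts[inst.y_index] = self.counts.get(inst.y_index, 0) + 1


def prequential_accuracy(stream: ToyStream, learner: MajorityClassLearner) -> float:
    """Test-then-train: predict each instance before learning from it."""
    correct = total = 0
    while stream.has_more_instances():
        inst = stream.next_instance()
        if learner.predict(inst) == inst.y_index:
            correct += 1
        total += 1
        learner.train(inst)
    return correct / total if total else 0.0
```

With rows [[0.1], [0.2], [0.3], [0.4]] and labels [1, 1, 0, 1], the loop gets 2 of 4 predictions right (accuracy 0.5): the first prediction has no history and the third label breaks the majority. Calling restart() rewinds the toy stream so it can be replayed.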

Functions

capymoa.stream.stream_from_file(
path_to_csv_or_arff: str | Path,
dataset_name: str = 'NoName',
class_index: int = -1,
target_type: Literal['numeric', 'categorical'] | None = None,
) -> Stream

Create a datastream from a CSV or ARFF file.

>>> from capymoa.stream import stream_from_file
>>> stream = stream_from_file(
...     "data/electricity_tiny.csv",
...     dataset_name="Electricity",
...     target_type="categorical"
... )
>>> stream.next_instance()
LabeledInstance(
    Schema(Electricity),
    x=[0.    0.056 0.439 0.003 0.423 0.415],
    y_index=1,
    y_label='1'
)
>>> stream.next_instance().x
array([0.021277, 0.051699, 0.415055, 0.003467, 0.422915, 0.414912])
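A single path argument accepts either format, so the function has to decide which reader to use. A minimal sketch of that dispatch, assuming the choice is made from the file extension; pick_reader is a hypothetical helper and the case-insensitive match is this sketch's own choice, not documented CapyMOA behaviour.

```python
from pathlib import Path
from typing import Union


def pick_reader(path: Union[str, Path]) -> str:
    """Hypothetical helper: choose a reader from the file extension (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".arff":
        return "arff"  # would be handed to an ARFF reader such as ARFFStream
    if suffix == ".csv":
        return "csv"   # would be parsed fully into memory
    raise ValueError(f"unsupported file type {suffix!r}; expected .csv or .arff")
```

For example, pick_reader("data/electricity_tiny.csv") returns "csv", while a path such as "notes.txt" raises ValueError.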

CSV File Considerations:

  • Assumes a header row with attribute names.

  • Supports only numeric and categorical attributes.

  • String columns are automatically converted to categorical attributes. Convert them to numeric beforehand if needed.

  • The whole CSV file is read into memory. For very large files, use CSVStream directly for streaming from disk.

  • Default Target: The last column (class_index=-1) is assumed to be the target variable.

  • Missing Values: Represented by ?.
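The CSV rules above can be illustrated with the standard library alone. This is a sketch, not CapyMOA's parser: sketch_read_csv is a hypothetical name. The sketch assumes that a column counts as numeric when every non-missing value parses as a float, and it orders categorical values alphabetically. The real implementation may detect types or order categories differently.

```python
import csv
import io
import math
from typing import Dict, List


def sketch_read_csv(text: str, class_index: int = -1):
    """Hypothetical in-memory CSV reader following the considerations above."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)                  # first row holds attribute names
    rows = [row for row in reader if row]  # the whole file is loaded into memory
    n_cols = len(header)

    def is_number(v: str) -> bool:
        try:
            float(v)
            return True
        except ValueError:
            return False

    # Columns containing any non-numeric value become categorical;
    # their distinct strings (sorted, an assumption) are the category values.
    categories: Dict[int, List[str]] = {}
    for j in range(n_cols):
        values = [r[j] for r in rows if r[j] != "?"]
        if not all(is_number(v) for v in values):
            categories[j] = sorted(set(values))

    def encode(j: int, v: str) -> float:
        if v == "?":
            return math.nan                # "?" marks a missing value
        if j in categories:
            return float(categories[j].index(v))
        return float(v)

    data = [[encode(j, v) for j, v in enumerate(r)] for r in rows]
    target = class_index % n_cols          # -1 resolves to the last column
    return header, categories, data, target
```

For the input "temp,city,label" followed by rows "1.5,paris,yes", "?,rome,no" and "2.0,paris,yes", the city and label columns become categorical, the missing temperature becomes NaN, and the target resolves to column 2.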

ARFF File Considerations:

Reads the Attribute-Relation File Format (ARFF) commonly used in MOA and WEKA.

  • Supports only NUMERIC and NOMINAL attributes.

  • Missing Values: Represented by ?.

  • Default Target: The last attribute is assumed to be the target variable.
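An ARFF file declares each attribute in a header (`@attribute name type`) before the `@data` section. A minimal sketch of reading such a header under the constraints listed above; sketch_parse_arff is a hypothetical name. The sketch treats REAL and INTEGER as numeric, as the ARFF format does, and rejects every other type. Data values stay as strings, and the comma split is naive: quoted values containing commas are not handled.

```python
import re
from typing import List, Optional, Tuple

# "@attribute <name> <type>"; the name may be quoted, the type is numeric or {v1,v2,...}
_ATTR = re.compile(r"@attribute\s+('[^']*'|\"[^\"]*\"|\S+)\s+(.+)", re.IGNORECASE)


def sketch_parse_arff(text: str):
    """Hypothetical minimal ARFF parser for NUMERIC and NOMINAL attributes."""
    attributes: List[Tuple[str, Optional[List[str]]]] = []  # None means numeric
    data: List[List[Optional[str]]] = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if in_data:
            # naive split; "?" marks a missing value
            data.append([None if v.strip() == "?" else v.strip() for v in line.split(",")])
        elif line.lower().startswith("@attribute"):
            m = _ATTR.match(line)
            if m is None:
                raise ValueError(f"malformed attribute line: {line!r}")
            name, kind = m.group(1).strip("'\""), m.group(2).strip()
            if kind.startswith("{") and kind.endswith("}"):
                attributes.append((name, [v.strip() for v in kind[1:-1].split(",")]))
            elif kind.lower() in ("numeric", "real", "integer"):
                attributes.append((name, None))
            else:
                raise ValueError(f"unsupported attribute type: {kind!r}")
        elif line.lower().startswith("@data"):
            in_data = True
    target = len(attributes) - 1  # default: the last attribute is the target
    return attributes, data, target
```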

Parameters:
  • path_to_csv_or_arff – A file path to a CSV or ARFF file.

  • dataset_name – A descriptive name given to the dataset. Defaults to “NoName”.

  • class_index – The index of the column containing the class label. Defaults to -1, which selects the last column; if the class label is stored in a different column, pass that column's index instead.

  • target_type – For CSV files, specifies whether the target values should be interpreted as categorical (classification) or numeric (regression). Defaults to None, in which case the type is detected automatically.
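The two parameters that control the target can be illustrated in plain Python. resolve_class_index assumes class_index follows Python's negative-indexing convention, which agrees with the documented default of -1 for the last column; whether other negative values are accepted is not stated here. guess_target_type is a hypothetical heuristic for target_type=None, and its max_classes threshold is invented for illustration. The detection rule CapyMOA actually applies may differ, so passing target_type explicitly, as the Electricity example above does, avoids relying on it.

```python
from typing import Literal, Optional, Sequence

TargetType = Literal["numeric", "categorical"]


def resolve_class_index(class_index: int, n_columns: int) -> int:
    """Map a possibly negative index to a column position, Python-style (-1 = last)."""
    if not -n_columns <= class_index < n_columns:
        raise IndexError(f"class_index {class_index} out of range for {n_columns} columns")
    return class_index % n_columns


def guess_target_type(values: Sequence[str], override: Optional[TargetType] = None,
                      max_classes: int = 20) -> TargetType:
    """Hypothetical heuristic used when target_type is None."""
    if override is not None:  # an explicit target_type always wins
        return override
    observed = [v for v in values if v != "?"]  # ignore missing values
    try:
        numbers = [float(v) for v in observed]
    except ValueError:
        return "categorical"  # any non-numeric label means categorical
    # A few distinct whole numbers look like class labels (e.g. 0/1); anything else is numeric.
    if all(n.is_integer() for n in numbers) and len(set(numbers)) <= max_classes:
        return "categorical"
    return "numeric"
```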