stream
Modules
Simulate concept drift in datastreams.
Generate artificial data streams.
Classes
A datastream that can be learnt instance by instance.
Schema describes the structure of a stream.
A datastream originating from an ARFF file.
TorchClassifyStream turns a PyTorch dataset into a classification stream.
Create a CapyMOA datastream from a CSV file.
A datastream originating from a numpy array.
A datastream that can be learnt instance by instance.
Functions
capymoa.stream.stream_from_file(
    path_to_csv_or_arff: str | Path,
    dataset_name: str = 'NoName',
    class_index: int = -1,
    target_type: Literal['numeric', 'categorical'] | None = None,
)
Create a datastream from a CSV or ARFF file.
>>> from capymoa.stream import stream_from_file
>>> stream = stream_from_file(
...     "data/electricity_tiny.csv",
...     dataset_name="Electricity",
...     target_type="categorical"
... )
>>> stream.next_instance()
LabeledInstance(
    Schema(Electricity),
    x=[0. 0.056 0.439 0.003 0.423 0.415],
    y_index=1,
    y_label='1'
)
>>> stream.next_instance().x
array([0.021277, 0.051699, 0.415055, 0.003467, 0.422915, 0.414912])
CSV File Considerations:
Assumes a header row with attribute names.
Supports only numeric and categorical attributes.
String columns are automatically converted to categorical attributes. Convert them to numeric beforehand if needed.
The whole CSV file is read into memory. For very large files, use CSVStream directly to stream from disk.
Default Target: The last column (class_index=-1) is assumed to be the target variable.
Missing Values: Represented by ?.
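The CSV conventions above can be illustrated with a small sketch that writes a file in the expected shape: a header row, the target in the last column, and ? for missing values. The helper names write_stream_csv and load_stream, the column names, and the "Toy" dataset name are hypothetical and not part of CapyMOA; only stream_from_file and its parameters come from this page.

```python
import csv
from pathlib import Path

MISSING = "?"  # missing-value marker described in the CSV considerations


def write_stream_csv(path, header, rows):
    """Write a header row followed by data rows, replacing None with '?'.

    Hypothetical helper: the target is expected to be the last column so
    that the default class_index=-1 applies.
    """
    path = Path(path)
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)  # header row with attribute names
        for row in rows:
            writer.writerow([MISSING if value is None else value for value in row])
    return path


def load_stream(path):
    """Load the CSV with CapyMOA (requires capymoa to be installed)."""
    # Imported lazily so the CSV helper works without CapyMOA.
    from capymoa.stream import stream_from_file

    return stream_from_file(path, dataset_name="Toy", target_type="categorical")
```

Calling write_stream_csv("toy.csv", ["a", "b", "label"], [[0.1, None, "yes"]]) and then load_stream("toy.csv") should yield a stream whose target is the label column, assuming CapyMOA is available in the environment.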
ARFF File Considerations:
Reads the Attribute-Relation File Format (ARFF) commonly used in MOA and WEKA.
Supports only NUMERIC and NOMINAL attributes.
Missing Values: Represented by ?.
Default Target: The last attribute is assumed to be the target variable.
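For readers unfamiliar with ARFF, the following sketch builds a minimal file body that uses only NUMERIC and NOMINAL attributes, marks missing values with ?, and places the target last. The to_arff helper and the example relation and attribute names are hypothetical; the output follows the general ARFF layout (@relation, @attribute, @data) rather than anything specific to CapyMOA.

```python
def to_arff(relation, attributes, rows):
    """Render a minimal ARFF document as a string.

    attributes: list of (name, nominal_values) pairs, where nominal_values
    is None for a NUMERIC attribute or a list of labels for a NOMINAL one.
    rows: list of value lists; None is written as the '?' missing marker.
    """
    lines = [f"@relation {relation}", ""]
    for name, nominal_values in attributes:
        if nominal_values is None:
            lines.append(f"@attribute {name} NUMERIC")
        else:
            values = ",".join(nominal_values)
            lines.append(f"@attribute {name} {{{values}}}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join("?" if value is None else str(value) for value in row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(
        to_arff(
            "toy",
            [("a", None), ("label", ["yes", "no"])],  # target attribute last
            [[0.5, "yes"], [None, "no"]],
        )
    )
```

Writing the returned string to a file with an .arff extension gives an input that stream_from_file should accept under the considerations listed above.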
- Parameters:
path_to_csv_or_arff – A file path to a CSV or ARFF file.
dataset_name – A descriptive name for the dataset. Defaults to “NoName”.
class_index – The index of the column containing the class label. Defaults to -1, the last column; set it to another index if the class label is located elsewhere.
target_type – When working with a CSV file, specifies whether the target values are interpreted as categorical or numeric. Defaults to None, which detects the type automatically.
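As a rough illustration of how class_index selects the target, the sketch below splits a header row into feature names and a target name. It assumes negative indices follow Python list semantics, which matches class_index=-1 meaning the last column; split_columns and load_numeric_target_first are hypothetical helpers, and the second one only shows how the parameters documented above might be passed for a file whose numeric target sits in the first column.

```python
def split_columns(header, class_index=-1):
    """Return (feature_names, target_name) for a header row.

    Assumption: class_index behaves like a Python list index, so -1 is
    the last column.
    """
    target = header[class_index]          # raises IndexError if out of range
    position = class_index % len(header)  # normalise negative indices
    features = [name for i, name in enumerate(header) if i != position]
    return features, target


def load_numeric_target_first(path):
    """Load a file whose first column is a numeric target (needs capymoa)."""
    # Imported lazily so split_columns works without CapyMOA installed.
    from capymoa.stream import stream_from_file

    return stream_from_file(
        path,
        dataset_name="TargetFirst",
        class_index=0,          # target is the first column
        target_type="numeric",  # skip automatic detection
    )
```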