CSVStream
- class capymoa.stream.CSVStream
Bases: Stream[_AnyInstance]
Create a CapyMOA datastream from a CSV file.
The CSV file must have a header row with feature names.
Integers or strings can specify nominal features.
A "?" represents a missing value.
The CSV is read line by line, so it can handle large files.
When ‘categories’ are provided for the target attribute, the stream returns LabeledInstance objects.

>>> from io import StringIO
>>> from capymoa.stream import CSVStream
>>> csv_content = '''feature1,feature2,target
... 1,A,yes
... 2,B,no
... 3,0,0
... 5,1,1
... ?,?,?
... '''
>>> csv_file = StringIO(csv_content)
>>> stream = CSVStream(
...     file=csv_file,
...     target="target",
...     categories={"target": ["yes", "no"], "feature2": ["A", "B"]},
...     name="TestStream"
... )
>>> for instance in stream:
...     print(instance.x, instance.y_index, instance.y_label)
[1. 0.] 0 yes
[2. 1.] 1 no
[3. 0.] 0 yes
[5. 1.] 1 no
[nan nan] -1 None

When no categories are provided for the target attribute, the stream returns RegressionInstance objects.

>>> csv_content = '''target,feature1,feature2
... 0.0,A,1
... 0.5,B,2
... 1.5,0,3
... 2.0,1,4
... ?,?,?
... '''
>>> csv_file = StringIO(csv_content)
>>> stream = CSVStream(
...     file=csv_file,
...     target="target",
...     categories={"feature1": ["A", "B"]},
...     name="TestStream"
... )
>>> for instance in stream:
...     print(instance.x, instance.y_value)
[0. 1.] 0.0
[1. 2.] 0.5
[0. 3.] 1.5
[1. 4.] 2.0
[nan nan] nan

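The encoding visible in the two examples above can be approximated in plain Python. The following is a minimal sketch using only the standard library; it is not CapyMOA's implementation. The helper names `encode_nominal` and `encode_row` are hypothetical, and treating an integer cell as an index into the category list is an assumption inferred from the example output.

```python
import csv
import math
from io import StringIO

MISSING = "?"  # marker for a missing value, as described above


def encode_nominal(raw, categories):
    """Map a raw CSV cell to a category index, or None if it is missing.

    Hypothetical helper: accepts either a category name or an integer index.
    """
    if raw == MISSING:
        return None
    if raw in categories:
        return categories.index(raw)
    index = int(raw)  # assumption: an integer cell is an index into `categories`
    if not 0 <= index < len(categories):
        raise ValueError(f"index {index} is out of range for {categories}")
    return index


def encode_row(row, target, categories):
    """Return (x, y_index) for one row of a classification CSV."""
    x = []
    for name, raw in row.items():
        if name == target:
            continue
        if name in categories:
            idx = encode_nominal(raw, categories[name])
            x.append(math.nan if idx is None else float(idx))
        else:
            # Numeric feature: parse as float, with "?" becoming NaN.
            x.append(math.nan if raw == MISSING else float(raw))
    y_index = encode_nominal(row[target], categories[target])
    return x, (-1 if y_index is None else y_index)


csv_content = """feature1,feature2,target
1,A,yes
2,B,no
3,0,0
?,?,?
"""
categories = {"target": ["yes", "no"], "feature2": ["A", "B"]}

# csv.DictReader yields one row at a time, so the input is never fully loaded.
rows = [
    encode_row(row, "target", categories)
    for row in csv.DictReader(StringIO(csv_content))
]
```

In this sketch a missing feature becomes NaN and a missing target becomes -1, mirroring the last line of the classification example.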
- __init__(
      file: Path | str | TextIO,
      target: str,
      categories: Mapping[str, Sequence[str]] | None = None,
      name: str | None = None,
      length: int | None = None,
  )
Create a CSV stream.
- Parameters:
file – A path to a CSV file or an open file-like object.
target – The name of the target attribute.
categories – A mapping from attribute names to their categorical values.
name – An optional name for the stream. If not provided, the filename is used.
length – An optional length of the stream (number of instances). If provided, this enables the Sized interface.
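If the number of instances is not known in advance, one way to obtain a value for length is to count the data rows in a single pass before creating the stream. The sketch below uses only the standard library. The helper name `count_instances` is hypothetical, and it assumes that blank lines do not count as instances.

```python
import csv
import tempfile
from pathlib import Path


def count_instances(path):
    """Count data rows in a CSV file, excluding the header row.

    Rows are consumed one at a time, so memory use stays flat for large files.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row, if there is one
        return sum(1 for row in reader if row)  # assumption: blank lines are not instances


# Small demonstration on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "demo.csv"
    csv_path.write_text("feature1,target\n1,yes\n2,no\n3,yes\n")
    n_instances = count_instances(csv_path)

# The result could then be passed as the `length` argument, for example:
# CSVStream(file="demo.csv", target="target",
#           categories={"target": ["yes", "no"]}, length=n_instances)
```

Counting costs one extra read of the file, which may or may not be worthwhile depending on its size.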
- __iter__() -> Iterator[_AnyInstance]
Get an iterator over the stream.
This will NOT restart the stream if it has already been iterated over. Please use the restart() method to restart the stream.
- Yield:
An iterator over the stream.
- __next__() -> _AnyInstance
Get the next instance in the stream.
- Returns:
The next instance in the stream.
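The interaction between __iter__(), __next__() and restart() described above can be illustrated with a toy class. This is a sketch of the general Python iterator pattern, not CapyMOA code. The class `RestartableNumbers` is hypothetical, and raising StopIteration at the end follows the standard iterator protocol rather than anything stated on this page.

```python
from typing import Iterator, List


class RestartableNumbers:
    """Toy stand-in for a stream that keeps its position across iterations."""

    def __init__(self, values: List[int]) -> None:
        self._values = values
        self._position = 0

    def __iter__(self) -> Iterator[int]:
        # Returning self means a new `for` loop continues from the current position.
        return self

    def __next__(self) -> int:
        if self._position >= len(self._values):
            raise StopIteration
        value = self._values[self._position]
        self._position += 1
        return value

    def restart(self) -> None:
        # Rewind explicitly; iteration alone never does this.
        self._position = 0


stream = RestartableNumbers([10, 20, 30])
first_value = next(stream)   # consumes the first item
first_pass = list(stream)    # continues after the consumed item
second_pass = list(stream)   # already exhausted, so nothing is left
stream.restart()
third_pass = list(stream)    # full sequence again after the explicit restart
```

The practical consequence is that a second loop over an exhausted stream yields nothing until restart() is called.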