Schema
- class capymoa.stream.Schema
Bases: object
Schema describes the structure of a stream.
It contains the attribute names, their data types, and the possible values for nominal attributes. The schema is crucial for a learner to know how to interpret instances correctly.
When working with datasets built into CapyMOA (see capymoa.datasets) and ARFF files, the schema is automatically created. However, in some cases you might want to create a schema manually. This can be done using the from_custom() method.
- __init__(moa_header: InstancesHeader)
Construct a schema by wrapping an InstancesHeader. To create a schema without an InstancesHeader, use the from_custom() method.
- Parameters:
moa_header – A Java MOA header object.
- static from_custom(features: Sequence[str], target: str, categories: Dict[str, Sequence[str]] | None = None, name: str = 'unnamed')
Create a CapyMOA Schema that defines each attribute in the stream.
The following example shows how to use this method to create a classification schema:
>>> from capymoa.stream import Schema
>>> schema = Schema.from_custom(
...     features=["f1", "f2", "class"],
...     target="class",
...     categories={"class": ["yes", "no"], "f1": ["low", "medium", "high"]},
...     name="classification-example"
... )
>>> print(schema)
@relation classification-example
@attribute f1 {low,medium,high}
@attribute f2 numeric
@attribute class {yes,no}
@data
>>> print(schema.is_classification())
True
The following example shows how to use this method to create a regression schema:
>>> schema = Schema.from_custom(
...     features=["f1", "f2", "target"],
...     target="target",
...     categories={"f1": ["A", "B", "C"]},
...     name="regression-example"
... )
>>> print(schema)
@relation regression-example
@attribute f1 {A,B,C}
@attribute f2 numeric
@attribute target numeric
@data
>>> print(schema.is_regression())
True
- Parameters:
features – A list of feature names.
target – The name of the target attribute. Must be in features as well.
categories – A dictionary mapping feature names to their possible values. When the target attribute is included in this dictionary the task is considered classification.
name – The name of the dataset.
- Returns:
A CapyMOA Schema object.
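The rule described for the categories parameter can be illustrated with a small pure-Python sketch. The helper infer_task is hypothetical and is not part of CapyMOA. It only mirrors the documented behaviour that a target listed in categories makes the task classification, and it assumes every other target is numeric (regression). Raising ValueError when the target is missing from features is likewise an assumption of this sketch, not confirmed library behaviour.

```python
from typing import Dict, Optional, Sequence


def infer_task(
    features: Sequence[str],
    target: str,
    categories: Optional[Dict[str, Sequence[str]]] = None,
) -> str:
    """Hypothetical helper: guess the task type from from_custom()-style arguments."""
    # The documentation states that the target must also appear in features.
    # Raising ValueError here is an assumption of this sketch.
    if target not in features:
        raise ValueError(f"target {target!r} must be listed in features")
    # A target with declared categories is nominal, so the task is classification.
    if categories is not None and target in categories:
        return "classification"
    # Otherwise the target is treated as numeric (assumed: regression).
    return "regression"


print(infer_task(["f1", "f2", "class"], "class", {"class": ["yes", "no"]}))  # classification
print(infer_task(["f1", "f2", "target"], "target", {"f1": ["A", "B", "C"]}))  # regression
```

The two calls reuse the arguments from the classification and regression examples above, so the printed task types should agree with is_classification() and is_regression() there.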
- get_moa_header() → InstancesHeader
Get the Java MOA header. This is useful for advanced operations that are not yet supported by the Python wrappers.
- get_value_for_index(y_index: int | None) → str | None
Return the value for the class label index y_index.
- is_y_index_in_range(y_index: int) → bool
Return True if the y_index is in the range of the class label indexes.
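How the two label lookups relate can be sketched with a plain Python stand-in. LabelLookup is a hypothetical class, not CapyMOA's implementation; it assumes the class labels are stored as an ordered list and that indexes start at 0. Returning None for an out-of-range index is a choice made for this sketch only; the real method may behave differently in that case.

```python
from typing import List, Optional, Sequence


class LabelLookup:
    """Hypothetical stand-in for the label lookups a schema provides."""

    def __init__(self, label_values: Sequence[str]) -> None:
        # Ordered class labels, e.g. the values given for the target in categories.
        self._label_values: List[str] = list(label_values)

    def is_y_index_in_range(self, y_index: int) -> bool:
        # Valid indexes are 0 .. number_of_labels - 1 (assumed zero-based).
        return 0 <= y_index < len(self._label_values)

    def get_value_for_index(self, y_index: Optional[int]) -> Optional[str]:
        # None in, None out, matching the "int | None -> str | None" signature.
        if y_index is None:
            return None
        # Assumption of this sketch: out-of-range indexes also map to None.
        if not self.is_y_index_in_range(y_index):
            return None
        return self._label_values[y_index]


labels = LabelLookup(["yes", "no"])
print(labels.get_value_for_index(1))   # no
print(labels.is_y_index_in_range(2))   # False
```

Checking is_y_index_in_range() before converting a predicted index to its label avoids treating an invalid prediction as a real class.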
- property dataset_name: str
Returns the name of the dataset.
- property shape: Sequence[int]
The shape of the input x instances.
Usually capymoa.instance.Instance.x is a vector, but some learners need to know the shape of the input. For example, a CNN needs to know the height and width of an image.
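As a rough illustration of why a learner might need the shape, the sketch below turns a flat feature vector back into rows of an image. The function reshape_2d is hypothetical and not part of CapyMOA. It assumes a two-dimensional (height, width) shape and row-major ordering of the flat vector, neither of which is confirmed by the documentation above.

```python
from typing import List, Sequence


def reshape_2d(x: Sequence[float], shape: Sequence[int]) -> List[List[float]]:
    """Hypothetical helper: view a flat vector as height x width rows."""
    # Assumption: shape is (height, width), as for a single-channel image.
    height, width = shape
    if len(x) != height * width:
        raise ValueError(
            f"cannot reshape {len(x)} values into shape ({height}, {width})"
        )
    # Assumption: values are stored row by row (row-major order).
    return [list(x[row * width:(row + 1) * width]) for row in range(height)]


flat = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
print(reshape_2d(flat, (2, 3)))  # [[0.0, 0.1, 0.2], [0.3, 0.4, 0.5]]
```

A learner that only ever sees the flat vector could not recover the row boundaries without this extra information, which is the role the shape property plays.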