Schema

class capymoa.stream.Schema

Bases: object

Schema describes the structure of a stream.

It contains the attribute names, their data types, and the possible values of nominal attributes. A learner needs the schema to interpret instances correctly.

When working with datasets built into CapyMOA (see capymoa.datasets) and ARFF files, the schema is automatically created. However, in some cases you might want to create a schema manually. This can be done using the from_custom() method.

__init__(
moa_header: InstancesHeader,
)

Construct a schema by wrapping an InstancesHeader.

To create a schema without an InstancesHeader, use the from_custom() method.

Parameters:

moa_header – A Java MOA header object.

get_label_values() → Sequence[str]

Return the possible values for the class label.

get_label_indexes() → Sequence[int]

Return the possible indexes for the class label.

get_value_for_index(y_index: int | None) → str | None

Return the value for the class label index y_index.

get_index_for_label(y: str)

Return the index for the class label y.
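The four label methods above relate to each other much as an ordered list of labels relates to its positions. The sketch below mirrors that relationship in plain Python. `LabelCodec` is a hypothetical stand-in written for illustration and is not part of CapyMOA. It assumes that label indexes are simply positions in the list of label values, and that a `None` index maps to a `None` label, as the type hints suggest.

```python
from typing import Dict, List, Optional, Sequence


class LabelCodec:
    """Hypothetical stand-in for the label lookups a Schema offers."""

    def __init__(self, label_values: Sequence[str]) -> None:
        # Labels are stored as strings, in the order they were given.
        self._values: List[str] = [str(v) for v in label_values]
        self._index: Dict[str, int] = {v: i for i, v in enumerate(self._values)}

    def get_label_values(self) -> Sequence[str]:
        return self._values

    def get_label_indexes(self) -> Sequence[int]:
        # Assumption: indexes are the positions 0..n-1 of the label values.
        return list(range(len(self._values)))

    def get_value_for_index(self, y_index: Optional[int]) -> Optional[str]:
        # Assumption: a missing index (e.g. no prediction) gives a missing label.
        if y_index is None:
            return None
        return self._values[y_index]

    def get_index_for_label(self, y: str) -> int:
        return self._index[y]


codec = LabelCodec(["yes", "no"])
# Encoding a label and decoding the result should give the label back.
decoded = codec.get_value_for_index(codec.get_index_for_label("no"))
```

With a real schema, the same round trip would use `schema.get_index_for_label(...)` followed by `schema.get_value_for_index(...)`.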

get_moa_header() → InstancesHeader

Get the Java MOA header. Useful for advanced users.

This is needed for advanced operations that are not supported by the Python wrappers (yet).

get_num_attributes() → int

Return the number of attributes excluding the target attribute.

get_num_classes() → int

Return the number of possible classes. If regression, returns 1.

is_regression() → bool

Return True if the problem is a regression problem.

is_classification() → bool

Return True if the problem is a classification problem.
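Code that handles both task types can branch on these predicates. The following is a minimal sketch of that pattern, not code from CapyMOA: `SchemaLike`, `describe_task`, and `FakeSchema` are hypothetical names introduced here, and `FakeSchema` is only a stub so the example runs without CapyMOA or a JVM.

```python
from typing import Protocol


class SchemaLike(Protocol):
    """The subset of Schema methods this sketch relies on."""

    def is_classification(self) -> bool: ...
    def is_regression(self) -> bool: ...
    def get_num_classes(self) -> int: ...


def describe_task(schema: SchemaLike) -> str:
    """Summarise the learning task that a schema describes."""
    if schema.is_classification():
        return f"classification with {schema.get_num_classes()} classes"
    if schema.is_regression():
        return "regression"
    return "unknown"


class FakeSchema:
    """Hypothetical stub standing in for a real Schema."""

    def __init__(self, num_classes: int, regression: bool) -> None:
        self._num_classes = num_classes
        self._regression = regression

    def is_classification(self) -> bool:
        return not self._regression

    def is_regression(self) -> bool:
        return self._regression

    def get_num_classes(self) -> int:
        # Follows the documented behaviour: regression reports 1.
        return 1 if self._regression else self._num_classes


binary = describe_task(FakeSchema(num_classes=2, regression=False))
numeric = describe_task(FakeSchema(num_classes=0, regression=True))
```

Because `describe_task` only calls the three documented methods, a real `Schema` instance should be usable in place of the stub.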

is_y_index_in_range(y_index: int) → bool

Return True if the y_index is in the range of the class label indexes.

property dataset_name: str

Returns the name of the dataset.

static from_custom(
feature_names: Sequence[str],
values_for_nominal_features: Dict[str, Sequence[str]] = {},
values_for_class_label: Sequence[str] = None,
dataset_name='No_Name',
target_attribute_name=None,
target_type=None,
)

Create a CapyMOA Schema that defines each attribute in the stream.

The following example shows how to use this method to create a classification schema:

>>> from capymoa.stream import Schema
...
>>> Schema.from_custom(
...     feature_names=["attrib_1", "attrib_2"],
...     dataset_name="MyClassification",
...     target_attribute_name="class",
...     values_for_class_label=["yes", "no"])
@relation MyClassification

@attribute attrib_1 numeric
@attribute attrib_2 numeric
@attribute class {yes,no}

@data

The following example shows how to use this method to create a regression schema:

>>> Schema.from_custom(
...     feature_names=["attrib_1", "attrib_2"],
...     values_for_nominal_features={"attrib_1": ["a", "b"]},
...     dataset_name="MyRegression",
...     target_attribute_name="target",
...     target_type='numeric')
@relation MyRegression

@attribute attrib_1 {a,b}
@attribute attrib_2 numeric
@attribute target numeric

@data
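To make the mapping from arguments to the printed header explicit, the sketch below rebuilds the same ARFF-style text in plain Python. It is an illustration only and not CapyMOA's implementation: `render_arff_header` is a hypothetical helper, and it assumes that features without listed nominal values are numeric and that a target without class label values is numeric.

```python
from typing import Dict, List, Optional, Sequence


def _nominal(values: Sequence[object]) -> str:
    """Format nominal values as an ARFF value set, e.g. {yes,no}."""
    return "{" + ",".join(str(v) for v in values) + "}"


def render_arff_header(
    feature_names: Sequence[str],
    values_for_nominal_features: Optional[Dict[str, Sequence[str]]] = None,
    values_for_class_label: Optional[Sequence[str]] = None,
    dataset_name: str = "No_Name",
    target_attribute_name: str = "target",
) -> str:
    """Hypothetical helper: render an ARFF-style header as text."""
    nominal = values_for_nominal_features or {}
    lines: List[str] = [f"@relation {dataset_name}", ""]
    for name in feature_names:
        # Assumption: a feature is numeric unless nominal values are listed.
        kind = _nominal(nominal[name]) if name in nominal else "numeric"
        lines.append(f"@attribute {name} {kind}")
    # Assumption: no class label values means a numeric (regression) target.
    if values_for_class_label is not None:
        target_kind = _nominal(values_for_class_label)
    else:
        target_kind = "numeric"
    lines.append(f"@attribute {target_attribute_name} {target_kind}")
    lines += ["", "@data"]
    return "\n".join(lines)


header = render_arff_header(
    feature_names=["attrib_1", "attrib_2"],
    values_for_nominal_features={"attrib_1": ["a", "b"]},
    dataset_name="MyRegression",
    target_attribute_name="target",
)
```

Printing `header` gives text in the same layout as the regression example: a relation line, one attribute line per feature, the target attribute, and a closing `@data` line.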

Sample code to get relevant information from two NumPy arrays: X[rows][features] and y[rows]
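One way to collect the arguments for from_custom() from row-major data X[rows][features] and labels y[rows] is sketched below. `schema_kwargs_from_arrays` is a hypothetical helper, the generated feature names (`attrib_0`, `attrib_1`, ...) and the target name `"class"` are arbitrary choices, and the label values are sorted only to make the result deterministic. The example uses plain lists so it runs without NumPy; the helper relies only on `len()`, indexing, and iteration.

```python
from typing import Any, Dict, Sequence


def schema_kwargs_from_arrays(
    X: Sequence[Sequence[float]],
    y: Sequence[Any],
    dataset_name: str = "No_Name",
) -> Dict[str, Any]:
    """Hypothetical helper: derive from_custom() keyword arguments."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    n_features = len(X[0]) if len(X) > 0 else 0
    # Made-up default names; any unique strings would do.
    feature_names = [f"attrib_{i}" for i in range(n_features)]
    # Class label values are strings; sorting makes the order deterministic.
    label_values = sorted({str(label) for label in y})
    return {
        "feature_names": feature_names,
        "values_for_class_label": label_values,
        "dataset_name": dataset_name,
        "target_attribute_name": "class",
    }


X = [[0.1, 1.0], [0.2, 2.0], [0.3, 3.0]]
y = ["yes", "no", "yes"]
kwargs = schema_kwargs_from_arrays(X, y, dataset_name="MyClassification")
# With CapyMOA installed, the schema would then be created with:
#     Schema.from_custom(**kwargs)
```

Note that the order of `values_for_class_label` fixes which label each index refers to, so the same order should be used wherever labels are encoded.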

Parameters:
  • feature_names – A list of feature names. If None, default names are used.

  • values_for_nominal_features – Possible values of each nominal feature.

  • values_for_class_label – Possible values for class label. Values are turned into strings.

  • dataset_name – Name of the dataset. Default is “No_Name”.

  • target_attribute_name – Name of the target/class attribute. Default is None.

  • target_type – The target type, either ‘categorical’ or ‘numeric’. If None, the type is detected automatically.

Returns:

An initialized CapyMOA Schema that contains the attribute information for all features and the class label.