Schema

class capymoa.stream.Schema

Bases: object

Schema describes the structure of a stream.

It contains the attribute names, their data types, and the possible values of nominal attributes. A learner relies on the schema to interpret instances correctly.

When working with datasets built into CapyMOA (see capymoa.datasets) and ARFF files, the schema is created automatically. When you need to create a schema manually, use the from_custom() method.

__init__(moa_header: InstancesHeader)

Construct a schema by wrapping an InstancesHeader.

To create a schema without an InstancesHeader, use the from_custom() method.

Parameters:

moa_header – A Java MOA header object.

static from_custom(
    features: Sequence[str],
    target: str,
    categories: Dict[str, Sequence[str]] | None = None,
    name: str = 'unnamed',
)

Create a CapyMOA Schema that defines each attribute in the stream.

The following example shows how to use this method to create a classification schema:

>>> from capymoa.stream import Schema
>>> schema = Schema.from_custom(
...     features=["f1", "f2", "class"],
...     target="class",
...     categories={"class": ["yes", "no"], "f1": ["low", "medium", "high"]},
...     name="classification-example"
... )
>>> print(schema)
@relation classification-example

@attribute f1 {low,medium,high}
@attribute f2 numeric
@attribute class {yes,no}

@data
>>> print(schema.is_classification())
True

The following example shows how to use this method to create a regression schema:

>>> schema = Schema.from_custom(
...     features=["f1", "f2", "target"],
...     target="target",
...     categories={"f1": ["A", "B", "C"]},
...     name="regression-example"
... )
>>> print(schema)
@relation regression-example

@attribute f1 {A,B,C}
@attribute f2 numeric
@attribute target numeric

@data
>>> print(schema.is_regression())
True

Parameters:
  • features – A list of feature names.

  • target – The name of the target attribute. It must also appear in features.

  • categories – A dictionary mapping feature names to their possible values. When the target attribute is included in this dictionary, the task is treated as classification; otherwise it is treated as regression.

  • name – The name of the dataset.

Returns:

A CapyMOA Schema object.
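
The rule behind the categories parameter can be pictured in plain Python: every feature listed in categories is nominal, every other feature is numeric, and the task is classification only when the target itself is nominal. The sketch below is an illustrative stand-in that does not call CapyMOA. The helper name sketch_header is hypothetical, and raising ValueError when the target is missing from features is an assumption of the sketch, not documented behavior.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def sketch_header(
    features: Sequence[str],
    target: str,
    categories: Optional[Dict[str, Sequence[str]]] = None,
    name: str = "unnamed",
) -> Tuple[str, str]:
    """Hypothetical helper: return (ARFF-style header, task name)."""
    categories = categories or {}
    if target not in features:
        # Assumption of this sketch; the docs only say the target must be in features.
        raise ValueError("target must also be listed in features")

    lines: List[str] = [f"@relation {name}", ""]
    for feature in features:
        if feature in categories:
            # Nominal attribute: list its possible values in braces.
            values = ",".join(categories[feature])
            lines.append(f"@attribute {feature} {{{values}}}")
        else:
            # Features without listed categories are numeric.
            lines.append(f"@attribute {feature} numeric")
    lines += ["", "@data"]

    # A nominal target means classification; otherwise regression.
    task = "classification" if target in categories else "regression"
    return "\n".join(lines), task


cls_header, cls_task = sketch_header(
    features=["f1", "f2", "class"],
    target="class",
    categories={"class": ["yes", "no"], "f1": ["low", "medium", "high"]},
    name="classification-example",
)
reg_header, reg_task = sketch_header(
    features=["f1", "f2", "target"],
    target="target",
    categories={"f1": ["A", "B", "C"]},
    name="regression-example",
)
```

With the inputs from the two examples above, the sketch labels the first schema as classification and the second as regression, and its header text follows the same layout as the printed schemas.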

get_index_for_label(y: str)

Return the index for the class label y.

get_label_indexes() → Sequence[int]

Return the possible indexes for the class label.

get_label_values() → Sequence[str]

Return the possible values for the class label.
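
The label methods translate between class label values and integer indexes. A minimal plain-Python sketch of that relationship follows. It assumes that indexes are assigned in the order the values were declared for the target and that a None index maps to None; all names are hypothetical and this is not CapyMOA's implementation.

```python
from typing import Dict, List, Optional

# Values declared for the target, e.g. categories={"class": ["yes", "no"]}.
label_values: List[str] = ["yes", "no"]

# Assumption: each label's index is its position in the declared order.
label_indexes: List[int] = list(range(len(label_values)))
index_for_label: Dict[str, int] = {v: i for i, v in enumerate(label_values)}


def value_for_index(y_index: Optional[int]) -> Optional[str]:
    """Map an index back to its label; None stays None in this sketch."""
    if y_index is None:
        return None
    return label_values[y_index]


def index_in_range(y_index: int) -> bool:
    """True when y_index refers to one of the declared labels."""
    return 0 <= y_index < len(label_values)
```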

get_moa_header() → InstancesHeader

Get the Java MOA header.

This is useful for advanced operations that the Python wrappers do not support yet.

get_nominal_attributes() → Dict[str, Sequence[str]]

Return a dict of nominal attributes.

get_num_attributes() → int

Return the number of attributes excluding the target attribute.

get_num_classes() → int

Return the number of possible classes. If regression, returns 1.

get_num_nominal_attributes() → int

Return the number of nominal attributes.

get_num_numeric_attributes() → int

Return the number of numeric attributes.

get_numeric_attributes() → Sequence[str]

Return a list of numeric attribute names.

get_value_for_index(y_index: int | None) → str | None

Return the value for the class label index y_index.

is_classification() → bool

Return True if the problem is a classification problem.

is_regression() → bool

Return True if the problem is a regression problem.

is_y_index_in_range(y_index: int) → bool

Return True if y_index is within the range of the class label indexes.

property dataset_name: str

Returns the name of the dataset.

property shape: Sequence[int]

The shape of the input x instances.

Usually capymoa.instance.Instance.x is a vector but some learners need to know the shape of the input. For example, a CNN needs to know the height and width of an image.
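
To illustrate why a learner might need the shape, the sketch below views a flat feature vector as a grid of rows. It is plain Python and does not call CapyMOA. The helper to_rows is hypothetical, and it assumes a two-dimensional shape given as (height, width).

```python
from math import prod
from typing import List, Sequence


def to_rows(x: Sequence[float], shape: Sequence[int]) -> List[List[float]]:
    """Hypothetical helper: view a flat vector as a (height, width) grid."""
    height, width = shape
    if len(x) != prod(shape):
        raise ValueError("vector length does not match shape")
    # Slice the flat vector into consecutive rows of `width` values.
    return [list(x[r * width:(r + 1) * width]) for r in range(height)]


# A flat vector of six values viewed as a tiny 2 x 3 "image".
flat = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
grid = to_rows(flat, (2, 3))
```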