Instance#

Instances are the basic unit of data in CapyMOA.

class capymoa.instance.Instance[source]#

Bases: object

An instance is a single data point in a stream. It contains a feature vector and a schema that describes the datastream it belongs to.

In supervised learning, you're more likely to encounter LabeledInstance or RegressionInstance, which are subclasses of Instance with a class label or target value, respectively.
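Because both subclasses derive from Instance, code that consumes a stream can branch on the concrete type. The sketch below is only an illustration: the Sketch* classes are hypothetical stand-ins, not CapyMOA's implementation, and copy just the attribute names documented on this page (x, y_index, y_label, y_value) so the example runs without CapyMOA or Java installed.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical stand-ins that mirror only the documented attribute names.
@dataclass
class SketchInstance:
    x: List[float]


@dataclass
class SketchLabeledInstance(SketchInstance):
    y_index: int
    y_label: str


@dataclass
class SketchRegressionInstance(SketchInstance):
    y_value: float


def describe(instance: SketchInstance) -> str:
    """Summarise an instance according to its concrete type."""
    # Check the subclasses first, since each is also a SketchInstance.
    if isinstance(instance, SketchLabeledInstance):
        return f"classification: label={instance.y_label!r} (index {instance.y_index})"
    if isinstance(instance, SketchRegressionInstance):
        return f"regression: target={instance.y_value}"
    return f"unlabeled: {len(instance.x)} features"
```

With the real classes, the same isinstance checks would presumably apply, since LabeledInstance and RegressionInstance both list Instance as their base.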

__init__(
schema: Schema,
instance: InstanceExample | ndarray[Any, dtype[float64]],
) None[source]#

Creates a new instance.

It's recommended that you use from_array() or from_java_instance() to create instances, as they provide a more user-friendly interface.

Parameters:
  • schema – A schema that describes the datastream the instance belongs to.

  • instance – A vector of features (float values) or a Java instance.

Raises:

ValueError – If the given instance is of an unsupported type.

classmethod from_java_instance(
schema: Schema,
java_instance: InstanceExample,
) Instance[source]#

classmethod from_array(
schema: Schema,
instance: ndarray[Any, dtype[float64]],
) Instance[source]#

A class constructor to create an instance from a schema and a vector of features.

This is useful in the rare cases you need to create custom unlabeled instances from scratch. In most cases, your datastream will automatically create instances for you.

>>> from capymoa.stream import Schema
...
>>> from capymoa.instance import Instance
>>> import numpy as np
>>> schema = Schema.from_custom(
...     ["f1", "f2"],
...     dataset_name="CustomDataset",
...     values_for_class_label=["yes", "no"]
... )
>>> x = np.array([0.1, 0.2])
>>> instance = Instance.from_array(schema, x)
>>> instance
Instance(
    Schema(CustomDataset),
    x=ndarray(..., 2)
)

Parameters:
  • schema – A schema that describes the datastream the instance belongs to.

  • instance – A vector (numpy.ndarray) of features (float values).

Returns:

A new Instance object.

property schema: Schema#

Returns the schema of the instance and the stream it belongs to.

property x: ndarray[Any, dtype[float64]]#

Returns a feature vector containing float values for the instance.

property java_instance: InstanceExample#

Returns a representation of the instance in Java for use in MOA. This method is for advanced users who want to directly interact with MOA’s Java API.

class capymoa.instance.LabeledInstance[source]#

Bases: Instance

An Instance with a class label.

Most classification datastreams will automatically return instances for you with the class label and index. For example, the capymoa.datasets.ElectricityTiny dataset:

>>> from capymoa.datasets import ElectricityTiny
...
>>> from capymoa.instance import LabeledInstance
>>> stream = ElectricityTiny()
>>> instance: LabeledInstance = stream.next_instance()
>>> instance.y_label
'1'

The label and index are NOT the same. One is a human-readable string and the other is an integer representation of the class label.

>>> instance.y_index
1
>>> instance.x
array([0. , 0.056443, 0.439155, 0.003467, 0.422915, 0.414912])
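The relationship between the two values can be sketched with a plain list. This is a minimal illustration, assuming the class index is the position of the label in the list of class values declared for the schema (as the values_for_class_label example elsewhere on this page suggests); class_labels, index_to_label and label_to_index are hypothetical names, not part of CapyMOA.

```python
# Assumed class labels, in the order they would be declared for a schema.
class_labels = ["yes", "no"]


def index_to_label(y_index: int) -> str:
    """Look up the human-readable label for a class index."""
    return class_labels[y_index]


def label_to_index(y_label: str) -> int:
    """Look up the integer index for a class label."""
    return class_labels.index(y_label)
```

Even when the two values look alike, as with the label '1' and the index 1 above, they differ in type: in Python, '1' == 1 evaluates to False.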

__init__(
schema: Schema,
instance: InstanceExample | Tuple[ndarray[Any, dtype[float64]], int],
) None[source]#

Creates a new instance.

It's recommended that you use from_array() or from_java_instance() to create instances, as they provide a more user-friendly interface.

Parameters:
  • schema – A schema that describes the datastream the instance belongs to.

  • instance – A tuple of a feature vector (float values) and an integer class index, or a Java instance.

Raises:

ValueError – If the given instance is of an unsupported type.

classmethod from_array(
schema: Schema,
x: ndarray[Any, dtype[float64]],
y_index: int,
) LabeledInstance[source]#

Creates a new labeled instance from a schema, feature vector, and class index.

This is useful in the rare cases you need to create custom labeled instances from scratch. In most cases, your datastream will automatically create instances for you.

>>> from capymoa.stream import Schema
...
>>> from capymoa.instance import LabeledInstance
>>> import numpy as np
>>> schema = Schema.from_custom(
...     ["f1", "f2"],
...     dataset_name="CustomDataset",
...     values_for_class_label=["yes", "no"]
... )
>>> x = np.array([0.1, 0.2])
>>> instance = LabeledInstance.from_array(schema, x, 0)
>>> instance
LabeledInstance(
    Schema(CustomDataset),
    x=ndarray(..., 2),
    y_index=0,
    y_label='yes'
)
>>> instance.y_label
'yes'
>>> instance.java_instance.toString()
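'0.1,0.2,yes,'

Parameters:
  • schema – A schema describing the datastream the instance belongs to.

  • x – A vector (numpy.ndarray) of features containing float values.

  • y_index – The integer index of the class label.

Returns:

A new LabeledInstance object.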

property y_label: str#

Returns the class label of the instance as a string.

property y_index: int#

Returns the index of the class. It is useful for classification tasks as it provides a numeric representation of the class label, ranging from zero up to, but not including, the number of classes.
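One common use of a numeric class index is to build a one-hot target vector. The following is a minimal sketch, assuming valid indices run from 0 to n_classes - 1; one_hot is a hypothetical helper, not a CapyMOA function.

```python
from typing import List


def one_hot(y_index: int, n_classes: int) -> List[float]:
    """Encode a class index as a one-hot vector of length n_classes."""
    # Assumption: valid indices are 0, 1, ..., n_classes - 1.
    if not 0 <= y_index < n_classes:
        raise ValueError(f"y_index {y_index} is outside 0..{n_classes - 1}")
    vector = [0.0] * n_classes
    vector[y_index] = 1.0
    return vector
```

For a two-class problem, one_hot(0, 2) gives [1.0, 0.0], while an index of 2 is rejected with a ValueError.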

classmethod from_java_instance(
schema: Schema,
java_instance: InstanceExample,
) Instance[source]#

property java_instance: InstanceExample#

Returns a representation of the instance in Java for use in MOA. This method is for advanced users who want to directly interact with MOA’s Java API.

property schema: Schema#

Returns the schema of the instance and the stream it belongs to.

property x: ndarray[Any, dtype[float64]]#

Returns a feature vector containing float values for the instance.

class capymoa.instance.RegressionInstance[source]#

Bases: Instance

An Instance with a continuous target value.

Most of the time, regression datastreams will automatically return instances for you with the target value. For example, the capymoa.datasets.Fried dataset:

>>> from capymoa.datasets import Fried
...
>>> from capymoa.instance import RegressionInstance
>>> stream = Fried()
>>> instance: RegressionInstance = stream.next_instance()
>>> instance.y_value
17.949
>>> instance.x
array([0.487, 0.072, 0.004, 0.833, 0.765, 0.6  , 0.132, 0.886, 0.073,
       0.342])

__init__(
schema: Schema,
instance: InstanceExample | Tuple[ndarray[Any, dtype[float64]], float64],
) None[source]#

Creates a new instance.

It's recommended that you use from_array() or from_java_instance() to create instances, as they provide a more user-friendly interface.

Parameters:
  • schema – A schema that describes the datastream the instance belongs to.

  • instance – A tuple of a feature vector (float values) and a float target value, or a Java instance.

Raises:

ValueError – If the given instance is of an unsupported type.

classmethod from_array(
schema: Schema,
x: ndarray[Any, dtype[float64]],
y_value: float64,
) RegressionInstance[source]#

Creates a new regression instance from a schema, feature vector, and target value.

This is useful in the rare cases you need to create custom regression instances from scratch. In most cases, your datastream will automatically create these for you.

>>> from capymoa.stream import Schema
...
>>> from capymoa.instance import RegressionInstance
>>> import numpy as np
>>> schema = Schema.from_custom(
...     ["f1", "f2"],
...     dataset_name="CustomDataset",
...     enforce_regression=True
... )
>>> x = np.array([0.1, 0.2])
>>> instance = RegressionInstance.from_array(schema, x, 0.5)
>>> instance
RegressionInstance(
    Schema(CustomDataset),
    x=ndarray(..., 2),
    y_value=0.5
)
>>> instance.y_value
0.5
>>> instance.java_instance.toString()
'0.1,0.2,0.5,'

Parameters:
  • schema – A schema describing the datastream the instance belongs to.

  • x – A vector (numpy.ndarray) of features containing float values.

  • y_value – A float value representing the target value or dependent variable.

Returns:

A new RegressionInstance object.

classmethod from_java_instance(
schema: Schema,
java_instance: InstanceExample,
) Instance[source]#

property java_instance: InstanceExample#

Returns a representation of the instance in Java for use in MOA. This method is for advanced users who want to directly interact with MOA’s Java API.

property schema: Schema#

Returns the schema of the instance and the stream it belongs to.

property x: ndarray[Any, dtype[float64]]#

Returns a feature vector containing float values for the instance.

property y_value: float64#

Returns the target value of the instance.