Scheme

class audformat.Scheme(dtype=None, *, labels=None, minimum=None, maximum=None, description=None, meta=None)

A scheme defines valid values of an annotation.

Allowed values for dtype are: 'bool', 'int', 'float', 'object', 'str', 'time', and 'date' (see audformat.define.DataType). Values can be restricted to a set of labels provided as a list, a dictionary, or the table ID of an audformat.MiscTable, in which case the values of its index are used as labels. A continuous range can be limited by a minimum and maximum value.

Parameters:
  • dtype (str) – data type. If None, it is derived from labels when labels are given, otherwise it is set to 'str'

  • labels (dict | list | str) – list, dictionary or table ID of a corresponding audformat.MiscTable containing labels as index. If a table ID is provided, dtype has to be specified

  • minimum (int | float) – minimum value

  • maximum (int | float) – maximum value

  • description (str) – scheme description

  • meta (dict) – additional meta fields

Raises:
  • BadValueError – if an invalid dtype is passed

  • ValueError – if labels are not passed as string, list, or dictionary

  • ValueError – if labels is a table ID, but dtype is not specified

  • ValueError – if labels are not of same data type

  • ValueError – if dtype does not match type of labels when labels is a list or dictionary

  • ValueError – when assigning a scheme that contains a table ID as labels to a database, but the corresponding misc table is not part of the database, or the given table ID is not a misc table, or its index is multi-dimensional, or its index contains duplicates, or dtype does not match the type of the labels from the misc table, or dtype is set to bool, or the misc table has a column that is already assigned to a scheme with labels from another misc table

Examples

>>> from audformat import Scheme, define
>>> Scheme()
{dtype: str}
>>> Scheme(labels=["a", "b", "c"])
dtype: str
labels: [a, b, c]
>>> Scheme(define.DataType.INTEGER)
{dtype: int}
>>> Scheme("float", minimum=0, maximum=1)
{dtype: float, minimum: 0, maximum: 1}
>>> # Use index of misc table as labels
>>> import audformat
>>> import pandas as pd
>>> db = audformat.Database("mydb")
>>> db["speaker"] = audformat.MiscTable(
...     pd.Index(["spk1", "spk2"], name="speaker")
... )
>>> Scheme("str", labels="speaker")
{dtype: str, labels: speaker}

__contains__()

Scheme.__contains__(item)

Check if scheme contains data type of item.

None, NaT, and NaN always match.

Return type:

bool

Returns:

True if item is covered by scheme

__eq__()

Scheme.__eq__(other)

Return self==value.

Return type:

bool

description

Scheme.description

Description

draw()

Scheme.draw(n, *, str_len=10, p_none=None)

Randomly draws values from scheme.

Parameters:
  • n (int) – number of values

  • str_len (int) – string length if drawing from a string scheme without labels

  • p_none (float) – probability for drawing an invalid value

Return type:

list

Returns:

list with values

dtype

Scheme.dtype

Data type.

Possible return values are given by audformat.define.DataType.

dump()

Scheme.dump(stream=None, indent=2)

Serialize object to YAML.

Parameters:
  • stream – file-like object. If None serializes to string

  • indent (int) – indent

Return type:

str

Returns:

YAML string

from_dict()

Scheme.from_dict(d, ignore_keys=None)

Deserialize object from dictionary.

Parameters:
  • d (dict) – dictionary of class variables to assign

  • ignore_keys (Sequence[str]) – variables listed here will be ignored

is_numeric

Scheme.is_numeric

Data type is numeric.

Returns:

True if data type is numeric

labels

Scheme.labels

Labels or ID of misc table holding the labels

labels_as_list

Scheme.labels_as_list

Scheme labels as list.

If the scheme does not define labels, an empty list is returned.

Returns:

list of labels

maximum

Scheme.maximum

Maximum value

meta

Scheme.meta

Dictionary with meta fields

minimum

Scheme.minimum

Minimum value

replace_labels()

Scheme.replace_labels(labels)

Replace labels.

If the scheme is part of an audformat.Database, the dtype of all audformat.Column objects that reference the scheme will be updated. Removed labels are set to NaN.

Parameters:

labels (dict | list | str) – new labels

Raises:
  • ValueError – if scheme does not define labels

  • ValueError – if dtype of new labels does not match dtype of scheme

  • ValueError – if labels is a misc table ID and the scheme is already assigned to a database, but the corresponding misc table is not part of the database, or the given table ID is not a misc table, or its index is multi-dimensional, or its index contains duplicates, or the misc table has a column that is already assigned to a scheme with labels from another misc table

Examples

>>> speaker = Scheme(
...     labels={
...         0: {"gender": "female"},
...         1: {"gender": "male"},
...     }
... )
>>> speaker
dtype: int
labels:
  0: {gender: female}
  1: {gender: male}
>>> speaker.replace_labels(
...     {
...         1: {"gender": "male", "age": 33},
...         2: {"gender": "female", "age": 44},
...     }
... )
>>> speaker
dtype: int
labels:
  1: {gender: male, age: 33}
  2: {gender: female, age: 44}

to_dict()

Scheme.to_dict()

Serialize object to dictionary.

Return type:

dict

Returns:

dictionary with attributes

to_pandas_dtype()

Scheme.to_pandas_dtype()

Convert data type to pandas data type.

If labels is not None, pandas.CategoricalDtype is returned. Otherwise the following rules are applied:

  • str -> str

  • int -> Int64 (to allow NaN)

  • float -> float

  • time -> timedelta64[ns]

  • date -> datetime64[ns]

Return type:

str | CategoricalDtype

Returns:

pandas data type

uses_table

Scheme.uses_table

Scheme has labels stored in a misc table.

If the property is True, the labels attribute is set to the ID of an audformat.MiscTable in which the actual label values are stored.

Returns:

True if scheme has labels stored in a misc table