Scheme

class audformat.Scheme(dtype=None, *, labels=None, minimum=None, maximum=None, description=None, meta=None)

A scheme defines valid values of an annotation.

Allowed values for dtype are 'bool', 'int', 'float', 'object', 'str', 'time', and 'date' (see audformat.define.DataType). Values can be restricted to a set of labels, given as a list, a dictionary, or the table ID of an audformat.MiscTable, in which case the values of its index are used as labels. A continuous range can be limited by a minimum and a maximum value.

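Parameters
  • dtype – data type, one of the values listed above

  • labels – list or dictionary of labels, or ID of a misc table holding the labels

  • minimum – minimum value

  • maximum – maximum value

  • description – scheme description

  • meta – dictionary with meta fields
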
Raises
  • BadValueError – if an invalid dtype is passed

  • ValueError – if labels are not passed as string, list, or dictionary

  • ValueError – if labels is a table ID, but dtype is not specified

  • ValueError – if labels are not of same data type

  • ValueError – if dtype does not match type of labels when labels is a list or dictionary

  • ValueError – when a scheme that contains a table ID as labels is assigned to a database, but the corresponding misc table is not part of the database, or the given table ID is not a misc table, or its index is multi-dimensional, or its index contains duplicates, or dtype does not match type of labels from misc table, or dtype is set to bool, or the misc table has a column that is already assigned to a scheme with labels from another misc table
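
The label checks in the list above can be pictured with a small standard-library sketch. It does not call audformat; infer_label_dtype is a hypothetical helper, and the mapping from Python types to dtype names is an assumption made only for illustration.

```python
# Hypothetical sketch of the label checks; not audformat's implementation.
# Assumed mapping from Python types to scheme dtype names.
TYPE_TO_DTYPE = {bool: "bool", int: "int", float: "float", str: "str"}


def infer_label_dtype(labels, dtype=None):
    """Return the dtype implied by ``labels`` or raise ``ValueError``."""
    if not isinstance(labels, (list, dict)):
        raise ValueError("labels must be a list or dictionary in this sketch")
    # Iterating a dictionary yields its keys, which act as the labels.
    types = {type(label) for label in labels}
    if len(types) > 1:
        raise ValueError("labels are not of same data type")
    # Without any labels there is nothing to derive, so keep the given dtype.
    derived = TYPE_TO_DTYPE.get(types.pop(), "object") if types else dtype
    if dtype is not None and derived != dtype:
        raise ValueError("dtype does not match type of labels")
    return derived
```

In this sketch, string labels imply 'str' and a dictionary with integer keys implies 'int', which agrees with the dtypes shown in the examples on this page.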

Examples

>>> import pandas as pd
>>> import audformat
>>> from audformat import Scheme, define
>>> Scheme()
{dtype: str}
>>> Scheme(labels=["a", "b", "c"])
dtype: str
labels: [a, b, c]
>>> Scheme(define.DataType.INTEGER)
{dtype: int}
>>> Scheme("float", minimum=0, maximum=1)
{dtype: float, minimum: 0, maximum: 1}
>>> # Use index of misc table as labels
>>> db = audformat.Database("mydb")
>>> db["speaker"] = audformat.MiscTable(pd.Index(["spk1", "spk2"], name="speaker"))
>>> Scheme("str", labels="speaker")
{dtype: str, labels: speaker}

__contains__()

Scheme.__contains__(item)

Check if scheme contains data type of item.

None, NaT, and NaN always match.

Return type

bool

Returns

True if item is covered by scheme
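
As a rough illustration of the membership rule, the following standard-library sketch treats missing values as always matching and otherwise checks labels or the numeric range. The function covers is hypothetical and is not part of audformat; it assumes that labels take precedence over minimum and maximum, and it leaves out NaT because that value comes from pandas.

```python
import math


def covers(item, *, labels=None, minimum=None, maximum=None):
    """Hypothetical stand-in for a scheme membership test."""
    # Missing values (None or NaN) always match.
    if item is None or (isinstance(item, float) and math.isnan(item)):
        return True
    # With labels, the item has to be one of them
    # (for a dictionary this tests the keys).
    if labels is not None:
        return item in labels
    # Without labels, only the optional range limits apply.
    if minimum is not None and item < minimum:
        return False
    if maximum is not None and item > maximum:
        return False
    return True
```

With the real class, the same question would be asked with the in operator, for example 0.5 in Scheme("float", minimum=0, maximum=1).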

__eq__()

Scheme.__eq__(other)

Return self==value.

Return type

bool

description

Scheme.description

Description

draw()

Scheme.draw(n, *, str_len=10, p_none=None)

Randomly draws values from scheme.

Parameters
  • n (int) – number of values

  • str_len (int) – string length if drawing from a string scheme without labels

  • p_none (Optional[bool]) – probability for drawing an invalid value

Return type

list

Returns

list with values
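
The interplay of n, str_len and p_none can be sketched with the standard library alone. The function draw_sketch and its rng argument are hypothetical, and the sketch assumes that an invalid value is represented by None and that only label schemes and string schemes without labels are covered; the actual sampling in audformat may differ.

```python
import random
import string


def draw_sketch(n, *, labels=None, str_len=10, p_none=None, rng=None):
    """Hypothetical sketch of drawing ``n`` values for a scheme."""
    rng = rng or random.Random()
    if labels is not None:
        # Scheme with labels: pick uniformly among the labels.
        choices = list(labels)
        values = [rng.choice(choices) for _ in range(n)]
    else:
        # String scheme without labels: random strings of length str_len.
        values = [
            "".join(rng.choice(string.ascii_letters) for _ in range(str_len))
            for _ in range(n)
        ]
    if p_none is not None:
        # Replace each value by None with probability p_none.
        values = [None if rng.random() < p_none else v for v in values]
    return values
```

Passing a seeded random.Random instance as rng makes the drawn values reproducible in this sketch.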

dtype

Scheme.dtype

Data type.

Possible return values are given by audformat.define.DataType.

dump()

Scheme.dump(stream=None, indent=2)

Serialize object to YAML.

Parameters
  • stream – file-like object; if None, the object is serialized to a string

  • indent (int) – indentation width

Return type

str

Returns

YAML string

from_dict()

Scheme.from_dict(d, ignore_keys=None)

Deserialize object from dictionary.

Parameters
  • d (dict) – dictionary of class variables to assign

  • ignore_keys (Optional[Sequence[str]]) – variables listed here will be ignored

is_numeric

Scheme.is_numeric

Data type is numeric.

Returns

True if data type is numeric

labels

Scheme.labels

Labels or ID of misc table holding the labels

labels_as_list

Scheme.labels_as_list

Scheme labels as list.

If the scheme does not define labels, an empty list is returned.

Returns

list of labels

maximum

Scheme.maximum

Maximum value

meta

Scheme.meta

Dictionary with meta fields

minimum

Scheme.minimum

Minimum value

replace_labels()

Scheme.replace_labels(labels)

Replace labels.

If the scheme is part of an audformat.Database, the dtype of all audformat.Column objects that reference the scheme will be updated. Removed labels are set to NaN.

Parameters

labels (Union[dict, list, str]) – new labels

Raises
  • ValueError – if scheme does not define labels

  • ValueError – if dtype of new labels does not match dtype of scheme

  • ValueError – if labels is a misc table ID and the scheme is already assigned to a database, but the corresponding misc table is not part of the database, or the given table ID is not a misc table, or its index is multi-dimensional, or its index contains duplicates, or the misc table has a column that is already assigned to a scheme with labels from another misc table

Examples

>>> speaker = Scheme(
...     labels={
...         0: {"gender": "female"},
...         1: {"gender": "male"},
...     }
... )
>>> speaker
dtype: int
labels:
  0: {gender: female}
  1: {gender: male}
>>> speaker.replace_labels(
...     {
...         1: {"gender": "male", "age": 33},
...         2: {"gender": "female", "age": 44},
...     }
... )
>>> speaker
dtype: int
labels:
  1: {gender: male, age: 33}
  2: {gender: female, age: 44}
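
The effect of replacing labels on values that were already annotated can be pictured as follows. This is a standard-library sketch with a hypothetical helper remap_values; it only mirrors the statement that removed labels are set to NaN and is not how audformat updates its columns.

```python
import math


def remap_values(values, new_labels):
    """Keep values that are still valid labels, set all others to NaN."""
    # For a dictionary of labels, ``in`` tests against the keys.
    return [v if v in new_labels else math.nan for v in values]


# Values annotated with the old labels 0 and 1.
old_values = [0, 1, 1, 0]
# New labels as in the example above: 0 was removed, 2 was added.
new_labels = {
    1: {"gender": "male", "age": 33},
    2: {"gender": "female", "age": 44},
}
new_values = remap_values(old_values, new_labels)
```

Here the entries that referred to label 0 become NaN, while the entries with label 1 are kept unchanged.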

to_dict()

Scheme.to_dict()

Serialize object to dictionary.

Return type

dict

Returns

dictionary with attributes

to_pandas_dtype()

Scheme.to_pandas_dtype()

Convert data type to pandas data type.

If labels is not None, pandas.CategoricalDtype is returned. Otherwise the following rules are applied:

  • str -> str

  • int -> Int64 (to allow NaN)

  • float -> float

  • time -> timedelta64[ns]

  • date -> datetime64[ns]

Return type

Union[str, CategoricalDtype]

Returns

pandas data type
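
The conversion rules listed above amount to a lookup table with one special case for labels. The sketch below uses only the standard library and a hypothetical function pandas_dtype_name; it returns the string alias "category" where the real method returns a pandas.CategoricalDtype, and it covers only the five data types named in the list.

```python
# Mapping taken from the rules listed above.
PANDAS_DTYPE_NAMES = {
    "str": "str",
    "int": "Int64",  # nullable integer type, so NaN can be stored
    "float": "float",
    "time": "timedelta64[ns]",
    "date": "datetime64[ns]",
}


def pandas_dtype_name(dtype, labels=None):
    """Hypothetical sketch returning the name of the pandas data type."""
    if labels is not None:
        # Schemes with labels map to a categorical type.
        return "category"
    # Raises KeyError for data types that are not in the list above.
    return PANDAS_DTYPE_NAMES[dtype]
```

The reason for Int64 rather than a plain integer type is given in the list itself: it allows missing values to be stored alongside integers.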

uses_table

Scheme.uses_table

Scheme has labels stored in a misc table.

If the property is True, the attribute labels is set to the ID of an audformat.MiscTable in which the actual label values are stored.

Returns

True if scheme has labels stored in a misc table