MiscTable

class audformat.MiscTable(index, *, split_id=None, media_id=None, description=None, meta=None)

Miscellaneous table.

Note

Intended for tables whose index does not conform to the table specifications. Otherwise, use audformat.Table.

To fill a table with labels, add one or more audformat.Column and use audformat.MiscTable.set() to set the values. When adding a column, the column ID must be different from the index level names. When initialized with a single-level pandas.MultiIndex, the index will be converted to a pandas.Index.

Parameters
  • index (Index) – table index with non-empty and unique level names

  • split_id (Optional[str]) – split identifier (must exist)

  • media_id (Optional[str]) – media identifier (must exist)

  • description (Optional[str]) – table description

  • meta (Optional[dict]) – additional meta fields

Raises

ValueError – if level names of index are empty or not unique

Examples

>>> index = pd.MultiIndex.from_tuples(
...     [
...         ("f1", "f2"),
...         ("f1", "f3"),
...         ("f2", "f3"),
...     ],
...     names=["file", "other"],
... )
>>> index = utils.set_index_dtypes(index, "string")
>>> table = MiscTable(
...     index,
...     split_id=define.SplitType.TEST,
... )
>>> table["match"] = Column()
>>> table
levels: {file: str, other: str}
split_id: test
columns:
  match: {}
>>> table.get()
           match
file other
f1   f2      NaN
     f3      NaN
f2   f3      NaN
>>> table.set({"match": [True, False, True]})
>>> table.get()
           match
file other
f1   f2     True
     f3    False
f2   f3     True
>>> table.get(index[:2])
           match
file other
f1   f2     True
     f3    False
>>> index_new = pd.MultiIndex.from_tuples(
...     [
...         ("f4", "f1"),
...     ],
...     names=["file", "other"],
... )
>>> index_new = utils.set_index_dtypes(index_new, "string")
>>> table_ex = table.extend_index(
...     index_new,
...     inplace=False,
... )
>>> table_ex.get()
            match
file other
f1   f2      True
     f3     False
f2   f3      True
f4   f1       NaN
>>> table_ex.set(
...     {"match": True},
...     index=index_new,
... )
>>> table_ex.get()
            match
file other
f1   f2      True
     f3     False
f2   f3      True
f4   f1      True
>>> table_str = MiscTable(index)
>>> table_str["strings"] = Column()
>>> table_str.set({"strings": ["a", "b", "c"]})
>>> (table + table_str).get()
            match strings
file other
f1   f2      True       a
     f3     False       b
f2   f3      True       c
>>> (table_ex + table_str).get()
            match strings
file other
f1   f2      True       a
     f3     False       b
f2   f3      True       c
f4   f1      True     NaN

__add__()

MiscTable.__add__(other)

Create new table by combining two tables.

The new combined table contains index and columns of both tables. Missing values will be set to NaN.

If the tables conform to the table specifications and at least one of them is segmented, the output has a segmented index.

Columns with the same identifier are combined to a single column. This requires that:

  1. both columns have the same dtype

  2. where the indices overlap, the values of both columns match or one column contains NaN

Media and split information, as well as references to schemes and raters, are discarded. If you intend to keep them, use update().

Parameters

other – the other table

Raises
  • ValueError – if columns with the same name have different dtypes

  • ValueError – if values in the same position do not match

  • ValueError – if level and dtypes of indices do not match

__eq__()

MiscTable.__eq__(other)

Compare if table equals other table.

Return type

bool

__getitem__()

MiscTable.__getitem__(column_id)

Return view to a column.

Parameters

column_id (str) – column identifier

Return type

Column

__len__()

MiscTable.__len__()

Number of rows in table.

Return type

int

__setitem__()

MiscTable.__setitem__(column_id, column)

Add new column to table.

Parameters
  • column_id (str) – column identifier

  • column (Column) – column

Raises
  • BadIdError – if a column with a scheme_id or rater_id is added that does not exist

  • ValueError – if column ID is not different from level names

  • ValueError – if the column is linked to a scheme that is using labels from a misc table, but the misc table the column is assigned to is already used by the same or another scheme

Return type

Column

columns

MiscTable.columns

Table columns

copy()

MiscTable.copy()

Copy table.

Returns

new table object

db

MiscTable.db

Database object.

Returns

database object or None if not assigned yet

description

MiscTable.description

Description

df

MiscTable.df

Table data.

Returns

data

drop_columns()

MiscTable.drop_columns(column_ids, *, inplace=False)

Drop columns by ID.

Parameters
  • column_ids – column IDs

  • inplace – drop columns in place

Returns

new object if inplace=False, otherwise self

drop_index()

MiscTable.drop_index(index, *, inplace=False)

Drop rows from index.

Parameters
  • index – index object

  • inplace – drop index in place

Returns

new object if inplace=False, otherwise self

Raises

ValueError – if levels and dtypes of index do not match the table index

dump()

MiscTable.dump(stream=None, indent=2)

Serialize object to YAML.

Parameters
  • stream – file-like object. If None, serializes to a string

  • indent (int) – indent

Return type

str

Returns

YAML string

extend_index()

MiscTable.extend_index(index, *, fill_values=None, inplace=False)

Extend table with new rows.

Parameters
  • index – index object

  • fill_values – replace NaN with these values (either a scalar applied to all columns or a dictionary with column name as key)

  • inplace – extend index in place

Returns

new object if inplace=False, otherwise self

Raises

ValueError – if levels and dtypes of index do not match the table index

from_dict()

MiscTable.from_dict(d, ignore_keys=None)

Deserialize object from dictionary.

Parameters
  • d (dict) – dictionary of class variables to assign

  • ignore_keys (Optional[Sequence[str]]) – variables listed here will be ignored

get()

MiscTable.get(index=None, *, map=None, copy=True)

Get labels.

By default, all labels of the table are returned. Use index to get a subset.

Examples are provided with the table specifications, and for map in Map scheme labels.

Parameters
  • index (Optional[Index]) – index

  • copy (bool) – return a copy of the labels

  • map (Optional[Dict[str, Union[str, Sequence[str]]]]) – map scheme or scheme fields to column values. For example, if your table holds a column speaker with speaker IDs, which is assigned to a scheme that contains a dict mapping speaker IDs to age and gender entries, map={'speaker': ['age', 'gender']} will replace the column with two new columns that map ID values to age and gender, respectively. To also keep the original column with speaker IDs, you can use map={'speaker': ['speaker', 'age', 'gender']}

Return type

DataFrame

Returns

labels

Raises

index

MiscTable.index

Table index.

Returns

index

levels

MiscTable.levels

Index levels.

load()

MiscTable.load(path)

Load table data from disk.

Tables are stored on disk as CSV, PARQUET and/or PKL files. If a PKL file exists, it is loaded as long as its modification date is the newest; otherwise an error is raised that asks you to delete one of the files.

Parameters

path (str) – file path without extension

Raises

media

MiscTable.media

Media object.

Returns

media object or None if not available

media_id

MiscTable.media_id

Media ID

meta

MiscTable.meta

Dictionary with meta fields

pick_columns()

MiscTable.pick_columns(column_ids, *, inplace=False)

Pick columns by ID.

All other columns will be dropped.

Parameters
  • column_ids – column IDs

  • inplace – pick columns in place

Returns

new object if inplace=False, otherwise self

pick_index()

MiscTable.pick_index(index, *, inplace=False)

Pick rows from index.

Parameters
  • index – index object

  • inplace – pick index in place

Returns

new object if inplace=False, otherwise self

Raises

ValueError – if levels and dtypes of index do not match the table index

save()

MiscTable.save(path, *, storage_format='parquet', update_other_formats=True)

Save table data to disk.

Existing files will be overwritten.

When using "parquet" as storage_format, a hash based on the content of the table is stored under the key b"hash" in the metadata of the schema of the parquet file. This provides a deterministic hash for the file, as md5 sums of parquet files containing identical information often differ. Reasons include the library that wrote the parquet file, the chosen compression codec, and the metadata written by the library.

The hash can be accessed with pyarrow by:

pyarrow.parquet.read_schema(f"{path}.parquet").metadata[b"hash"].decode()

The hash is used by audb when publishing a database to track changes of database files.

Parameters
  • path (str) – file path without extension

  • storage_format (str) – storage format of table. See audformat.define.TableStorageFormat for available formats

  • update_other_formats (bool) – if True, it will not only save to the given storage_format, but also update all files stored in other storage formats

set()

MiscTable.set(values, *, index=None)

Set labels.

By default, all labels of the table are replaced. Use index to select a subset. If a column is assigned to a Scheme, values are automatically converted to match its dtype.

Examples are provided with the table specifications.

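Parameters
  • values – labels to set, given as a dictionary with column IDs as keys

  • index – index of the rows to set. If None, all rows are replaced

Raises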

ValueError – if values cannot be converted to match the scheme's dtype

split

MiscTable.split

Split object.

Returns

split object or None if not available

split_id

MiscTable.split_id

Split ID

to_dict()

MiscTable.to_dict()

Serialize object to dictionary.

Return type

dict

Returns

dictionary with attributes

update()

MiscTable.update(others, *, overwrite=False)

Update table with other table(s).

The table that calls update() must be assigned to a database. Media and split must match for all tables.

Columns that are not yet part of the table will be added and referenced schemes or raters are copied. For overlapping columns, schemes and raters must match.

Columns with the same identifier are combined into a single column. This requires that both columns have the same dtype and, if overwrite is set to False, that values where the indices overlap match or one column contains NaN. If overwrite is set to True, the value of the last table in the list is kept.

The index type of the table must not change.

Parameters
  • others – table object(s)

  • overwrite – overwrite values where indices overlap

Returns

the updated table

Raises
  • RuntimeError – if table is not assigned to a database

  • ValueError – if split or media do not match

  • ValueError – if overlapping columns reference different schemes or raters

  • ValueError – if a missing scheme or rater cannot be copied because a different object with the same ID exists

  • ValueError – if values in the same position do not match and overwrite is False

  • ValueError – if level and dtypes of table indices do not match