MiscTable¶

class audformat.MiscTable(index, *, split_id=None, media_id=None, description=None, meta=None)¶

Miscellaneous table.

Note

Intended for tables whose index does not conform to the table specifications. Otherwise, use audformat.Table.

To fill a table with labels, add one or more audformat.Column and use audformat.MiscTable.set() to set the values. When adding a column, the column ID must be different from the index level names. When initialized with a single-level pandas.MultiIndex, the index will be converted to a pandas.Index.

Parameters

index (Index) – table index with non-empty and unique level names
split_id (Optional[str]) – split identifier (must exist)
media_id (Optional[str]) – media identifier (must exist)
description (Optional[str]) – table description
meta (Optional[dict]) – additional meta fields

Raises

ValueError – if level names of index are empty or not unique

Examples

>>> index = pd.MultiIndex.from_tuples(
...     [
...         ("f1", "f2"),
...         ("f1", "f3"),
...         ("f2", "f3"),
...     ],
...     names=["file", "other"],
... )
>>> index = utils.set_index_dtypes(index, "string")
>>> table = MiscTable(
...     index,
...     split_id=define.SplitType.TEST,
... )
>>> table["match"] = Column()
>>> table
levels: {file: str, other: str}
split_id: test
columns:
  match: {}
>>> table.get()
           match
file other
f1   f2      NaN
     f3      NaN
f2   f3      NaN
>>> table.set({"match": [True, False, True]})
>>> table.get()
           match
file other
f1   f2     True
     f3    False
f2   f3     True
>>> table.get(index[:2])
           match
file other
f1   f2     True
     f3    False
>>> index_new = pd.MultiIndex.from_tuples(
...     [
...         ("f4", "f1"),
...     ],
...     names=["file", "other"],
... )
>>> index_new = utils.set_index_dtypes(index_new, "string")
>>> table_ex = table.extend_index(
...     index_new,
...     inplace=False,
... )
>>> table_ex.get()
            match
file other
f1   f2      True
     f3     False
f2   f3      True
f4   f1       NaN
>>> table_ex.set(
...     {"match": True},
...     index=index_new,
... )
>>> table_ex.get()
            match
file other
f1   f2      True
     f3     False
f2   f3      True
f4   f1      True
>>> table_str = MiscTable(index)
>>> table_str["strings"] = Column()
>>> table_str.set({"strings": ["a", "b", "c"]})
>>> (table + table_str).get()
            match strings
file other
f1   f2      True       a
     f3     False       b
f2   f3      True       c
>>> (table_ex + table_str).get()
            match strings
file other
f1   f2      True       a
     f3     False       b
f2   f3      True       c
f4   f1      True     NaN

__add__()¶

MiscTable.__add__(other)¶

Create new table by combining two tables.

The new combined table contains index and columns of both tables. Missing values will be set to NaN.

If a table conforms to the table specifications and at least one table is segmented, the output has a segmented index.

Columns with the same identifier are combined to a single column. This requires that:

both columns have the same dtype
in places where the indices overlap the values of both columns match or one column contains NaN

Media and split information, as well as references to schemes and raters, are discarded. If you intend to keep them, use update().

Parameters

other – the other table

Raises

ValueError – if columns with the same name have different dtypes
ValueError – if values in the same position do not match
ValueError – if levels and dtypes of indices do not match

__eq__()¶

MiscTable.__eq__(other)¶

Compare if table equals other table.

Return type: bool

__getitem__()¶

MiscTable.__getitem__(column_id)¶

Return view to a column.

Parameters: column_id (str) – column identifier
Return type: Column

__len__()¶

MiscTable.__len__()¶

Number of rows in table.

Return type: int

__setitem__()¶

MiscTable.__setitem__(column_id, column)¶

Add new column to table.

Parameters

column_id (str) – column identifier
column (Column) – column

Raises

BadIdError – if a column with a scheme_id or rater_id is added that does not exist
ValueError – if column ID is not different from level names
ValueError – if the column is linked to a scheme that is using labels from a misc table, but the misc table the column is assigned to is already used by the same or another scheme

Return type

Column

columns¶

MiscTable.columns¶: Table columns

copy()¶

MiscTable.copy()¶

Copy table.

Returns: new table object

db¶

MiscTable.db¶

Database object.

Returns: database object or None if not assigned yet

description¶

MiscTable.description¶: Description

df¶

MiscTable.df¶

Table data.

Returns: data

drop_columns()¶

MiscTable.drop_columns(column_ids, *, inplace=False)¶

Drop columns by ID.

Parameters

column_ids – column IDs
inplace – drop columns in place

Returns

new object if inplace=False, otherwise self

drop_index()¶

MiscTable.drop_index(index, *, inplace=False)¶

Drop rows from index.

Parameters

index – index object
inplace – drop index in place

Returns

new object if inplace=False, otherwise self

Raises

ValueError – if levels and dtypes of index do not match the table index

dump()¶

MiscTable.dump(stream=None, indent=2)¶

Serialize object to YAML.

Parameters

stream – file-like object; if None, serializes to a string
indent (int) – indent

Return type

str

Returns

YAML string

extend_index()¶

MiscTable.extend_index(index, *, fill_values=None, inplace=False)¶

Extend table with new rows.

Parameters

index – index object
fill_values – replace NaN with these values (either a scalar applied to all columns or a dictionary with column name as key)
inplace – extend index in place

Returns

new object if inplace=False, otherwise self

Raises

ValueError – if levels and dtypes of index do not match the table index

from_dict()¶

MiscTable.from_dict(d, ignore_keys=None)¶

Deserialize object from dictionary.

Parameters

d (dict) – dictionary of class variables to assign
ignore_keys (Optional[Sequence[str]]) – variables listed here will be ignored

get()¶

MiscTable.get(index=None, *, map=None, copy=True)¶

Get labels.

By default, all labels of the table are returned. Use index to get a subset.

Examples are provided with the table specifications, and for map in Map scheme labels.

Parameters

index (Optional[Index]) – index
copy (bool) – return a copy of the labels
map (Optional[dict[str, str | Sequence[str]]]) – map scheme or scheme fields to column values. For example, if your table holds a column speaker with speaker IDs, which is assigned to a scheme that contains a dict mapping speaker IDs to age and gender entries, map={'speaker': ['age', 'gender']} will replace the column with two new columns that map ID values to age and gender, respectively. To also keep the original column with speaker IDs, you can do map={'speaker': ['speaker', 'age', 'gender']}

Return type

DataFrame

Returns

labels

Raises

FileNotFoundError – if file is not found
RuntimeError – if table is not assigned to a database
ValueError – if trying to map without a scheme
ValueError – if trying to map from a scheme that has no labels
ValueError – if trying to map to a non-existing field

index¶

MiscTable.index¶

Table index.

Returns: index

levels¶

MiscTable.levels¶: Index levels.

load()¶

MiscTable.load(path)¶

Load table data from disk.

Tables are stored on disk as CSV, PARQUET and/or PKL files. If a PKL file exists, it is loaded as long as its modification date is the newest; otherwise an error is raised that asks you to delete one of the files.

Parameters

path (str) – file path without extension

Raises

RuntimeError – if table file(s) are missing
RuntimeError – if CSV or PARQUET file is newer than PKL file

media¶

MiscTable.media¶

Media object.

Returns: media object or None if not available

media_id¶

MiscTable.media_id¶: Media ID

meta¶

MiscTable.meta¶: Dictionary with meta fields

pick_columns()¶

MiscTable.pick_columns(column_ids, *, inplace=False)¶

Pick columns by ID.

All other columns will be dropped.

Parameters

column_ids – column IDs
inplace – pick columns in place

Returns

new object if inplace=False, otherwise self

pick_index()¶

MiscTable.pick_index(index, *, inplace=False)¶

Pick rows from index.

Parameters

index – index object
inplace – pick index in place

Returns

new object if inplace=False, otherwise self

Raises

ValueError – if levels and dtypes of index do not match the table index

save()¶

MiscTable.save(path, *, storage_format='parquet', update_other_formats=True)¶

Save table data to disk.

Existing files will be overwritten.

When using "parquet" as storage_format, a hash based on the content of the table is stored under the key b"hash" in the metadata of the schema of the parquet file. This provides a deterministic hash for the file, because MD5 sums of parquet files containing identical information often differ. Reasons include the library that wrote the parquet file, the chosen compression codec, and metadata written by the library.

The hash can be accessed with pyarrow by:

pyarrow.parquet.read_schema(f"{path}.parquet").metadata[b"hash"].decode()

The hash is used by audb when publishing a database to track changes of database files.

Parameters

path (str) – file path without extension
storage_format (str) – storage format of table. See audformat.define.TableStorageFormat for available formats
update_other_formats (bool) – if True it will not only save to the given storage_format, but update all files stored in other storage formats as well

set()¶

MiscTable.set(values, *, index=None)¶

Set labels.

By default, all labels of the table are replaced. Use index to select a subset. If a column is assigned to a Scheme, values will be automatically converted to match its dtype.

Examples are provided with the table specifications.

Parameters

values (dict[str, Union[int, float, str, Timedelta, Sequence[Union[int, float, str, Timedelta]], ndarray, Series]] | DataFrame) – dictionary of values with column_id as key
index (Optional[Index]) – index

Raises

ValueError – if values cannot be converted to match the scheme's dtype

split¶

MiscTable.split¶

Split object.

Returns: split object or None if not available

split_id¶

MiscTable.split_id¶: Split ID

to_dict()¶

MiscTable.to_dict()¶

Serialize object to dictionary.

Return type: dict
Returns: dictionary with attributes

update()¶

MiscTable.update(others, *, overwrite=False)¶

Update table with other table(s).

The table that calls update() must be assigned to a database. Media and split must match for all tables.

Columns that are not yet part of the table will be added and referenced schemes or raters are copied. For overlapping columns, schemes and raters must match.

Columns with the same identifier are combined to a single column. This requires that both columns have the same dtype. If overwrite is set to False, values in places where the indices overlap must also match, or one column must contain NaN. If overwrite is set to True, the value of the last table in the list is kept.

The index type of the table must not change.

Parameters

others – table object(s)
overwrite – overwrite values where indices overlap

Returns

the updated table

Raises

RuntimeError – if table is not assigned to a database
ValueError – if split or media does not match
ValueError – if overlapping columns reference different schemes or raters
ValueError – if a missing scheme or rater cannot be copied because a different object with the same ID exists
ValueError – if overwrite is False and values in the same position do not match
ValueError – if levels and dtypes of table indices do not match
