Table

class audformat.Table(index=None, *, split_id=None, media_id=None, description=None, meta=None)

Table conforming to table specifications.

Consists of a list of file names to which it assigns numerical values or labels. To fill a table with labels, add one or more audformat.Column objects and use audformat.Table.set() to set the values. When adding a column, the column ID must differ from the index level names, which are 'file' for a filewise table and 'file', 'start' and 'end' for a segmented table.
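To picture the two index layouts, the sketch below builds them directly with pandas (in practice audformat.filewise_index() and audformat.segmented_index() create them). A filewise index has a single level named 'file'; a segmented index is a MultiIndex with levels 'file', 'start' and 'end'. The helper check_column_id() is hypothetical and only illustrates why a column ID must not repeat a level name; it is not audformat's own check.

```python
import pandas as pd

# Filewise layout: one level named "file"
filewise = pd.Index(["f1", "f2"], name="file")

# Segmented layout: "file", "start", "end" with timedelta start/end values
segmented = pd.MultiIndex.from_arrays(
    [
        ["f1", "f1"],
        pd.to_timedelta([0, 1.5], unit="s"),
        pd.to_timedelta([1.5, 3.0], unit="s"),
    ],
    names=["file", "start", "end"],
)


def check_column_id(index: pd.Index, column_id: str) -> None:
    """Hypothetical helper: reject column IDs that equal an index level name."""
    if column_id in index.names:
        raise ValueError(
            f"Column ID '{column_id}' is also an index level name: {list(index.names)}"
        )


check_column_id(filewise, "start")  # fine: a filewise index has no "start" level
```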

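Parameters
  • index – index conforming to table specifications

  • split_id – split identifier

  • media_id – media identifier

  • description – table description

  • meta – additional meta fields

Raises

ValueError – if index does not conform to table specifications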

Examples

>>> from audformat import Column, Table, define, filewise_index
>>> index = filewise_index(["f1", "f2", "f3"])
>>> table = Table(
...     index,
...     split_id=define.SplitType.TEST,
... )
>>> table["values"] = Column()
>>> table
type: filewise
split_id: test
columns:
  values: {}
>>> table.get()
     values
file
f1      NaN
f2      NaN
f3      NaN
>>> table.set({"values": [0, 1, 2]})
>>> table.get()
     values
file
f1        0
f2        1
f3        2
>>> table.get(index[:2])
     values
file
f1        0
f2        1
>>> table.get(as_segmented=True)
                values
file start  end
f1   0 days NaT      0
f2   0 days NaT      1
f3   0 days NaT      2
>>> index_new = filewise_index("f4")
>>> table_ex = table.extend_index(
...     index_new,
...     inplace=False,
... )
>>> table_ex.get()
     values
file
f1        0
f2        1
f3        2
f4      NaN
>>> table_ex.set(
...     {"values": 3},
...     index=index_new,
... )
>>> table_ex.get()
     values
file
f1        0
f2        1
f3        2
f4        3
>>> table_str = Table(index)
>>> table_str["strings"] = Column()
>>> table_str.set({"strings": ["a", "b", "c"]})
>>> (table + table_str).get()
     values strings
file
f1        0       a
f2        1       b
f3        2       c
>>> (table_ex + table_str).get()
     values strings
file
f1        0       a
f2        1       b
f3        2       c
f4        3     NaN

__add__()

Table.__add__(other)

Create new table by combining two tables.

The new combined table contains index and columns of both tables. Missing values will be set to NaN.

If the tables conform to table specifications and at least one of them is segmented, the output has a segmented index.

Columns with the same identifier are combined into a single column. This requires that:

  1. both columns have the same dtype

  2. in places where the indices overlap, the values of both columns match or one column contains NaN

Media and split information, as well as references to schemes and raters, are discarded. If you intend to keep them, use update().

Parameters

other – the other table

Raises
  • ValueError – if columns with the same name have different dtypes

  • ValueError – if values in the same position do not match

  • ValueError – if level and dtypes of indices do not match
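The two requirements for merging same-named columns can be expressed with pandas Series. The sketch below is a simplified stand-in for the per-column behavior described above; combine_columns() is a hypothetical name, and index-type conversion as well as scheme and rater handling are left out.

```python
import pandas as pd


def combine_columns(a: pd.Series, b: pd.Series) -> pd.Series:
    """Hypothetical helper: merge two same-named columns following the rules above."""
    # Rule 1: both columns must have the same dtype
    if a.dtype != b.dtype:
        raise ValueError(f"dtypes differ: {a.dtype} vs {b.dtype}")
    # Rule 2: where indices overlap, values must match or one side is NaN
    common = a.index.intersection(b.index)
    left, right = a.loc[common], b.loc[common]
    if (left.notna() & right.notna() & (left != right)).any():
        raise ValueError("values in the same position do not match")
    # Union of both indices; gaps in one column are filled from the other
    return a.combine_first(b)
```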

__eq__()

Table.__eq__(other)

Compare if table equals other table.

Return type

bool

__getitem__()

Table.__getitem__(column_id)

Return view to a column.

Parameters

column_id (str) – column identifier

Return type

Column

__len__()

Table.__len__()

Number of rows in table.

Return type

int

__setitem__()

Table.__setitem__(column_id, column)

Add new column to table.

Parameters
  • column_id (str) – column identifier

  • column (Column) – column

Raises
  • BadIdError – if a column with a scheme_id or rater_id is added that does not exist

  • ValueError – if column ID is not different from level names

  • ValueError – if the column is linked to a scheme that is using labels from a misc table, but the misc table the column is assigned to is already used by the same or another scheme

Return type

Column

columns

Table.columns

Table columns

copy()

Table.copy()

Copy table.

Returns

new table object

db

Table.db

Database object.

Returns

database object or None if not assigned yet

description

Table.description

Description

df

Table.df

Table data.

Returns

data

drop_columns()

Table.drop_columns(column_ids, *, inplace=False)

Drop columns by ID.

Parameters
  • column_ids – column IDs

  • inplace – drop columns in place

Returns

new object if inplace=False, otherwise self

drop_files()

Table.drop_files(files, *, inplace=False)

Drop files.

Remove rows with a reference to listed or matching files.

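Parameters
  • files – file(s) to drop, or a condition function that returns True for files to drop

  • inplace – drop files in place
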
Return type

Table

Returns

new object if inplace=False, otherwise self
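A plain-pandas sketch of the row selection, assuming files may be a single name, a list of names, or a function that returns True for files to drop (one reading of "listed or matching"). drop_files_sketch() is hypothetical, not audformat's implementation; it works on the 'file' level, so filewise and segmented tables behave the same.

```python
from typing import Callable, Sequence, Union

import pandas as pd


def drop_files_sketch(
    df: pd.DataFrame,
    files: Union[str, Sequence[str], Callable[[str], bool]],
) -> pd.DataFrame:
    """Hypothetical helper: drop rows whose 'file' level is listed or matches."""
    file_level = df.index.get_level_values("file")
    if callable(files):
        # condition function: True means "drop this file"
        drop = [bool(files(f)) for f in file_level]
    elif isinstance(files, str):
        drop = [f == files for f in file_level]
    else:
        listed = set(files)
        drop = [f in listed for f in file_level]
    return df[[not d for d in drop]]


index = pd.MultiIndex.from_arrays(
    [
        ["a.wav", "a.wav", "b.flac"],
        pd.to_timedelta([0, 1, 0], unit="s"),
        pd.to_timedelta([1, 2, 3], unit="s"),
    ],
    names=["file", "start", "end"],
)
df = pd.DataFrame({"values": [0, 1, 2]}, index=index)
print(drop_files_sketch(df, lambda f: f.endswith(".flac")))
```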

drop_index()

Table.drop_index(index, *, inplace=False)

Drop rows from index.

Parameters
  • index – index object

  • inplace – drop index in place

Returns

new object if inplace=False, otherwise self

Raises

ValueError – if level and dtypes of index do not match table index
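The ValueError condition compares level names and dtypes of the two indices. The sketch below shows one way to express that check with pandas before dropping rows; index_signature() and drop_index_sketch() are hypothetical helpers, and in this sketch rows of index that are not in the table are simply ignored.

```python
import pandas as pd


def index_signature(index: pd.Index) -> list:
    """Level names paired with their dtypes."""
    return list(index.to_frame(index=False).dtypes.items())


def drop_index_sketch(df: pd.DataFrame, index: pd.Index) -> pd.DataFrame:
    """Hypothetical helper: drop the rows of ``index`` after a compatibility check."""
    if index_signature(df.index) != index_signature(index):
        raise ValueError("level and dtypes of index do not match table index")
    # keep every row that is not part of ``index``
    return df[~df.index.isin(index)]
```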

dump()

Table.dump(stream=None, indent=2)

Serialize object to YAML.

Parameters
  • stream – file-like object. If None, serializes to a string

  • indent (int) – indent

Return type

str

Returns

YAML string

ends

Table.ends

Segment end times.

Returns

timestamps

extend_index()

Table.extend_index(index, *, fill_values=None, inplace=False)

Extend table with new rows.

Parameters
  • index – index object

  • fill_values – replace NaN with these values (either a scalar applied to all columns or a dictionary with column name as key)

  • inplace – extend index in place

Returns

new object if inplace=False, otherwise self

Raises

ValueError – if level and dtypes of index do not match table index
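The two forms of fill_values (scalar for all columns, or a dictionary keyed by column name) map naturally onto pandas. The sketch below assumes fill values are applied only to the newly added rows; extend_index_sketch() is hypothetical and skips the index compatibility check.

```python
from typing import Any, Dict, Optional, Union

import pandas as pd


def extend_index_sketch(
    df: pd.DataFrame,
    index: pd.Index,
    fill_values: Optional[Union[Any, Dict[str, Any]]] = None,
) -> pd.DataFrame:
    """Hypothetical helper: append rows of ``index`` missing from ``df``."""
    added = index.difference(df.index)
    out = df.reindex(df.index.append(added))  # new rows start as NaN
    if fill_values is not None:
        if isinstance(fill_values, dict):
            # dictionary: column name -> fill value
            for column, value in fill_values.items():
                out.loc[added, column] = value
        else:
            # scalar: same fill value for every column
            out.loc[added, :] = fill_values
    return out
```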

files

Table.files

Files referenced in the table.

Returns

files

from_dict()

Table.from_dict(d, ignore_keys=None)

Deserialize object from dictionary.

Parameters
  • d (dict) – dictionary of class variables to assign

  • ignore_keys (Optional[Sequence[str]]) – variables listed here will be ignored

get()

Table.get(index=None, *, map=None, copy=True, as_segmented=False, allow_nat=True, root=None, num_workers=1, verbose=False)

Get labels.

By default, all labels of the table are returned; use index to get a subset.

Examples are provided with the table specifications.

Parameters
  • index (Optional[Index]) – index conforming to table specifications

  • copy (bool) – return a copy of the labels

  • map (Optional[Dict[str, Union[str, Sequence[str]]]]) – map scheme or scheme fields to column values. For example, if your table holds a column speaker with speaker IDs, which is assigned to a scheme that contains a dict mapping speaker IDs to age and gender entries, map={'speaker': ['age', 'gender']} will replace the column with two new columns that map ID values to age and gender, respectively. To also keep the original column with speaker IDs, you can do map={'speaker': ['speaker', 'age', 'gender']}

  • as_segmented (bool) – if set to True and the table has a filewise index, the index of the returned table is converted to a segmented index. start is set to 0 and end to NaT, or to the file duration if allow_nat is set to False

  • allow_nat (bool) – if set to False, end=NaT is replaced with the file duration

  • root (Optional[str]) – root directory under which the files are stored. Provide it if file names are relative and the database was not saved or loaded from disk. If None, audformat.Database.root is used. Only relevant if allow_nat is set to False

  • num_workers (Optional[int]) – number of parallel jobs. If None, it is set to the number of processors on the machine multiplied by 5

  • verbose (bool) – show progress bar
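To make the map parameter concrete, the sketch below reproduces the speaker example with plain pandas. The labels dictionary stands in for a scheme's labels and map_column() is a hypothetical helper; the real get() looks up the scheme through the database.

```python
import pandas as pd

# Hypothetical scheme labels: speaker ID -> fields (stands in for a Scheme)
labels = {
    "s1": {"age": 23, "gender": "female"},
    "s2": {"age": 41, "gender": "male"},
}


def map_column(df, column, fields, labels):
    """Replace ``column`` by one column per entry in ``fields``."""
    out = df.copy()
    for field in fields:
        if field != column:
            out[field] = df[column].map(lambda key: labels.get(key, {}).get(field))
    if column not in fields:
        out = out.drop(columns=column)  # original ID column was not requested
    return out


df = pd.DataFrame(
    {"speaker": ["s1", "s2", "s1"]},
    index=pd.Index(["f1", "f2", "f3"], name="file"),
)
print(map_column(df, "speaker", ["age", "gender"], labels))
```

Listing the column itself among the fields, as in map={'speaker': ['speaker', 'age', 'gender']}, keeps the ID column next to the mapped ones.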

Return type

DataFrame

Returns

labels

Raises

index

Table.index

Table index.

Returns

index

is_filewise

Table.is_filewise

Check if filewise table.

Returns

True if filewise table.

is_segmented

Table.is_segmented

Check if segmented table.

Returns

True if segmented table.

load()

Table.load(path)

Load table data from disk.

Tables are stored on disk as CSV, PARQUET and/or PKL files. If a PKL file exists, it is loaded as long as it has the newest modification date; otherwise an error is raised asking to delete one of the files.
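The PKL rule can be sketched with modification times from the standard library. choose_table_file() is hypothetical; the exception type (RuntimeError) and the preference for PARQUET over CSV when no PKL file exists are assumptions, since the text above only states the PKL rule.

```python
import os


def choose_table_file(path: str) -> str:
    """Hypothetical helper: pick the file to load for ``path`` (no extension)."""
    pkl = f"{path}.pkl"
    others = [
        f"{path}.{ext}" for ext in ("parquet", "csv") if os.path.exists(f"{path}.{ext}")
    ]
    if os.path.exists(pkl):
        newer = [f for f in others if os.path.getmtime(f) > os.path.getmtime(pkl)]
        if newer:
            # assumption: RuntimeError; the text only says "an error"
            raise RuntimeError(
                f"'{newer[0]}' is newer than '{pkl}', please delete one of them"
            )
        return pkl
    if not others:
        raise RuntimeError(f"no table file found for '{path}'")
    # assumption: prefer PARQUET over CSV when there is no PKL file
    return others[0]
```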

Parameters

path (str) – file path without extension

Raises

map_files()

Table.map_files(func)

Apply function to file names in table.

If speed is crucial, see audformat.utils.map_file_path() for further hints on how to optimize your code.

Parameters

func (Callable[[str], str]) – map function
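A plain-pandas sketch of the operation: func touches only the 'file' level, and it is called once per unique file name, which saves work when a segmented table repeats files. map_files_sketch() is hypothetical and not audformat's implementation.

```python
from typing import Callable

import pandas as pd


def map_files_sketch(df: pd.DataFrame, func: Callable[[str], str]) -> pd.DataFrame:
    """Hypothetical helper: apply ``func`` to the 'file' level of the index."""
    out = df.copy()
    frame = out.index.to_frame(index=False)
    # call func once per unique file name, then broadcast the results
    mapping = {f: func(f) for f in frame["file"].unique()}
    frame["file"] = frame["file"].map(mapping)
    if isinstance(out.index, pd.MultiIndex):
        out.index = pd.MultiIndex.from_frame(frame)  # keeps start/end levels
    else:
        out.index = pd.Index(frame["file"], name="file")
    return out
```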

media

Table.media

Media object.

Returns

media object or None if not available

media_id

Table.media_id

Media ID

meta

Table.meta

Dictionary with meta fields

pick_columns()

Table.pick_columns(column_ids, *, inplace=False)

Pick columns by ID.

All other columns will be dropped.

Parameters
  • column_ids – column IDs

  • inplace – pick columns in place

Returns

new object if inplace=False, otherwise self

pick_files()

Table.pick_files(files, *, inplace=False)

Pick files.

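Keep only rows with a reference to listed or matching files.

Parameters
  • files – file(s) to keep, or a condition function that returns True for files to keep

  • inplace – pick files in place
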
Return type

Table

Returns

new object if inplace=False, otherwise self

pick_index()

Table.pick_index(index, *, inplace=False)

Pick rows from index.

Parameters
  • index – index object

  • inplace – pick index in place

Returns

new object if inplace=False, otherwise self

Raises

ValueError – if level and dtypes of index do not match table index

save()

Table.save(path, *, storage_format='parquet', update_other_formats=True)

Save table data to disk.

Existing files will be overwritten.

When using "parquet" as storage_format, a hash based on the content of the table is stored under the key b"hash" in the metadata of the schema of the parquet file. This provides a deterministic hash for the file, as MD5 sums of parquet files containing identical information often differ. Reasons include the library that wrote the parquet file, the chosen compression codec, and metadata written by the library.

The hash can be accessed with pyarrow by:

>>> import pyarrow.parquet
>>> pyarrow.parquet.read_schema(f"{path}.parquet").metadata[b"hash"].decode()

The hash is used by audb when publishing a database to track changes of database files.
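The text does not specify how the content hash is computed. As an illustration of the idea only, the sketch below derives a hash from the table content with pandas.util.hash_pandas_object() and hashlib, so two tables holding identical data get the same value regardless of how their files were written. content_hash() is hypothetical and its output will not equal the value audformat stores.

```python
import hashlib

import pandas as pd


def content_hash(df: pd.DataFrame) -> str:
    """Hypothetical helper: MD5 over column names, index and values."""
    md5 = hashlib.md5()
    md5.update(str(list(df.columns)).encode())
    # one deterministic 64-bit hash per row, including the index
    row_hashes = pd.util.hash_pandas_object(df, index=True)
    md5.update(row_hashes.to_numpy().tobytes())
    return md5.hexdigest()
```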

Parameters
  • path (str) – file path without extension

  • storage_format (str) – storage format of table. See audformat.define.TableStorageFormat for available formats

  • update_other_formats (bool) – if True, the table is not only saved in the given storage_format, but all files stored in other storage formats are updated as well

set()

Table.set(values, *, index=None)

Set labels.

By default, all labels of the table are replaced; use index to select a subset. If a column is assigned to a scheme, values are automatically converted to match its dtype.

Examples are provided with the table specifications.

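Parameters
  • values – dictionary with column IDs as keys and the new values (a scalar or a sequence) as values

  • index – index conforming to table specifications; only these rows are set

Raises

ValueError – if values cannot be converted to match the scheme's dtype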
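The automatic conversion can be pictured as a cast to the scheme's dtype that turns failures into ValueError. The mapping from scheme dtype names to pandas dtypes below is an assumption made for illustration (audformat.define.DataType lists the names audformat actually uses), and convert_values() is hypothetical.

```python
import pandas as pd

# Assumed mapping from scheme dtype names to pandas dtypes (illustration only)
PANDAS_DTYPES = {"bool": "boolean", "float": "float64", "int": "Int64", "str": "string"}


def convert_values(values, scheme_dtype: str) -> pd.Series:
    """Hypothetical helper: cast ``values`` to the dtype of a scheme."""
    try:
        return pd.Series(values).astype(PANDAS_DTYPES[scheme_dtype])
    except (TypeError, ValueError) as ex:
        raise ValueError(f"cannot convert values to '{scheme_dtype}'") from ex
```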

split

Table.split

Split object.

Returns

split object or None if not available

split_id

Table.split_id

Split ID

starts

Table.starts

Segment start times.

Returns

timestamps

to_dict()

Table.to_dict()

Serialize object to dictionary.

Return type

dict

Returns

dictionary with attributes

type

Table.type

Table type

See audformat.define.IndexType for possible values.

update()

Table.update(others, *, overwrite=False)

Update table with other table(s).

The table that calls update() must be assigned to a database. Media and split must match for all tables.

Columns that are not yet part of the table will be added and referenced schemes or raters are copied. For overlapping columns, schemes and raters must match.

Columns with the same identifier are combined into a single column. This requires that both columns have the same dtype and, if overwrite is set to False, that values in places where the indices overlap match or one column contains NaN. If overwrite is set to True, the value of the last table in the list is kept.

The index type of the table must not change.

Parameters
  • others – table object(s)

  • overwrite – overwrite values where indices overlap

Returns

the updated table

Raises
  • RuntimeError – if table is not assigned to a database

  • ValueError – if split or media does not match

  • ValueError – if overlapping columns reference different schemes or raters

  • ValueError – if a missing scheme or rater cannot be copied because a different object with the same ID exists

  • ValueError – if overwrite is False and values in the same position do not match

  • ValueError – if level and dtypes of table indices do not match
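The overwrite rule can be illustrated for a single column with pandas. update_column() below is a hypothetical stand-in that only covers value merging; the database, media, split, scheme and rater checks described above are left out.

```python
from typing import Sequence

import pandas as pd


def update_column(
    base: pd.Series, others: Sequence[pd.Series], overwrite: bool = False
) -> pd.Series:
    """Hypothetical helper: merge ``others`` into ``base`` one after another."""
    result = base.copy()
    for other in others:
        if not overwrite:
            common = result.index.intersection(other.index)
            left, right = result.loc[common], other.loc[common]
            if (left.notna() & right.notna() & (left != right)).any():
                raise ValueError("values in the same position do not match")
            result = result.combine_first(other)  # only fill gaps
        else:
            result = other.combine_first(result)  # later table wins
    return result
```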