Table¶
- class audformat.Table(index=None, *, split_id=None, media_id=None, description=None, meta=None)[source]¶
Table conforming to table specifications.
Consists of a list of file names to which it assigns numerical values or labels. To fill a table with labels, add one or more audformat.Column and use audformat.Table.set() to set the values. When adding a column, the column ID must be different from the index level names, which are 'file' in case of a filewise table, and 'file', 'start', and 'end' in case of a segmented table.
- Parameters
index (Optional[Index]) – index conforming to table specifications. If None, creates an empty filewise table
- Raises
ValueError – if index does not conform to table specifications
Examples
>>> index = filewise_index(["f1", "f2", "f3"])
>>> table = Table(
...     index,
...     split_id=define.SplitType.TEST,
... )
>>> table["values"] = Column()
>>> table
type: filewise
split_id: test
columns:
  values: {}
>>> table.get()
     values
file
f1      NaN
f2      NaN
f3      NaN
>>> table.set({"values": [0, 1, 2]})
>>> table.get()
     values
file
f1        0
f2        1
f3        2
>>> table.get(index[:2])
     values
file
f1        0
f2        1
>>> table.get(as_segmented=True)
                 values
file start  end
f1   0 days NaT       0
f2   0 days NaT       1
f3   0 days NaT       2
>>> index_new = filewise_index("f4")
>>> table_ex = table.extend_index(
...     index_new,
...     inplace=False,
... )
>>> table_ex.get()
     values
file
f1        0
f2        1
f3        2
f4      NaN
>>> table_ex.set(
...     {"values": 3},
...     index=index_new,
... )
>>> table_ex.get()
     values
file
f1        0
f2        1
f3        2
f4        3
>>> table_str = Table(index)
>>> table_str["strings"] = Column()
>>> table_str.set({"strings": ["a", "b", "c"]})
>>> (table + table_str).get()
     values strings
file
f1        0       a
f2        1       b
f3        2       c
>>> (table_ex + table_str).get()
     values strings
file
f1        0       a
f2        1       b
f3        2       c
f4        3     NaN
__add__()¶
- Table.__add__(other)¶
Create new table by combining two tables.
The new combined table contains the index and columns of both tables. Missing values will be set to NaN. If the tables conform to the table specifications and at least one of them is segmented, the output has a segmented index.
Columns with the same identifier are combined to a single column. This requires that:
both columns have the same dtype
in places where the indices overlap, the values of both columns match or one column contains NaN
Media and split information, as well as references to schemes and raters, are discarded. If you intend to keep them, use update().
- Parameters
other – the other table
- Raises
ValueError – if columns with the same name have different dtypes
ValueError – if values in the same position do not match
ValueError – if level and dtypes of indices do not match
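As a rough sketch (not part of the shipped examples; table, column, and file names are made up), adding a filewise table to a segmented one yields a table with a segmented index:
>>> from audformat import Column, Table, filewise_index, segmented_index
>>> t_file = Table(filewise_index(["f1", "f2"]))
>>> t_file["values"] = Column()
>>> t_file.set({"values": [0, 1]})
>>> t_seg = Table(segmented_index(["f1"], starts=["0s"], ends=["1s"]))
>>> t_seg["labels"] = Column()
>>> t_seg.set({"labels": ["a"]})
>>> combined = t_file + t_seg  # result has a segmented index; schemes, raters, media, split are discarded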
__getitem__()¶
__setitem__()¶
- Table.__setitem__(column_id, column)¶
Add new column to table.
- Parameters
- Raises
BadIdError – if a column with a scheme_id or rater_id is added that does not exist
ValueError – if column ID is not different from level names
ValueError – if the column is linked to a scheme that is using labels from a misc table, but the misc table the column is assigned to is already used by the same or another scheme
- Return type
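A minimal sketch of adding columns (the database, scheme, table, and column IDs are made up for illustration); a scheme_id can only be referenced once the table is part of a database that defines the scheme:
>>> from audformat import Column, Database, Scheme, Table, filewise_index
>>> db = Database("mydb")
>>> db.schemes["rating"] = Scheme("int")
>>> db["files"] = Table(filewise_index(["f1", "f2"]))
>>> db["files"]["rating"] = Column(scheme_id="rating")  # calls __setitem__
>>> db["files"]["comment"] = Column()  # column without a scheme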
drop_columns()¶
- Table.drop_columns(column_ids, *, inplace=False)¶
Drop columns by ID.
- Parameters
column_ids – column IDs
inplace – drop columns in place
- Returns
new object if inplace=False, otherwise self
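For illustration (hypothetical table and column IDs):
>>> from audformat import Column, Table, filewise_index
>>> table = Table(filewise_index(["f1", "f2"]))
>>> table["keep"] = Column()
>>> table["remove"] = Column()
>>> table_new = table.drop_columns(["remove"])  # only the "keep" column remains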
drop_files()¶
drop_index()¶
- Table.drop_index(index, *, inplace=False)¶
Drop rows from index.
- Parameters
index – index object
inplace – drop index in place
- Returns
new object if inplace=False, otherwise self
- Raises
ValueError – if level and dtypes of index do not match table index
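A sketch with made-up file names; the index passed to drop_index() must have the same levels and dtypes as the table index:
>>> from audformat import Table, filewise_index
>>> table = Table(filewise_index(["f1", "f2", "f3"]))
>>> table = table.drop_index(filewise_index("f2"))  # rows for "f1" and "f3" remain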
dump()¶
extend_index()¶
- Table.extend_index(index, *, fill_values=None, inplace=False)¶
Extend table with new rows.
- Parameters
index – index object
fill_values – replace NaN with these values (either a scalar applied to all columns or a dictionary with column name as key)
inplace – extend index in place
- Returns
new object if inplace=False, otherwise self
- Raises
ValueError – if level and dtypes of index do not match table index
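A sketch of fill_values (file and column names are illustrative): new rows are appended and their missing values can be filled instead of staying NaN:
>>> from audformat import Column, Table, filewise_index
>>> table = Table(filewise_index(["f1", "f2"]))
>>> table["values"] = Column()
>>> table.set({"values": [0, 1]})
>>> table_ext = table.extend_index(
...     filewise_index("f3"),
...     fill_values=-1,  # scalar applied to all columns; a dict keyed by column name also works
...     inplace=False,
... )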
from_dict()¶
get()¶
- Table.get(index=None, *, map=None, copy=True, as_segmented=False, allow_nat=True, root=None, num_workers=1, verbose=False)[source]¶
Get labels.
By default, all labels of the table are returned; use index to get a subset.
Examples are provided with the table specifications.
- Parameters
index (Optional[Index]) – index conforming to table specifications
copy (bool) – return a copy of the labels
map (Optional[Dict[str, Union[str, Sequence[str]]]]) – map scheme or scheme fields to column values. For example, if your table holds a column speaker with speaker IDs, which is assigned to a scheme that contains a dict mapping speaker IDs to age and gender entries, map={'speaker': ['age', 'gender']} will replace the column with two new columns that map ID values to age and gender, respectively. To also keep the original column with speaker IDs, you can do map={'speaker': ['speaker', 'age', 'gender']}
as_segmented (bool) – if set to True and the table has a filewise index, the index of the returned table will be converted to a segmented index. start will be set to 0 and end to NaT, or to the file duration if allow_nat is set to False
allow_nat (bool) – if set to False, end=NaT is replaced with the file duration
root (Optional[str]) – root directory under which the files are stored. Provide if file names are relative and the database was not saved or loaded from disk. If None, audformat.Database.root is used. Only relevant if allow_nat is set to False
num_workers (Optional[int]) – number of parallel jobs. If None, it will be set to the number of processors on the machine multiplied by 5
verbose (bool) – show progress bar
- Return type
- Returns
labels
- Raises
FileNotFoundError – if file is not found
RuntimeError – if table is not assigned to a database
ValueError – if trying to map without a scheme
ValueError – if trying to map from a scheme that has no labels
ValueError – if trying to map to a non-existing field
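A sketch of the map argument, assuming a made-up database whose 'speaker' scheme holds dict labels with an 'age' field:
>>> from audformat import Column, Database, Scheme, Table, filewise_index
>>> db = Database("mydb")
>>> db.schemes["speaker"] = Scheme(labels={"s1": {"age": 30}, "s2": {"age": 40}})
>>> db["files"] = Table(filewise_index(["f1", "f2"]))
>>> db["files"]["speaker"] = Column(scheme_id="speaker")
>>> db["files"].set({"speaker": ["s1", "s2"]})
>>> df = db["files"].get(map={"speaker": "age"})  # "speaker" column replaced by an "age" column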
load()¶
- Table.load(path)¶
Load table data from disk.
Tables are stored to disk as CSV, PARQUET, and/or PKL files. If the PKL file exists, it will load the PKL file as long as its modification date is the newest; otherwise it will raise an error and ask to delete one of the files.
- Parameters
path (str) – file path without extension
- Raises
RuntimeError – if table file(s) are missing
RuntimeError – if CSV or PARQUET file is newer than PKL file
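A rough save/load round trip, assuming a table can be saved standalone (temporary path, table, and column names are illustrative); load() expects the path without the file extension:
>>> import os
>>> import tempfile
>>> from audformat import Column, Table, filewise_index
>>> table = Table(filewise_index(["f1", "f2"]))
>>> table["values"] = Column()
>>> table.set({"values": [0, 1]})
>>> path = os.path.join(tempfile.mkdtemp(), "table")
>>> table.save(path, storage_format="csv")
>>> table_new = Table(filewise_index(["f1", "f2"]))
>>> table_new["values"] = Column()
>>> table_new.load(path)  # reads table.csv (or table.pkl if that exists and is newer)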
map_files()¶
pick_columns()¶
- Table.pick_columns(column_ids, *, inplace=False)¶
Pick columns by ID.
All other columns will be dropped.
- Parameters
column_ids – column IDs
inplace – pick columns in place
- Returns
new object if inplace=False, otherwise self
pick_files()¶
pick_index()¶
- Table.pick_index(index, *, inplace=False)¶
Pick rows from index.
- Parameters
index – index object
inplace – pick index in place
- Returns
new object if inplace=False, otherwise self
- Raises
ValueError – if level and dtypes of index do not match table index
save()¶
- Table.save(path, *, storage_format='parquet', update_other_formats=True)¶
Save table data to disk.
Existing files will be overwritten.
When using "parquet" as storage_format, a hash based on the content of the table is stored under the key b"hash" in the metadata of the schema of the parquet file. This provides a deterministic hash for the file, as MD5 sums of parquet files containing identical information often differ. Reasons include the library that wrote the parquet file, the chosen compression codec, and metadata written by the library.
The hash can be accessed with pyarrow by:
pyarrow.parquet.read_schema(f"{path}.parquet").metadata[b"hash"].decode()
The hash is used by audb when publishing a database to track changes of database files.
- Parameters
path (str) – file path without extension
storage_format (str) – storage format of table. See audformat.define.TableStorageFormat for available formats
update_other_formats (bool) – if True it will not only save to the given storage_format, but update all files stored in other storage formats as well
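An illustrative sketch of saving to parquet and reading back the stored hash (paths and names are made up; the pyarrow call mirrors the one shown above):
>>> import os
>>> import tempfile
>>> import pyarrow.parquet
>>> from audformat import Column, Table, filewise_index
>>> table = Table(filewise_index(["f1", "f2"]))
>>> table["values"] = Column()
>>> table.set({"values": [0, 1]})
>>> path = os.path.join(tempfile.mkdtemp(), "table")
>>> table.save(path)  # storage_format defaults to "parquet" and writes table.parquet
>>> table_hash = pyarrow.parquet.read_schema(f"{path}.parquet").metadata[b"hash"].decode()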
set()¶
- Table.set(values, *, index=None)¶
Set labels.
By default, all labels of the table are replaced; use index to select a subset. If a column is assigned to a Scheme, values will be automatically converted to match its dtype.
Examples are provided with the table specifications.
to_dict()¶
type¶
- Table.type¶
Table type
See audformat.define.IndexType for possible values.
update()¶
- Table.update(others, *, overwrite=False)¶
Update table with other table(s).
The table which calls update() to combine tables must be assigned to a database. For all tables, media and split must match.
Columns that are not yet part of the table will be added, and referenced schemes or raters are copied. For overlapping columns, schemes and raters must match.
Columns with the same identifier are combined to a single column. This requires that both columns have the same dtype and, if overwrite is set to False, that values in places where the indices overlap match or one column contains NaN. If overwrite is set to True, the value of the last table in the list is kept.
The index type of the table must not change.
- Parameters
others – table object(s)
overwrite – overwrite values where indices overlap
- Returns
the updated table
- Raises
RuntimeError – if table is not assigned to a database
ValueError – if split or media does not match
ValueError – if overlapping columns reference different schemes or raters
ValueError – if a missing scheme or rater cannot be copied because a different object with the same ID exists
ValueError – if values in same position overlap
ValueError – if level and dtypes of table indices do not match
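A sketch of updating one table from another, assuming both are attached to the same (made-up) database and their overlapping values agree:
>>> from audformat import Column, Database, Table, filewise_index
>>> db = Database("mydb")
>>> db["base"] = Table(filewise_index(["f1", "f2"]))
>>> db["base"]["values"] = Column()
>>> db["base"].set({"values": [0, 1]})
>>> db["other"] = Table(filewise_index(["f2", "f3"]))
>>> db["other"]["values"] = Column()
>>> db["other"].set({"values": [1, 2]})
>>> updated = db["base"].update(db["other"])  # adds row "f3"; also returns the updated table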