MiscTable
- class audformat.MiscTable(index, *, split_id=None, media_id=None, description=None, meta=None)
Miscellaneous table.
Note
Intended for tables whose index does not conform to the table specifications. Otherwise, use audformat.Table.
To fill a table with labels, add one or more audformat.Column objects and use audformat.MiscTable.set() to set the values. When adding a column, the column ID must be different from the index level names. When initialized with a single-level pandas.MultiIndex, the index will be converted to a pandas.Index.
- Parameters
- Raises
ValueError – if level names of index are empty or not unique
Examples
>>> index = pd.MultiIndex.from_tuples(
...     [
...         ("f1", "f2"),
...         ("f1", "f3"),
...         ("f2", "f3"),
...     ],
...     names=["file", "other"],
... )
>>> index = utils.set_index_dtypes(index, "string")
>>> table = MiscTable(
...     index,
...     split_id=define.SplitType.TEST,
... )
>>> table["match"] = Column()
>>> table
levels: {file: str, other: str}
split_id: test
columns:
  match: {}
>>> table.get()
           match
file other
f1   f2      NaN
     f3      NaN
f2   f3      NaN
>>> table.set({"match": [True, False, True]})
>>> table.get()
            match
file other
f1   f2      True
     f3     False
f2   f3      True
>>> table.get(index[:2])
            match
file other
f1   f2      True
     f3     False
>>> index_new = pd.MultiIndex.from_tuples(
...     [
...         ("f4", "f1"),
...     ],
...     names=["file", "other"],
... )
>>> index_new = utils.set_index_dtypes(index_new, "string")
>>> table_ex = table.extend_index(
...     index_new,
...     inplace=False,
... )
>>> table_ex.get()
            match
file other
f1   f2      True
     f3     False
f2   f3      True
f4   f1       NaN
>>> table_ex.set(
...     {"match": True},
...     index=index_new,
... )
>>> table_ex.get()
            match
file other
f1   f2      True
     f3     False
f2   f3      True
f4   f1      True
>>> table_str = MiscTable(index)
>>> table_str["strings"] = Column()
>>> table_str.set({"strings": ["a", "b", "c"]})
>>> (table + table_str).get()
            match strings
file other
f1   f2      True       a
     f3     False       b
f2   f3      True       c
>>> (table_ex + table_str).get()
            match strings
file other
f1   f2      True       a
     f3     False       b
f2   f3      True       c
f4   f1      True     NaN
__add__()
- MiscTable.__add__(other)
Create new table by combining two tables.
The new combined table contains index and columns of both tables. Missing values will be set to NaN.
If the tables conform to the table specifications and at least one table is segmented, the output has a segmented index.
Columns with the same identifier are combined to a single column. This requires that:
both columns have the same dtype
in places where the indices overlap, the values of both columns match or one column contains NaN
Media and split information, as well as references to schemes and raters, are discarded. If you intend to keep them, use update().
- Parameters
other – the other table
- Raises
ValueError – if columns with the same name have different dtypes
ValueError – if values in the same position do not match
ValueError – if level and dtypes of indices do not match
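The rule for combining columns that share an identifier can be sketched in plain Python. This is only an illustration of the rule described above, not audformat's implementation: combine_column() and the dict-based column representation are hypothetical, and the dtype check is left out.

```python
import math


def is_missing(value):
    """Return True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def combine_column(left, right):
    """Combine two columns given as {index_key: value} dicts.

    Hypothetical helper: overlapping entries must match,
    unless one of them is missing.
    """
    combined = dict(left)
    for key, value in right.items():
        if key not in combined or is_missing(combined[key]):
            # New row, or the left value is missing: take the right value.
            combined[key] = value
        elif not is_missing(value) and combined[key] != value:
            raise ValueError(f"values at {key!r} do not match")
    return combined


nan = float("nan")
a = {("f1", "f2"): True, ("f1", "f3"): nan}
b = {("f1", "f3"): False, ("f4", "f1"): True}
result = combine_column(a, b)
```

Here the NaN at ("f1", "f3") is filled from the second column and the row ("f4", "f1") is added, whereas two different non-missing values at the same key would raise a ValueError.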
__getitem__()
__setitem__()
- MiscTable.__setitem__(column_id, column)
Add new column to table.
- Parameters
- Raises
BadIdError – if a column is added with a scheme_id or rater_id that does not exist
ValueError – if column ID is not different from level names
ValueError – if the column is linked to a scheme that is using labels from a misc table, but the misc table the column is assigned to is already used by the same or another scheme
- Return type
drop_columns()
- MiscTable.drop_columns(column_ids, *, inplace=False)
Drop columns by ID.
- Parameters
column_ids – column IDs
inplace – drop columns in place
- Returns
new object if inplace=False, otherwise self
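The inplace convention, which drop_index(), extend_index(), pick_columns() and pick_index() share with this method, can be illustrated with a toy class. TinyTable is a hypothetical stand-in that only mimics the documented return behaviour; it says nothing about how audformat copies tables internally.

```python
import copy


class TinyTable:
    """Hypothetical stand-in that stores columns in a dict."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def drop_columns(self, column_ids, *, inplace=False):
        # Work on self when inplace=True, otherwise on a deep copy.
        target = self if inplace else copy.deepcopy(self)
        for column_id in column_ids:
            target.columns.pop(column_id)
        return target


table = TinyTable({"match": [True, False], "strings": ["a", "b"]})
new_table = table.drop_columns(["strings"])  # table is unchanged
same_table = table.drop_columns(["strings"], inplace=True)  # table is modified
```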
drop_index()
- MiscTable.drop_index(index, *, inplace=False)
Drop rows from index.
- Parameters
index – index object
inplace – drop index in place
- Returns
new object if inplace=False, otherwise self
- Raises
ValueError – if level and dtypes of index do not match table index
dump()
extend_index()
- MiscTable.extend_index(index, *, fill_values=None, inplace=False)
Extend table with new rows.
- Parameters
index – index object
fill_values – replace NaN with these values (either a scalar applied to all columns or a dictionary with column name as key)
inplace – extend index in place
- Returns
new object if inplace=False, otherwise self
- Raises
ValueError – if level and dtypes of index do not match table index
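The two accepted forms of fill_values (a scalar or a dictionary keyed by column name) can be sketched as follows. resolve_fill_values() is a hypothetical helper, and treating columns that are absent from the dictionary as staying NaN is an assumption, not documented behaviour.

```python
import math


def resolve_fill_values(column_ids, fill_values=None):
    """Return one fill value per column.

    Hypothetical helper: a scalar applies to every column,
    a dict applies per column.
    """
    if isinstance(fill_values, dict):
        # Assumption: columns without an entry keep NaN.
        return {c: fill_values.get(c, math.nan) for c in column_ids}
    if fill_values is None:
        return {c: math.nan for c in column_ids}
    return {c: fill_values for c in column_ids}


scalar = resolve_fill_values(["match", "strings"], False)
per_column = resolve_fill_values(["match", "strings"], {"strings": ""})
```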
from_dict()
get()
- MiscTable.get(index=None, *, map=None, copy=True)
Get labels.
By default, all labels of the table are returned. Use index to get a subset.
Examples are provided with the table specifications, and for map in Map scheme labels.
- Parameters
copy (bool) – return a copy of the labels
map (Optional[Dict[str, Union[str, Sequence[str]]]]) – map scheme or scheme fields to column values. For example, if your table holds a column speaker with speaker IDs, which is assigned to a scheme that contains a dict mapping speaker IDs to age and gender entries, map={'speaker': ['age', 'gender']} will replace the column with two new columns that map ID values to age and gender, respectively. To also keep the original column with speaker IDs, you can use map={'speaker': ['speaker', 'age', 'gender']}
- Return type
- Returns
labels
- Raises
FileNotFoundError – if file is not found
RuntimeError – if table is not assigned to a database
ValueError – if trying to map without a scheme
ValueError – if trying to map from a scheme that has no labels
ValueError – if trying to map to a non-existing field
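What the map argument does to a column can be sketched with plain dictionaries. map_labels() is a hypothetical helper and the speaker entries are made-up sample data; the sketch only mirrors the speaker example from the parameter description and does not use audformat.

```python
def map_labels(column_id, values, scheme_labels, fields):
    """Expand one column of IDs into one column per requested field.

    Hypothetical helper: `scheme_labels` maps each ID to a dict of
    fields; requesting the column's own ID keeps the original values.
    """
    mapped = {}
    for field in fields:
        if field == column_id:
            mapped[field] = list(values)
            continue
        if any(field not in scheme_labels[v] for v in values):
            raise ValueError(f"field {field!r} does not exist")
        mapped[field] = [scheme_labels[v][field] for v in values]
    return mapped


# Made-up scheme labels for illustration.
speakers = {
    "spk1": {"age": 33, "gender": "female"},
    "spk2": {"age": 41, "gender": "male"},
}
values = ["spk1", "spk2", "spk1"]
replaced = map_labels("speaker", values, speakers, ["age", "gender"])
kept = map_labels("speaker", values, speakers, ["speaker", "age", "gender"])
```

In the first call the speaker column is replaced by age and gender; in the second call the original IDs are kept alongside them.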
load()
- MiscTable.load(path)
Load table data from disk.
Tables are stored on disk as CSV, PARQUET and/or PKL files. If a PKL file exists, it is loaded as long as its modification date is the newest. Otherwise, an error is raised asking you to delete one of the files.
- Parameters
path (str) – file path without extension
- Raises
RuntimeError – if table file(s) are missing
RuntimeError – if CSV or PARQUET file is newer than PKL file
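The file selection rule can be sketched with modification times from the standard library. choose_file() is a hypothetical helper, and the fallback order when no PKL file exists (PARQUET before CSV) is an assumption; the sketch does not read any table data.

```python
import os
import tempfile


def choose_file(path):
    """Pick the file to load for `path` (given without extension).

    Hypothetical helper: prefer the PKL file, but only if no
    CSV or PARQUET file is newer.
    """
    existing = {
        ext: f"{path}.{ext}"
        for ext in ("pkl", "parquet", "csv")
        if os.path.exists(f"{path}.{ext}")
    }
    if not existing:
        raise RuntimeError(f"no table file found for {path!r}")
    if "pkl" not in existing:
        # Assumption: fall back to PARQUET, then CSV.
        return existing.get("parquet", existing.get("csv"))
    pkl_time = os.path.getmtime(existing["pkl"])
    for ext in ("parquet", "csv"):
        if ext in existing and os.path.getmtime(existing[ext]) > pkl_time:
            raise RuntimeError(f"{existing[ext]} is newer than the PKL file")
    return existing["pkl"]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "db.misc")
    # Create an older CSV file and a newer PKL file.
    for ext, mtime in (("csv", 1_000), ("pkl", 2_000)):
        with open(f"{path}.{ext}", "w") as fp:
            fp.write("")
        os.utime(f"{path}.{ext}", (mtime, mtime))
    chosen = os.path.basename(choose_file(path))
    # Make the CSV file newer than the PKL file.
    os.utime(f"{path}.csv", (3_000, 3_000))
    try:
        choose_file(path)
        stale_detected = False
    except RuntimeError:
        stale_detected = True
```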
pick_columns()
- MiscTable.pick_columns(column_ids, *, inplace=False)
Pick columns by ID.
All other columns will be dropped.
- Parameters
column_ids – column IDs
inplace – pick columns in place
- Returns
new object if inplace=False, otherwise self
pick_index()
- MiscTable.pick_index(index, *, inplace=False)
Pick rows from index.
- Parameters
index – index object
inplace – pick index in place
- Returns
new object if inplace=False, otherwise self
- Raises
ValueError – if level and dtypes of index do not match table index
save()
- MiscTable.save(path, *, storage_format='parquet', update_other_formats=True)
Save table data to disk.
Existing files will be overwritten.
When using "parquet" as storage_format, a hash based on the content of the table is stored under the key b"hash" in the metadata of the schema of the parquet file. This provides a deterministic hash for the file, as md5 sums of parquet files containing identical information often differ. Reasons include the library that wrote the parquet file, the chosen compression codec, and metadata written by the library.
The hash can be accessed with pyarrow by:
    pyarrow.parquet.read_schema(f"{path}.parquet").metadata[b"hash"].decode()
The hash is used by audb when publishing a database to track changes of database files.
- Parameters
path (str) – file path without extension
storage_format (str) – storage format of table. See audformat.define.TableStorageFormat for available formats
update_other_formats (bool) – if True, it will not only save to the given storage_format, but update all files stored in other storage formats as well
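Why a content-based hash is more stable than a checksum of the written file can be shown with the standard library alone. In this sketch gzip-compressed JSON stands in for a parquet file, and the compression level and timestamp stand in for writer-specific settings; it does not reproduce how audformat computes its hash.

```python
import gzip
import hashlib
import json

rows = [["f1", "f2", True], ["f1", "f3", False]]
payload = json.dumps(rows).encode()

# Two "files" with identical content but different writer settings.
file_a = gzip.compress(payload, compresslevel=1, mtime=0)
file_b = gzip.compress(payload, compresslevel=9, mtime=1)

# Checksums of the files differ, although the content is the same.
md5_a = hashlib.md5(file_a).hexdigest()
md5_b = hashlib.md5(file_b).hexdigest()

# A hash computed from the content itself ignores those settings.
content_a = hashlib.md5(gzip.decompress(file_a)).hexdigest()
content_b = hashlib.md5(gzip.decompress(file_b)).hexdigest()
```

Comparing content hashes therefore detects real changes to the table, while comparing file checksums can report a change when only the container differs.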
set()
- MiscTable.set(values, *, index=None)
Set labels.
By default, all labels of the table are replaced. Use index to select a subset. If a column is assigned to a Scheme, values will be automatically converted to match its dtype.
Examples are provided with the table specifications.
to_dict()
update()
- MiscTable.update(others, *, overwrite=False)
Update table with other table(s).
The table that calls update() to combine tables must be assigned to a database. For all tables, media and split must match.
Columns that are not yet part of the table will be added, and referenced schemes or raters are copied. For overlapping columns, schemes and raters must match.
Columns with the same identifier are combined to a single column. This requires that both columns have the same dtype and, if overwrite is set to False, that values in places where the indices overlap match or one column contains NaN. If overwrite is set to True, the value of the last table in the list is kept.
The index type of the table must not change.
- Parameters
others – table object(s)
overwrite – overwrite values where indices overlap
- Returns
the updated table
- Raises
RuntimeError – if table is not assigned to a database
ValueError – if split or media does not match
ValueError – if overlapping columns reference different schemes or raters
ValueError – if a missing scheme or rater cannot be copied because a different object with the same ID exists
ValueError – if values in the same position do not match and overwrite is False
ValueError – if level and dtypes of table indices do not match