Database dependencies

Media and table files of databases are stored in archive files. A database can also reuse an archive file from a previous version of a database if its content hasn’t changed.

We keep track of those dependencies and store some additional metadata about the audio files like duration and number of channels in a dependency table in a file db.parquet for every version of a database.

You request a audb.Dependencies object with audb.dependencies().

>>> deps = audb.dependencies("emodb", version="1.4.1")

You can see all entries by calling the returned object.

>>> df = deps()
>>> df.head()
                                              archive  bit_depth  ...  type version
db.emotion.csv                                emotion          0  ...     0   1.1.0
db.files.csv                                    files          0  ...     0   1.1.0
wav/03a01Fa.wav  c1f5cc6f-6d00-348a-ba3b-4adaa2436aad         16  ...     1   1.1.0
wav/03a01Nc.wav  d40b53dd-8f0d-d5d3-42e7-8ad7ea6a05a6         16  ...     1   1.1.0
wav/03a01Wa.wav  9b62bc9b-a68e-7e38-6ed1-f4a16ac18511         16  ...     1   1.1.0

[5 rows x 10 columns]

You can also use it to request certain aspects, e.g.

>>> deps.duration("wav/03a01Fa.wav")
1.89825

See audb.Dependencies for all available methods.

Duration of a database

If your database contains only WAV or FLAC files, we store the duration in seconds of every file in the database dependency table.

>>> deps = audb.dependencies("emodb", version="1.4.1")
>>> df = deps()
>>> df.duration[:10]
db.emotion.csv          0.0
db.files.csv            0.0
wav/03a01Fa.wav     1.89825
wav/03a01Nc.wav     1.61125
wav/03a01Wa.wav    1.877813
wav/03a02Fc.wav     2.00625
wav/03a02Nc.wav    1.439812
wav/03a02Ta.wav    1.735688
wav/03a02Wb.wav    2.123625
wav/03a02Wc.wav    1.498063
Name: duration, dtype: double[pyarrow]

For those databases you can get their overall duration with:

>>> audb.info.duration("emodb", version="1.4.1")
Timedelta('0 days 00:24:47.092187500')

The duration of parts of a database can be calculated by first loading the dependency table and filter for the selected media files. The following calculates the duration of the first ten files in the emotion table of the emodb database.

>>> import numpy as np
>>> import pandas as pd
>>> df = audb.load_table("emodb", "emotion", version="1.4.1", verbose=False)
>>> files = df.index[:10]
>>> duration_in_sec = np.sum([deps.duration(f) for f in files])
>>> pd.to_timedelta(duration_in_sec, unit="s")
Timedelta('0 days 00:00:17.392437500')

If your table is a segmented table, and you would like to get the duration of its segments that contain a label you can use audformat.utils.duration(), which calculates the duration from the start and end entries.

>>> import audformat
>>> df = audb.load_table("vadtoolkit", "segments", version="1.1.0", verbose=False)
>>> audformat.utils.duration(df.dropna())
Timedelta('0 days 00:37:17.037467')

Or you can count the duration of all segments within your database.

>>> db = audb.load("vadtoolkit", version="1.1.0", only_metadata=True, verbose=False)
>>> audformat.utils.duration(db.segments)
Timedelta('0 days 00:37:17.037467')

If your database contains files for which no duration information is stored in the dependency table of the database, like MP4 files, you have to download the database first and use audformat.utils.duration() to calculate the duration on the fly.

>>> db = audb.load("database-with-videos")
>>> audformat.utils.duration(db.files, num_workers=4)