Database dependencies

Media and table files of databases are stored in archive files. A database can also reuse an archive file from a previous version of a database if its content hasn’t changed.

We keep track of those dependencies in a dependency table, which also stores additional metadata about the audio files, such as duration and number of channels. The table is stored in a file db.parquet for every version of a database.

You can request an audb.Dependencies object with audb.dependencies().

import audb

deps = audb.dependencies("emodb", version="1.4.1")

You can see all entries by calling the returned object.

df = deps()
df.head()
                                              archive  bit_depth  channels  ...  sampling_rate  type  version
db.emotion.csv                                emotion          0         0  ...              0     0    1.1.0
db.files.csv                                    files          0         0  ...              0     0    1.1.0
wav/03a01Fa.wav  c1f5cc6f-6d00-348a-ba3b-4adaa2436aad         16         1  ...          16000     1    1.1.0
wav/03a01Nc.wav  d40b53dd-8f0d-d5d3-42e7-8ad7ea6a05a6         16         1  ...          16000     1    1.1.0
wav/03a01Wa.wav  9b62bc9b-a68e-7e38-6ed1-f4a16ac18511         16         1  ...          16000     1    1.1.0

5 rows × 10 columns

You can also use it to request certain aspects, e.g.

deps.duration("wav/03a01Fa.wav")
1.89825

See audb.Dependencies for all available methods.
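The rows printed above were requested for version 1.4.1 but list 1.1.0 in their version column, which matches the archive reuse described at the top of this page. As a minimal sketch, and not part of the audb API, the following plain-Python snippet lists the files whose entry points to a version other than the requested one. The dictionary `versions` and the helper `reused_files()` are hypothetical names introduced for illustration, and reading the version column as "version the archive was published with" is an assumption.

```python
# Hypothetical stand-in for the "version" column of the dependency table,
# filled with the five rows printed above.
versions = {
    "db.emotion.csv": "1.1.0",
    "db.files.csv": "1.1.0",
    "wav/03a01Fa.wav": "1.1.0",
    "wav/03a01Nc.wav": "1.1.0",
    "wav/03a01Wa.wav": "1.1.0",
}


def reused_files(versions, requested_version):
    """Return files whose entry points to another version than the requested one."""
    return sorted(
        file
        for file, version in versions.items()
        if version != requested_version
    )


# All five entries were published before the requested version 1.4.1
reused = reused_files(versions, "1.4.1")
print(len(reused), "of", len(versions), "entries come from an earlier version")
```

With a real dependency table you would read the values from the version column of the data frame returned by deps() instead of typing them by hand.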

Duration of a database

If your database contains only WAV or FLAC files, we store the duration in seconds of every file in the database dependency table.

deps = audb.dependencies("emodb", version="1.4.1")
df = deps()
df.duration[:10]
db.emotion.csv          0.0
db.files.csv            0.0
wav/03a01Fa.wav     1.89825
wav/03a01Nc.wav     1.61125
wav/03a01Wa.wav    1.877813
wav/03a02Fc.wav     2.00625
wav/03a02Nc.wav    1.439812
wav/03a02Ta.wav    1.735688
wav/03a02Wb.wav    2.123625
wav/03a02Wc.wav    1.498063
Name: duration, dtype: double[pyarrow]

For those databases you can get their overall duration with:

audb.info.duration("emodb", version="1.4.1")
Timedelta('0 days 00:24:47.092187500')

The duration of parts of a database can be calculated by first loading the dependency table and then filtering for the selected media files. The following example calculates the duration of the first ten files in the emotion table of the emodb database.

import numpy as np
import pandas as pd

df = audb.load_table("emodb", "emotion", version="1.4.1", verbose=False)
files = df.index[:10]
duration_in_sec = np.sum([deps.duration(f) for f in files])
pd.to_timedelta(duration_in_sec, unit="s")
Timedelta('0 days 00:00:17.392437500')
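The same idea can be sketched without downloading anything: look up the stored duration of each selected file and add the values up. The snippet below is only an illustration under stated assumptions. The dictionary `durations` is a hypothetical stand-in for the duration column, filled with the ten values printed earlier on this page, and `total_duration()` is a hypothetical helper, not an audb function. Note that it sums the eight WAV files of that printout, which is a different selection than the first ten files of the emotion table used above.

```python
from datetime import timedelta

# Durations in seconds, copied from the dependency table output shown earlier.
# Table files (CSV) carry a duration of 0.0.
durations = {
    "db.emotion.csv": 0.0,
    "db.files.csv": 0.0,
    "wav/03a01Fa.wav": 1.89825,
    "wav/03a01Nc.wav": 1.61125,
    "wav/03a01Wa.wav": 1.877813,
    "wav/03a02Fc.wav": 2.00625,
    "wav/03a02Nc.wav": 1.439812,
    "wav/03a02Ta.wav": 1.735688,
    "wav/03a02Wb.wav": 2.123625,
    "wav/03a02Wc.wav": 1.498063,
}


def total_duration(durations, files):
    """Sum the stored durations of the selected files."""
    seconds = sum(durations[file] for file in files)
    return timedelta(seconds=seconds)


# Select only the media files and add up their durations
wav_files = [file for file in durations if file.endswith(".wav")]
total = total_duration(durations, wav_files)
print(total)
```

Because table files are stored with a duration of 0.0, including them in the selection would not change the result.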

If your table is a segmented table and you would like to get the duration of its segments that contain a label, you can use audformat.utils.duration(), which calculates the duration from the start and end entries.

import audformat

df = audb.load_table("database-with-segmented-tables", "segmented-table")
audformat.utils.duration(df.dropna())

Alternatively, you can calculate the duration of all segments within your database.

db = audb.load("database-with-segmented-tables", only_metadata=True)
audformat.utils.duration(db.segments)
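Calculating the duration from start and end entries amounts to summing end minus start over all segments. The following plain-Python sketch illustrates that arithmetic only. The segment list, the file names, and the helper `segments_duration()` are hypothetical, and the sketch assumes every segment has a defined end, so it does not reproduce how audformat.utils.duration() treats segments with a missing end.

```python
from datetime import timedelta

# Hypothetical segments as (file, start, end) tuples,
# mirroring the three levels of a segmented index.
segments = [
    ("wav/a.wav", timedelta(seconds=0.0), timedelta(seconds=1.5)),
    ("wav/a.wav", timedelta(seconds=2.0), timedelta(seconds=3.25)),
    ("wav/b.wav", timedelta(seconds=0.5), timedelta(seconds=1.0)),
]


def segments_duration(segments):
    """Sum (end - start) over all segments."""
    return sum(
        (end - start for _, start, end in segments),
        timedelta(),  # start value, so the sum works on timedelta objects
    )


total = segments_duration(segments)
print(total.total_seconds())
```

The three segments last 1.5 s, 1.25 s, and 0.5 s, so the total is 3.25 s.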

If your database contains files for which no duration information is stored in the dependency table of the database, like MP4 files, you have to download the database first and use audformat.utils.duration() to calculate the duration on the fly.

db = audb.load("database-with-videos")
audformat.utils.duration(db.files, num_workers=4)