Database dependencies
Media and table files of databases are stored in archive files. A database can also reuse an archive file from a previous version of a database if its content hasn’t changed.
We keep track of those dependencies and store some additional metadata about the audio files, like duration and number of channels, in a dependency table in a file db.parquet for every version of a database.
You request an audb.Dependencies object with audb.dependencies().
deps = audb.dependencies("emodb", version="1.4.1")
You can see all entries by calling the returned object.
df = deps()
df.head()
| | archive | bit_depth | channels | ... | sampling_rate | type | version |
---|---|---|---|---|---|---|---|
db.emotion.csv | emotion | 0 | 0 | ... | 0 | 0 | 1.1.0 |
db.files.csv | files | 0 | 0 | ... | 0 | 0 | 1.1.0 |
wav/03a01Fa.wav | c1f5cc6f-6d00-348a-ba3b-4adaa2436aad | 16 | 1 | ... | 16000 | 1 | 1.1.0 |
wav/03a01Nc.wav | d40b53dd-8f0d-d5d3-42e7-8ad7ea6a05a6 | 16 | 1 | ... | 16000 | 1 | 1.1.0 |
wav/03a01Wa.wav | 9b62bc9b-a68e-7e38-6ed1-f4a16ac18511 | 16 | 1 | ... | 16000 | 1 | 1.1.0 |
5 rows × 10 columns
You can also use it to request a single property of a file, e.g. its duration in seconds.
deps.duration("wav/03a01Fa.wav")
1.89825
See audb.Dependencies for all available methods.
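Since calling the object returns a pandas DataFrame, ordinary pandas filtering applies to it. The following is a minimal sketch on a hand-built stand-in table, not on the real dependency table: the rows and the subset of columns are copied from the excerpt above, and it assumes that type 0 marks table files and type 1 marks media files, as those rows suggest.

```python
import pandas as pd

# Stand-in for the DataFrame returned by deps().
# Rows are copied from the excerpt above; only some of the 10 columns are included.
df = pd.DataFrame(
    {
        "archive": ["emotion", "files", "c1f5cc6f-6d00-348a-ba3b-4adaa2436aad"],
        "bit_depth": [0, 0, 16],
        "channels": [0, 0, 1],
        "sampling_rate": [0, 0, 16000],
        "type": [0, 0, 1],
        "version": ["1.1.0", "1.1.0", "1.1.0"],
    },
    index=["db.emotion.csv", "db.files.csv", "wav/03a01Fa.wav"],
)

# Assumption: type == 1 selects media files, type == 0 selects table files.
media = df[df["type"] == 1]

# Entries whose version differs from the requested one (1.4.1 above)
# point to archives reused from an earlier version of the database.
reused = df[df["version"] != "1.4.1"]

print(list(media.index))
print(len(reused))
```

With the real table you would replace the stand-in with `df = deps()` and keep the filters unchanged.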
Duration of a database
If your database contains only WAV or FLAC files, we store the duration in seconds of every file in the database dependency table.
deps = audb.dependencies("emodb", version="1.4.1")
df = deps()
df.duration[:10]
db.emotion.csv 0.0
db.files.csv 0.0
wav/03a01Fa.wav 1.89825
wav/03a01Nc.wav 1.61125
wav/03a01Wa.wav 1.877813
wav/03a02Fc.wav 2.00625
wav/03a02Nc.wav 1.439812
wav/03a02Ta.wav 1.735688
wav/03a02Wb.wav 2.123625
wav/03a02Wc.wav 1.498063
Name: duration, dtype: double[pyarrow]
For those databases you can get their overall duration with:
audb.info.duration("emodb", version="1.4.1")
Timedelta('0 days 00:24:47.092187500')
The duration of parts of a database can be calculated by first loading the dependency table and then filtering for the selected media files. The following calculates the duration of the first ten files in the emotion table of the emodb database.
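import numpy as np
import pandas as pd
# deps is the audb.Dependencies object loaded above
df = audb.load_table("emodb", "emotion", version="1.4.1", verbose=False)
files = df.index[:10]  # first ten files of the emotion table
duration_in_sec = np.sum([deps.duration(f) for f in files])
pd.to_timedelta(duration_in_sec, unit="s")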
Timedelta('0 days 00:00:17.392437500')
If your table is a segmented table, and you would like to get the duration of its segments that contain a label, you can use audformat.utils.duration(), which calculates the duration from the start and end entries.
import audformat
df = audb.load_table("database-with-segmented-tables", "segmented-table")
# dropna() keeps only the segments that carry a label
audformat.utils.duration(df.dropna())
Or you can calculate the total duration of all segments within your database.
db = audb.load("database-with-segmented-tables", only_metadata=True)
audformat.utils.duration(db.segments)
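To make explicit what calculating the duration from the start and end entries means, here is a pandas-only sketch. It is not audformat's implementation: it assumes a segmented index with the levels file, start, and end, where start and end are timedeltas and every end value is set. The file names and times are made up for illustration.

```python
import pandas as pd

# Hypothetical segmented index with three segments in two files.
index = pd.MultiIndex.from_arrays(
    [
        ["a.wav", "a.wav", "b.wav"],
        pd.to_timedelta([0.0, 1.5, 0.0], unit="s"),
        pd.to_timedelta([1.0, 2.0, 0.5], unit="s"),
    ],
    names=["file", "start", "end"],
)

starts = index.get_level_values("start")
ends = index.get_level_values("end")

# The duration of each segment is end - start; the total is their sum.
per_segment = pd.Series(ends - starts)
total = per_segment.sum()
print(total)
```

In practice audformat.utils.duration() is the better choice, since this sketch does not handle segments without an end value.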
If your database contains files for which no duration information is stored in the dependency table of the database, like MP4 files, you have to download the database first and use audformat.utils.duration() to calculate the duration on the fly.
db = audb.load("database-with-videos")
audformat.utils.duration(db.files, num_workers=4)