Load a database
To load a database you only need its name. However, we recommend specifying its version as well. This is not required, as audb.load() automatically searches for the latest available version, but it ensures that your code returns the same data even if a new version of the database is published.
>>> db = audb.load("emodb", version="1.4.1", full_path=False, verbose=False)
audb.load() will download the data, store it in a cache folder, and return the database as an audformat.Database object. The most important content of that object is its tables.
>>> db.tables
emotion:
  type: filewise
  columns:
    emotion: {scheme_id: emotion, rater_id: gold}
    emotion.confidence: {scheme_id: confidence, rater_id: gold}
emotion.categories.test.gold_standard:
  type: filewise
  split_id: test
  columns:
    emotion: {scheme_id: emotion, rater_id: gold}
    emotion.confidence: {scheme_id: confidence, rater_id: gold}
emotion.categories.train.gold_standard:
  type: filewise
  split_id: train
  columns:
    emotion: {scheme_id: emotion, rater_id: gold}
    emotion.confidence: {scheme_id: confidence, rater_id: gold}
files:
  type: filewise
  columns:
    duration: {scheme_id: duration}
    speaker: {scheme_id: speaker}
    transcription: {scheme_id: transcription}
They contain the annotations of the database, and can be requested as a pandas.DataFrame.
>>> db["emotion"].get()
                   emotion  emotion.confidence
file
wav/03a01Fa.wav  happiness                0.90
wav/03a01Nc.wav    neutral                1.00
wav/03a01Wa.wav      anger                0.95
wav/03a02Fc.wav  happiness                0.85
wav/03a02Nc.wav    neutral                1.00
...                    ...                 ...
wav/16b10Lb.wav    boredom                1.00
wav/16b10Tb.wav    sadness                0.90
wav/16b10Td.wav    sadness                0.95
wav/16b10Wa.wav      anger                1.00
wav/16b10Wb.wav      anger                1.00

[535 rows x 2 columns]
Or you can directly request single columns as pandas.Series.
>>> db["files"]["duration"].get()
file
wav/03a01Fa.wav 0 days 00:00:01.898250
wav/03a01Nc.wav 0 days 00:00:01.611250
wav/03a01Wa.wav 0 days 00:00:01.877812500
wav/03a02Fc.wav 0 days 00:00:02.006250
wav/03a02Nc.wav 0 days 00:00:01.439812500
...
wav/16b10Lb.wav 0 days 00:00:03.442687500
wav/16b10Tb.wav 0 days 00:00:03.500625
wav/16b10Td.wav 0 days 00:00:03.934187500
wav/16b10Wa.wav 0 days 00:00:02.414125
wav/16b10Wb.wav 0 days 00:00:02.522499999
Name: duration, Length: 535, dtype: timedelta64[ns]
As you can see, the index of the returned object holds the paths to the corresponding media files.
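Because the database above was loaded with full_path=False, the index holds relative paths. The following is a minimal sketch of how they could be turned into absolute paths. The helper name absolute_paths() is hypothetical, and the usage comment assumes that db.root points to the cache folder the database was loaded into.

```python
import os


def absolute_paths(root, files):
    """Join relative file paths from a table index with a root folder."""
    return [os.path.join(root, file) for file in files]


# Possible usage with a loaded database (assumption: db.root is the
# cache folder of the loaded flavor):
# paths = absolute_paths(db.root, db["files"].files)
```

Loading the database without full_path=False should make this step unnecessary, since the index then holds the expanded paths.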
For a full overview of how to handle the database object, we refer the reader to the corresponding audformat documentation. We also recommend familiarizing yourself with how to combine tables and how to map labels.
Here, we continue by discussing Media conversion and flavors, Metadata and header only, Loading on demand, and Streaming.
Media conversion and flavors
When loading a database, audio files can be converted automatically. This creates a new flavor of the database, represented by audb.Flavor. The following properties can be changed.
bit_depth:
  - 8
  - 16
  - 24
  - 32 (WAV only)
format:
  - 'wav'
  - 'flac'
channels:
  - 0        # select first channel
  - [0, -1]  # select first and last channel
  - ...
mixdown:
  - False
  - True
sampling_rate:
  - 8000
  - 16000
  - 22050
  - 24000
  - 44100
  - 48000
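To catch typos in flavor arguments before a download starts, the values listed above could be checked up front. The following is only a sketch: ALLOWED and check_flavor() are hypothetical names that are not part of audb, and the table of values is copied from the list above rather than taken from the library.

```python
# Values copied from the list above. ALLOWED and check_flavor() are
# hypothetical names for this sketch and not part of audb.
ALLOWED = {
    "bit_depth": {8, 16, 24, 32},
    "format": {"wav", "flac"},
    "mixdown": {False, True},
    "sampling_rate": {8000, 16000, 22050, 24000, 44100, 48000},
}


def check_flavor(**flavor):
    """Return the flavor arguments unchanged, or raise ValueError."""
    for key, value in flavor.items():
        if key == "channels":
            # Channel selections are free-form and not checked here
            continue
        if key not in ALLOWED:
            raise ValueError(f"Unknown flavor property: {key!r}")
        if value is None:
            # Assumption: None leaves the original property untouched
            continue
        if value not in ALLOWED[key]:
            raise ValueError(f"Unsupported value for {key}: {value!r}")
    if flavor.get("bit_depth") == 32 and flavor.get("format") == "flac":
        raise ValueError("A bit depth of 32 is listed as WAV only")
    return flavor
```

The checked arguments could then be passed on as keyword arguments, for example audb.load("emodb", version="1.4.1", **check_flavor(format="flac", sampling_rate=44100)).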
The next example will convert the original files to FLAC with a sampling rate of 44100 Hz. For each flavor a sub-folder will be created inside the cache.
db = audb.load(
    "emodb",
    version="1.4.1",
    format="flac",
    sampling_rate=44100,
    verbose=False,
)
The flavor information of a database is stored inside the db.meta["audb"] dictionary.
>>> db.meta["audb"]["flavor"]
{'bit_depth': None,
 'channels': None,
 'format': 'flac',
 'mixdown': False,
 'sampling_rate': 44100}
You can list all available flavors and their locations in the cache with:
>>> df = audb.cached()
>>> df.reset_index()[["name", "version", "complete", "format", "sampling_rate"]]
    name  version  complete  format  sampling_rate
0  emodb    1.4.1     False    flac          44100
1  emodb    1.4.1     False    None           None
The entry "complete" tells you whether a database flavor is completely cached, or whether some tables or media files are still missing.
Metadata and header only
It is possible to request only the metadata (header and annotations) of a database. In that case, the media files are not loaded, only the header and all tables.
>>> db = audb.load("emodb", version="1.4.1", only_metadata=True, verbose=False)
For databases with many annotations, this can still take some time. If you are only interested in header information, you can use audb.info.header(). Or, if you are only interested in parts of the header, have a look at the audb.info module. It can list all table definitions.
>>> audb.info.tables("emodb", version="1.4.1")
emotion:
  type: filewise
  columns:
    emotion: {scheme_id: emotion, rater_id: gold}
    emotion.confidence: {scheme_id: confidence, rater_id: gold}
emotion.categories.test.gold_standard:
  type: filewise
  split_id: test
  columns:
    emotion: {scheme_id: emotion, rater_id: gold}
    emotion.confidence: {scheme_id: confidence, rater_id: gold}
emotion.categories.train.gold_standard:
  type: filewise
  split_id: train
  columns:
    emotion: {scheme_id: emotion, rater_id: gold}
    emotion.confidence: {scheme_id: confidence, rater_id: gold}
files:
  type: filewise
  columns:
    duration: {scheme_id: duration}
    speaker: {scheme_id: speaker}
    transcription: {scheme_id: transcription}
Or get the total duration of all media files.
>>> audb.info.duration("emodb", version="1.4.1")
Timedelta('0 days 00:24:47.092187500')
See audb.info for a list of all available options.
Loading on demand
It is possible to request only specific tables or media of a database.
For instance, many databases are organized into train, dev, and test splits. Hence, to evaluate the performance of a machine learning model, we don’t have to download the full database, but only the table(s) and media of the test set.
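For the split case, a minimal sketch could first read the table definitions from the header, pick the tables assigned to the wanted split, and then load only those. The helper names tables_for_split() and load_split() are hypothetical. The sketch assumes that the objects returned by audb.info.tables() expose the split_id shown in the table listings above, and that restricting tables also restricts the media that is downloaded, as described in the previous paragraph.

```python
def tables_for_split(tables, split_id):
    """Return the IDs of all tables assigned to the given split.

    ``tables`` maps table IDs to objects with a ``split_id`` attribute,
    as assumed for the result of ``audb.info.tables()``.
    """
    return [
        table_id
        for table_id, table in tables.items()
        if table.split_id == split_id
    ]


def load_split(name, version, split_id):
    """Load only the tables of one split and the media they reference."""
    import audb  # imported here so tables_for_split() works without audb

    header_tables = audb.info.tables(name, version=version)
    return audb.load(
        name,
        version=version,
        tables=tables_for_split(header_tables, split_id),
        verbose=False,
    )
```

For emodb, load_split("emodb", "1.4.1", "test") would request only the emotion.categories.test.gold_standard table, since it is the only table listed with split_id: test.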
Or, if we want the data of a specific speaker, we can do the following. First, we download the table with information about the speakers (here db["files"]):
db = audb.load(
    "emodb",
    version="1.4.1",
    tables=["files"],
    only_metadata=True,
    full_path=False,
    verbose=False,
)
>>> db.tables
files:
  type: filewise
  columns:
    duration: {scheme_id: duration}
    speaker: {scheme_id: speaker}
    transcription: {scheme_id: transcription}
Note that we set only_metadata=True since we only need the labels at the moment. By setting full_path=False we further ensure that the paths in the table index are relative and therefore match the paths on the backend.
>>> speaker = db["files"]["speaker"].get()
>>> speaker
file
wav/03a01Fa.wav 3
wav/03a01Nc.wav 3
wav/03a01Wa.wav 3
wav/03a02Fc.wav 3
wav/03a02Nc.wav 3
..
wav/16b10Lb.wav 16
wav/16b10Tb.wav 16
wav/16b10Td.wav 16
wav/16b10Wa.wav 16
wav/16b10Wb.wav 16
Name: speaker, Length: 535, dtype: category
Categories (10, int64): [3, 8, 9, 10, ..., 13, 14, 15, 16]
Now, we use the column with speaker IDs to get a list of media files that belong to speaker 3.
>>> media = db["files"].files[speaker == 3]
>>> media
Index(['wav/03a01Fa.wav', 'wav/03a01Nc.wav', 'wav/03a01Wa.wav',
'wav/03a02Fc.wav', 'wav/03a02Nc.wav', 'wav/03a02Ta.wav',
'wav/03a02Wb.wav', 'wav/03a02Wc.wav', 'wav/03a04Ad.wav',
'wav/03a04Fd.wav', 'wav/03a04Lc.wav', 'wav/03a04Nc.wav',
'wav/03a04Ta.wav', 'wav/03a04Wc.wav', 'wav/03a05Aa.wav',
'wav/03a05Fc.wav', 'wav/03a05Nd.wav', 'wav/03a05Tc.wav',
'wav/03a05Wa.wav', 'wav/03a05Wb.wav', 'wav/03a07Fa.wav',
'wav/03a07Fb.wav', 'wav/03a07La.wav', 'wav/03a07Nc.wav',
'wav/03a07Wc.wav', 'wav/03b01Fa.wav', 'wav/03b01Lb.wav',
'wav/03b01Nb.wav', 'wav/03b01Td.wav', 'wav/03b01Wa.wav',
'wav/03b01Wc.wav', 'wav/03b02Aa.wav', 'wav/03b02La.wav',
'wav/03b02Na.wav', 'wav/03b02Tb.wav', 'wav/03b02Wb.wav',
'wav/03b03Nb.wav', 'wav/03b03Tc.wav', 'wav/03b03Wc.wav',
'wav/03b09La.wav', 'wav/03b09Nc.wav', 'wav/03b09Tc.wav',
'wav/03b09Wa.wav', 'wav/03b10Ab.wav', 'wav/03b10Ec.wav',
'wav/03b10Na.wav', 'wav/03b10Nc.wav', 'wav/03b10Wb.wav',
'wav/03b10Wc.wav'],
dtype='string', name='file')
Finally, we load the database again and use the list to request only the data of this speaker.
db = audb.load(
    "emodb",
    version="1.4.1",
    media=media,
    full_path=False,
    verbose=False,
)
This will also remove entries of other speakers from the tables.
>>> db["emotion"].get().head()
                   emotion  emotion.confidence
file
wav/03a01Fa.wav  happiness                0.90
wav/03a01Nc.wav    neutral                1.00
wav/03a01Wa.wav      anger                0.95
wav/03a02Fc.wav  happiness                0.85
wav/03a02Nc.wav    neutral                1.00
Streaming
audb.stream() provides a pseudo-streaming mode, which helps to load large datasets. In each iteration, it loads only batch_size rows from a selected table into memory and downloads only the matching media files. The table and media files are still stored in the cache.
db = audb.stream(
    "emodb",
    "emotion",
    version="1.4.1",
    batch_size=4,
    full_path=False,
    verbose=False,
)
It returns an audb.DatabaseIterator object, which behaves like an audformat.Database, but provides the ability to iterate over the database:
>>> next(db)
                   emotion  emotion.confidence
file
wav/03a01Fa.wav  happiness                0.90
wav/03a01Nc.wav    neutral                1.00
wav/03a01Wa.wav      anger                0.95
wav/03a02Fc.wav  happiness                0.85
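Since the iterator supports next(), processing all batches one after another could look like the following sketch. It assumes that the iterator can also be used in a for loop and that each batch can be indexed by column name, as the pandas.DataFrame shown above can. The helper name count_labels() is hypothetical.

```python
from collections import Counter


def count_labels(batches, column):
    """Count how often each label occurs in ``column`` across all batches."""
    counts = Counter()
    for batch in batches:
        # Only one batch is held in memory at a time
        counts.update(batch[column])
    return counts


# Possible usage with the iterator returned by audb.stream():
# counts = count_labels(db, "emotion")
```

Only the running counts are kept between iterations, so memory use stays bounded by the batch size rather than by the size of the table.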
With shuffle=True, a user can request that the data is returned in random order. audb.stream() will then load buffer_size rows into a buffer and select randomly from those.
import numpy as np

np.random.seed(1)
db = audb.stream(
    "emodb",
    "emotion",
    version="1.4.1",
    batch_size=4,
    shuffle=True,
    buffer_size=100_000,
    only_metadata=True,
    full_path=False,
    verbose=False,
)
>>> next(db)
                   emotion  emotion.confidence
file
wav/14a05Fb.wav  happiness                 1.0
wav/15a05Eb.wav    disgust                 1.0
wav/12a05Nd.wav    neutral                 0.9
wav/13a07Na.wav    neutral                 0.9
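The interplay of batch_size and buffer_size can be pictured with a small pure-Python sketch. It illustrates the general idea of shuffling within a bounded buffer and is not the implementation used by audb. The function name shuffled_batches() and its seed argument are hypothetical.

```python
import random
from itertools import islice


def shuffled_batches(rows, batch_size, buffer_size, seed=None):
    """Yield batches of rows, shuffled within chunks of buffer_size rows."""
    rng = random.Random(seed)
    rows = iter(rows)
    while True:
        # Read at most buffer_size rows into memory
        buffer = list(islice(rows, buffer_size))
        if not buffer:
            return
        # Randomize the order inside the buffer only
        rng.shuffle(buffer)
        # Hand out the buffer in pieces of batch_size rows;
        # the last piece of a buffer may be shorter
        for start in range(0, len(buffer), batch_size):
            yield buffer[start:start + batch_size]
```

In this sketch, rows can only be mixed with other rows from the same buffer. A buffer that holds the whole table, such as buffer_size=100_000 for the 535 rows of emodb, therefore amounts to a full shuffle, whereas a small buffer keeps batches close to the original order.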