Emodb example¶
In this example we download the small emodb database, which contains sentences spoken with different emotions by different actors. The audio is stored as WAV files.
Get source database¶
First we download the source emodb database to the folder emodb-src.
import os
import urllib.request
import audeer
# Get database source
source = "http://emodb.bilderbar.info/download/download.zip"
src_dir = "emodb-src"
if not os.path.exists(src_dir):
    urllib.request.urlretrieve(source, "emodb.zip")
    audeer.extract_archive("emodb.zip", src_dir)
os.listdir(src_dir)
['erkennung.txt', 'wav', 'silb', 'erklaerung.txt', 'labsilb', 'lablaut']
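To get a feeling for the size of the database, we can count the extracted WAV files. This is a small sanity check of our own and not part of the original example; we do not assert an exact number here.
# Count the extracted WAV files (sanity check, our addition)
wav_files = [f for f in os.listdir(os.path.join(src_dir, "wav")) if f.endswith(".wav")]
len(wav_files)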
Gather metadata and annotations¶
Afterwards we collect all metadata and annotations that we would like to store in the audformat version of the database.
First, have a look at the file names.
os.listdir(os.path.join(src_dir, "wav"))[:3]
['14b10Nb.wav', '12b09Wc.wav', '15b03Tc.wav']
As described in the emodb documentation, the file names are encoded as follows.
| Position | Encoding |
| --- | --- |
| 0..1 | speaker |
| 2..4 | spoken sentence |
| 5 | emotion |
| 6 | repetition |
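As a quick illustration of this encoding (a minimal sketch of our own, using one of the file names listed above), the name 14b10Nb.wav decodes as follows.
# Decode one example file name according to the position table above
name = "14b10Nb"        # file name without extension
speaker_id = name[0:2]  # '14'
sentence = name[2:5]    # 'b10'
emotion_code = name[5]  # 'N'
repetition = name[6]    # 'b'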
For the speakers, further information is provided.
| Speaker ID | Gender | Age |
| --- | --- | --- |
| 03 | male | 31 |
| 08 | female | 34 |
| 09 | female | 21 |
| 10 | male | 32 |
| 11 | male | 26 |
| 12 | male | 30 |
| 13 | female | 32 |
| 14 | female | 35 |
| 15 | male | 25 |
| 16 | female | 31 |
For the sentences we have transcriptions.
| Code | Transcription |
| --- | --- |
| a01 | Der Lappen liegt auf dem Eisschrank. |
| a02 | Das will sie am Mittwoch abgeben. |
| a04 | Heute Abend könnte ich es ihm sagen. |
| a05 | Das schwarze Stück Papier befindet sich da oben neben dem Holzstück. |
| a07 | In sieben Stunden wird es soweit sein. |
| b01 | Was sind denn das für Tüten, die da unter dem Tisch stehen? |
| b02 | Sie haben es gerade hoch getragen und jetzt gehen sie wieder runter. |
| b03 | An den Wochenenden bin ich jetzt immer nach Hause gefahren und habe Agnes besucht. |
| b09 | Ich will das eben wegbringen und dann mit Karl was trinken gehen. |
| b10 | Die wird auf dem Platz sein, wo wir sie immer hinlegen. |
The emotion codes correspond to the following emotions.
| Code | Emotion |
| --- | --- |
| W | anger |
| L | boredom |
| E | disgust |
| A | fear |
| F | happiness |
| T | sadness |
| N | neutral |
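Combining the position table and these codes, we could for instance count how often each emotion code occurs in the file names. This is a quick sketch of our own and assumes that all files follow the naming scheme.
# Count emotion codes extracted from position 5 of each file name
from collections import Counter
Counter(f[5] for f in os.listdir(os.path.join(src_dir, "wav")))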
As stated in the emodb paper, the acted emotions were further evaluated by 20 participants who had to assign emotion labels to the audio presentations. Their rating agreement is stored in the erkannt column of the file erkennung.txt. We will read in this file and use the annotations to add a confidence column to the emotion table.
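Before parsing it, we can peek at the first line of the raw file to see its layout. This is a small sketch of our own; the exact content of the line is not shown here.
# Show the first line of erkennung.txt (our addition)
with open(os.path.join(src_dir, "erkennung.txt"), encoding="Latin-1") as fh:
    print(fh.readline().strip())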
import audformat
import pandas as pd
# Prepare functions for getting information from file names
def parse_names(names, from_i, to_i, is_number=False, mapping=None):
    for name in names:
        key = name[from_i:to_i]
        if is_number:
            key = int(key)
        yield mapping[key] if mapping else key
description = (
    "Berlin Database of Emotional Speech. "
    "A German database of emotional utterances "
    "spoken by actors "
    "recorded as a part of the DFG funded research project "
    "SE462/3-1 in 1997 and 1999. "
    "Recordings took place in the anechoic chamber "
    "of the Technical University Berlin, "
    "department of Technical Acoustics. "
    "It contains about 500 utterances "
    "from ten different actors "
    "expressing basic six emotions and neutral."
)
files = sorted(
    [os.path.join("wav", f) for f in os.listdir(os.path.join(src_dir, "wav"))]
)
names = [audeer.basename_wo_ext(f) for f in files]
emotion_mapping = {
    "W": "anger",
    "L": "boredom",
    "E": "disgust",
    "A": "fear",
    "F": "happiness",
    "T": "sadness",
    "N": "neutral",
}
emotions = list(parse_names(names, from_i=5, to_i=6, mapping=emotion_mapping))
y = pd.read_csv(
    os.path.join(src_dir, "erkennung.txt"),
    usecols=["Satz", "erkannt"],
    index_col="Satz",
    sep=r"\s+",
    encoding="Latin-1",
    decimal=",",
    converters={"Satz": lambda x: os.path.join("wav", x)},
).squeeze("columns")
y = y.loc[files]
# Normalize non-breaking spaces and German decimal commas
# before converting the agreement values to float
y = y.replace(to_replace="\xa0", value="", regex=True)
y = y.replace(to_replace=",", value=".", regex=True)
confidences = y.astype("float").values
male = audformat.define.Gender.MALE
female = audformat.define.Gender.FEMALE
de = audformat.utils.map_language("de")
df_speaker = pd.DataFrame(
    index=pd.Index([3, 8, 9, 10, 11, 12, 13, 14, 15, 16], name="speaker"),
    columns=["age", "gender", "language"],
    data=[
        [31, male, de],
        [34, female, de],
        [21, female, de],
        [32, male, de],
        [26, male, de],
        [30, male, de],
        [32, female, de],
        [35, female, de],
        [25, male, de],
        [31, female, de],
    ],
)
speakers = list(parse_names(names, from_i=0, to_i=2, is_number=True))
transcription_mapping = {
    "a01": "Der Lappen liegt auf dem Eisschrank.",
    "a02": "Das will sie am Mittwoch abgeben.",
    "a04": "Heute abend könnte ich es ihm sagen.",
    "a05": "Das schwarze Stück Papier befindet sich da oben neben dem "
    "Holzstück.",
    "a07": "In sieben Stunden wird es soweit sein.",
    "b01": "Was sind denn das für Tüten, die da unter dem Tisch "
    "stehen.",
    "b02": "Sie haben es gerade hochgetragen und jetzt gehen sie "
    "wieder runter.",
    "b03": "An den Wochenenden bin ich jetzt immer nach Hause "
    "gefahren und habe Agnes besucht.",
    "b09": "Ich will das eben wegbringen und dann mit Karl was "
    "trinken gehen.",
    "b10": "Die wird auf dem Platz sein, wo wir sie immer hinlegen.",
}
transcriptions = list(parse_names(names, from_i=2, to_i=5))
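As a quick consistency check (our addition, not part of the original example), we make sure the parsed lists line up with the list of files.
# Every parsed list should have one entry per file
assert len(files) == len(emotions) == len(speakers) == len(transcriptions) == len(confidences)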
Create audformat database¶
Now we create the database object and assign the information to it.
db = audformat.Database(
    name="emodb",
    source=source,
    usage=audformat.define.Usage.UNRESTRICTED,
    languages=[de],
    description=description,
    meta={
        "pdf": (
            "http://citeseerx.ist.psu.edu/viewdoc/"
            "download?doi=10.1.1.130.8506&rep=rep1&type=pdf"
        ),
    },
)
# Media
db.media["microphone"] = audformat.Media(
    format="wav",
    sampling_rate=16000,
    channels=1,
)
# Raters
db.raters["gold"] = audformat.Rater()
# Schemes
db.schemes["emotion"] = audformat.Scheme(
labels=[str(x) for x in emotion_mapping.values()],
description="Six basic emotions and neutral.",
)
db.schemes["confidence"] = audformat.Scheme(
"float",
minimum=0,
maximum=1,
description="Confidence of emotion ratings.",
)
db.schemes["age"] = audformat.Scheme(
"int",
minimum=0,
description="Age of speaker",
)
db.schemes["gender"] = audformat.Scheme(
labels=["female", "male"],
description="Gender of speaker",
)
db.schemes["language"] = audformat.Scheme(
"str",
description="Language of speaker",
)
db.schemes["transcription"] = audformat.Scheme(
labels=transcription_mapping,
description="Sentence produced by actor.",
)
# MiscTable
db["speaker"] = audformat.MiscTable(df_speaker.index)
db["speaker"]["age"] = audformat.Column(scheme_id="age")
db["speaker"]["gender"] = audformat.Column(scheme_id="gender")
db["speaker"]["language"] = audformat.Column(scheme_id="language")
db["speaker"].set(df_speaker.to_dict(orient="list"))
# MiscTable as Scheme
db.schemes["speaker"] = audformat.Scheme(
labels="speaker",
dtype="int",
description=(
"The actors could produce each sentence as often as "
"they liked and were asked to remember a real "
"situation from their past when they had felt this "
"emotion."
),
)
# Tables
index = audformat.filewise_index(files)
db["files"] = audformat.Table(index)
db["files"]["speaker"] = audformat.Column(scheme_id="speaker")
db["files"]["speaker"].set(speakers)
db["files"]["transcription"] = audformat.Column(scheme_id="transcription")
db["files"]["transcription"].set(transcriptions)
db["emotion"] = audformat.Table(index)
db["emotion"]["emotion"] = audformat.Column(
scheme_id="emotion",
rater_id="gold",
)
db["emotion"]["emotion"].set(emotions)
db["emotion"]["emotion.confidence"] = audformat.Column(
scheme_id="confidence",
rater_id="gold",
)
db["emotion"]["emotion.confidence"].set(confidences / 100.0)
Inspect database header¶
Before storing the database, we can inspect its header.
db
name: emodb
description: Berlin Database of Emotional Speech. A German database of emotional utterances
spoken by actors recorded as a part of the DFG funded research project SE462/3-1
in 1997 and 1999. Recordings took place in the anechoic chamber of the Technical
University Berlin, department of Technical Acoustics. It contains about 500 utterances
from ten different actors expressing basic six emotions and neutral.
source: http://emodb.bilderbar.info/download/download.zip
usage: unrestricted
languages: [deu]
media:
microphone: {type: other, format: wav, channels: 1, sampling_rate: 16000}
raters:
gold: {type: human}
schemes:
age: {description: Age of speaker, dtype: int, minimum: 0}
confidence: {description: Confidence of emotion ratings., dtype: float, minimum: 0,
maximum: 1}
emotion:
description: Six basic emotions and neutral.
dtype: str
labels: [anger, boredom, disgust, fear, happiness, sadness, neutral]
gender:
description: Gender of speaker
dtype: str
labels: [female, male]
language: {description: Language of speaker, dtype: str}
speaker: {description: The actors could produce each sentence as often as they liked
and were asked to remember a real situation from their past when they had felt
this emotion., dtype: int, labels: speaker}
transcription:
description: Sentence produced by actor.
dtype: str
labels: {a01: Der Lappen liegt auf dem Eisschrank., a02: Das will sie am Mittwoch
abgeben., a04: Heute abend könnte ich es ihm sagen., a05: Das schwarze Stück
Papier befindet sich da oben neben dem Holzstück., a07: In sieben Stunden
wird es soweit sein., b01: 'Was sind denn das für Tüten, die da unter dem
Tisch stehen.', b02: Sie haben es gerade hochgetragen und jetzt gehen sie
wieder runter., b03: An den Wochenenden bin ich jetzt immer nach Hause gefahren
und habe Agnes besucht., b09: Ich will das eben wegbringen und dann mit Karl
was trinken gehen., b10: 'Die wird auf dem Platz sein, wo wir sie immer hinlegen.'}
tables:
emotion:
type: filewise
columns:
emotion: {scheme_id: emotion, rater_id: gold}
emotion.confidence: {scheme_id: confidence, rater_id: gold}
files:
type: filewise
columns:
speaker: {scheme_id: speaker}
transcription: {scheme_id: transcription}
misc_tables:
speaker:
levels: {speaker: int}
columns:
age: {scheme_id: age}
gender: {scheme_id: gender}
language: {scheme_id: language}
pdf: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.130.8506&rep=rep1&type=pdf
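Parts of the header can also be accessed programmatically, for example the labels of the emotion scheme. This is a small sketch based on our understanding of audformat's Scheme object; treat the exact attribute as an assumption.
# Access the emotion labels stored in the scheme
db.schemes["emotion"].labels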
Inspect database tables¶
First check which tables are available.
list(db)
['emotion', 'files', 'speaker']
Then list the first 10 entries of every table.
db["files"].get()[:10]
| file | speaker | transcription |
| --- | --- | --- |
| wav/03a01Fa.wav | 3 | a01 |
| wav/03a01Nc.wav | 3 | a01 |
| wav/03a01Wa.wav | 3 | a01 |
| wav/03a02Fc.wav | 3 | a02 |
| wav/03a02Nc.wav | 3 | a02 |
| wav/03a02Ta.wav | 3 | a02 |
| wav/03a02Wb.wav | 3 | a02 |
| wav/03a02Wc.wav | 3 | a02 |
| wav/03a04Ad.wav | 3 | a04 |
| wav/03a04Fd.wav | 3 | a04 |
db["emotion"].get()[:10]
| file | emotion | emotion.confidence |
| --- | --- | --- |
| wav/03a01Fa.wav | happiness | 0.90 |
| wav/03a01Nc.wav | neutral | 1.00 |
| wav/03a01Wa.wav | anger | 0.95 |
| wav/03a02Fc.wav | happiness | 0.85 |
| wav/03a02Nc.wav | neutral | 1.00 |
| wav/03a02Ta.wav | sadness | 0.90 |
| wav/03a02Wb.wav | anger | 1.00 |
| wav/03a02Wc.wav | anger | 1.00 |
| wav/03a04Ad.wav | fear | 0.90 |
| wav/03a04Fd.wav | happiness | 1.00 |
db["speaker"].get()[:10]
| speaker | age | gender | language |
| --- | --- | --- | --- |
| 3 | 31 | male | deu |
| 8 | 34 | female | deu |
| 9 | 21 | female | deu |
| 10 | 32 | male | deu |
| 11 | 26 | male | deu |
| 12 | 30 | male | deu |
| 13 | 32 | female | deu |
| 14 | 35 | female | deu |
| 15 | 25 | male | deu |
| 16 | 31 | female | deu |
Columns might contain labels that provide additional mappings. You can access this additional information with the map argument of audformat.Table.get(); see Map scheme labels for extended documentation.
db["files"].get(map={"speaker": ["speaker", "age", "gender"]})[:10]
| file | speaker | transcription | age | gender |
| --- | --- | --- | --- | --- |
| wav/03a01Fa.wav | 3 | a01 | 31 | male |
| wav/03a01Nc.wav | 3 | a01 | 31 | male |
| wav/03a01Wa.wav | 3 | a01 | 31 | male |
| wav/03a02Fc.wav | 3 | a02 | 31 | male |
| wav/03a02Nc.wav | 3 | a02 | 31 | male |
| wav/03a02Ta.wav | 3 | a02 | 31 | male |
| wav/03a02Wb.wav | 3 | a02 | 31 | male |
| wav/03a02Wc.wav | 3 | a02 | 31 | male |
| wav/03a04Ad.wav | 3 | a04 | 31 | male |
| wav/03a04Fd.wav | 3 | a04 | 31 | male |
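If only a single label field is needed, the value of the map dictionary can, as far as we know, also be a single string instead of a list. Treat this variant as an assumption about the API rather than a documented guarantee.
# Map speaker IDs directly to the gender label (assumed API variant)
db["files"].get(map={"speaker": "gender"})[:3]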
Store database to disk¶
Now we store the database in the folder emodb. Note that we have to make sure ourselves that the media files are located at the correct position.
import shutil
db_dir = audeer.mkdir("emodb")
shutil.copytree(
    os.path.join(src_dir, "wav"),
    os.path.join(db_dir, "wav"),
)
db.save(db_dir)
os.listdir(db_dir)
['db.emotion.parquet',
'db.files.parquet',
'wav',
'db.yaml',
'db.speaker.parquet']
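To our understanding, audformat.Database.save() also accepts a storage_format argument to control how tables are written; the available values and the default depend on the installed audformat version, so treat the following as an assumption.
# Optionally store the tables in another format, e.g. CSV
# (storage_format is assumed to be available in this audformat version)
db.save(db_dir, storage_format="csv")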
You can read the database from disk as well.
db = audformat.Database.load(db_dir)
db.name
'emodb'
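The loaded database gives access to the same tables and metadata. For example, we can list a few of the media files referenced by the database; db.files is, to our understanding, the combined file index over all tables, so treat the exact attribute as an assumption.
# First media files referenced by the loaded database
db.files[:3]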