Emodb example

In this example we download the small emodb database, which contains sentences spoken with different emotions by different actors. The audio is stored as WAV files.

Get source database

First we download the source emodb database to the folder emodb-src.

import os
import urllib.request

import audeer


# Get database source
source = "http://emodb.bilderbar.info/download/download.zip"
src_dir = "emodb-src"
if not os.path.exists(src_dir):
    urllib.request.urlretrieve(source, "emodb.zip")
    audeer.extract_archive("emodb.zip", src_dir)

os.listdir(src_dir)
['erkennung.txt', 'wav', 'silb', 'erklaerung.txt', 'labsilb', 'lablaut']

Gather metadata and annotations

Next, we collect all metadata and annotations that we would like to store in the audformat version of the database.

First, have a look at the file names.

os.listdir(os.path.join(src_dir, "wav"))[:3]
['14b10Nb.wav', '12b09Wc.wav', '15b03Tc.wav']

As described in the emodb documentation, the file names are encoded as follows.

Position  Encoding
0..1      speaker
2..4      spoken sentence
5         emotion
6         repetition
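For illustration, the fields can be sliced directly out of a file name. This is a minimal sketch using the example name 03a01Fa from the listing above; the slice positions follow the encoding table.

```python
# Decode the fields of an emodb file name by position.
name = "03a01Fa"
speaker = name[0:2]     # positions 0..1: speaker ID
sentence = name[2:5]    # positions 2..4: sentence code
emotion = name[5]       # position 5: emotion code
repetition = name[6]    # position 6: repetition letter
print(speaker, sentence, emotion, repetition)  # 03 a01 F a
```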

For the speakers, further information is provided.

Speaker ID  Gender  Age
03          male    31
08          female  34
09          female  21
10          male    32
11          male    26
12          male    30
13          female  32
14          female  35
15          male    25
16          female  31

For the sentences we have transcriptions.

Code  Transcription
a01   Der Lappen liegt auf dem Eisschrank.
a02   Das will sie am Mittwoch abgeben.
a04   Heute Abend könnte ich es ihm sagen.
a05   Das schwarze Stück Papier befindet sich da oben neben dem Holzstück.
a07   In sieben Stunden wird es soweit sein.
b01   Was sind denn das für Tüten, die da unter dem Tisch stehen?
b02   Sie haben es gerade hoch getragen und jetzt gehen sie wieder runter.
b03   An den Wochenenden bin ich jetzt immer nach Hause gefahren und habe Agnes besucht.
b09   Ich will das eben wegbringen und dann mit Karl was trinken gehen.
b10   Die wird auf dem Platz sein, wo wir sie immer hinlegen.

The emotion codes correspond to the following emotions.

Code  Emotion
W     anger
L     boredom
E     disgust
A     fear
F     happiness
T     sadness
N     neutral

As stated in the emodb paper, the acted emotions were further evaluated by 20 participants who had to assign emotion labels to the presented audio. Their agreement on the rating is stored in the erkannt column of the file erkennung.txt. We will read in this file and use the annotations to add a confidence column to the emotion table.
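The erkannt values are percentages written with a German decimal comma, so they have to be cleaned before they can be used as confidences in the range [0, 1]. The conversion can be sketched in isolation (the sample values here are only for illustration):

```python
import pandas as pd

# Example erkannt values as they might appear in erkennung.txt
# (German decimal comma, percent scale; hypothetical sample values).
y = pd.Series(["90,00", "100,00", "95,00"])
y = y.replace(to_replace="\xa0", value="", regex=True)  # strip no-break spaces
y = y.replace(to_replace=",", value=".", regex=True)    # decimal comma -> point
confidences = y.astype("float").values / 100.0
print(confidences.tolist())  # [0.9, 1.0, 0.95]
```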

import audformat
import pandas as pd

# Prepare functions for getting information from file names
def parse_names(names, from_i, to_i, is_number=False, mapping=None):
    for name in names:
        key = name[from_i:to_i]
        if is_number:
            key = int(key)
        yield mapping[key] if mapping else key


description = (
    "Berlin Database of Emotional Speech. "
    "A German database of emotional utterances "
    "spoken by actors "
    "recorded as a part of the DFG funded research project "
    "SE462/3-1 in 1997 and 1999. "
    "Recordings took place in the anechoic chamber "
    "of the Technical University Berlin, "
    "department of Technical Acoustics. "
    "It contains about 500 utterances "
    "from ten different actors "
    "expressing basic six emotions and neutral."
)

files = sorted(
    [os.path.join("wav", f) for f in os.listdir(os.path.join(src_dir, "wav"))]
)
names = [audeer.basename_wo_ext(f) for f in files]

emotion_mapping = {
    "W": "anger",
    "L": "boredom",
    "E": "disgust",
    "A": "fear",
    "F": "happiness",
    "T": "sadness",
    "N": "neutral",
}
emotions = list(parse_names(names, from_i=5, to_i=6, mapping=emotion_mapping))

y = pd.read_csv(
    os.path.join(src_dir, "erkennung.txt"),
    usecols=["Satz", "erkannt"],
    index_col="Satz",
    sep=r"\s+",
    encoding="Latin-1",
    decimal=",",
    converters={"Satz": lambda x: os.path.join("wav", x)},
).squeeze("columns")
y = y.loc[files]
y = y.replace(to_replace="\xa0", value="", regex=True)
y = y.replace(to_replace=",", value=".", regex=True)
confidences = y.astype("float").values

male = audformat.define.Gender.MALE
female = audformat.define.Gender.FEMALE
de = audformat.utils.map_language("de")
df_speaker = pd.DataFrame(
    index=pd.Index([3, 8, 9, 10, 11, 12, 13, 14, 15, 16], name="speaker"),
    columns=["age", "gender", "language"],
    data=[
        [31, male, de],
        [34, female, de],
        [21, female, de],
        [32, male, de],
        [26, male, de],
        [30, male, de],
        [32, female, de],
        [35, female, de],
        [25, male, de],
        [31, female, de],
    ],
)
speakers = list(parse_names(names, from_i=0, to_i=2, is_number=True))

transcription_mapping = {
    "a01": "Der Lappen liegt auf dem Eisschrank.",
    "a02": "Das will sie am Mittwoch abgeben.",
    "a04": "Heute abend könnte ich es ihm sagen.",
    "a05": "Das schwarze Stück Papier befindet sich da oben neben dem "
           "Holzstück.",
    "a07": "In sieben Stunden wird es soweit sein.",
    "b01": "Was sind denn das für Tüten, die da unter dem Tisch "
           "stehen.",
    "b02": "Sie haben es gerade hochgetragen und jetzt gehen sie "
           "wieder runter.",
    "b03": "An den Wochenenden bin ich jetzt immer nach Hause "
           "gefahren und habe Agnes besucht.",
    "b09": "Ich will das eben wegbringen und dann mit Karl was "
           "trinken gehen.",
    "b10": "Die wird auf dem Platz sein, wo wir sie immer hinlegen.",
}
transcriptions = list(parse_names(names, from_i=2, to_i=5))
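To see what the parse_names helper yields for the different field positions, here is a short standalone illustration with two example file names (the helper is re-defined so the snippet runs on its own):

```python
# Standalone illustration of the parse_names helper defined above.
def parse_names(names, from_i, to_i, is_number=False, mapping=None):
    for name in names:
        key = name[from_i:to_i]
        if is_number:
            key = int(key)
        yield mapping[key] if mapping else key


names = ["03a01Fa", "16b10Tb"]
# Speaker IDs as integers (leading zeros are dropped by int())
print(list(parse_names(names, from_i=0, to_i=2, is_number=True)))  # [3, 16]
# Emotion codes mapped to labels
print(list(parse_names(
    names, from_i=5, to_i=6,
    mapping={"F": "happiness", "T": "sadness"},
)))  # ['happiness', 'sadness']
```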

Create audformat database

Now we create the database object and assign the information to it.

db = audformat.Database(
    name="emodb",
    source=source,
    usage=audformat.define.Usage.UNRESTRICTED,
    languages=[de],
    description=description,
    meta={
        "pdf": (
            "http://citeseerx.ist.psu.edu/viewdoc/"
            "download?doi=10.1.1.130.8506&rep=rep1&type=pdf"
        ),
    },
)

# Media
db.media["microphone"] = audformat.Media(
    format="wav",
    sampling_rate=16000,
    channels=1,
)

# Raters
db.raters["gold"] = audformat.Rater()

# Schemes
db.schemes["emotion"] = audformat.Scheme(
    labels=[str(x) for x in emotion_mapping.values()],
    description="Six basic emotions and neutral.",
)
db.schemes["confidence"] = audformat.Scheme(
    "float",
    minimum=0,
    maximum=1,
    description="Confidence of emotion ratings.",
)
db.schemes["age"] = audformat.Scheme(
    "int",
    minimum=0,
    description="Age of speaker",
)
db.schemes["gender"] = audformat.Scheme(
    labels=["female", "male"],
    description="Gender of speaker",
)
db.schemes["language"] = audformat.Scheme(
    "str",
    description="Language of speaker",
)
db.schemes["transcription"] = audformat.Scheme(
    labels=transcription_mapping,
    description="Sentence produced by actor.",
)

# MiscTable
db["speaker"] = audformat.MiscTable(df_speaker.index)
db["speaker"]["age"] = audformat.Column(scheme_id="age")
db["speaker"]["gender"] = audformat.Column(scheme_id="gender")
db["speaker"]["language"] = audformat.Column(scheme_id="language")
db["speaker"].set(df_speaker.to_dict(orient="list"))

# MiscTable as Scheme
db.schemes["speaker"] = audformat.Scheme(
    labels="speaker",
    dtype="int",
    description=(
        "The actors could produce each sentence as often as "
        "they liked and were asked to remember a real "
        "situation from their past when they had felt this "
        "emotion."
    ),
)

# Tables
index = audformat.filewise_index(files)
db["files"] = audformat.Table(index)

db["files"]["speaker"] = audformat.Column(scheme_id="speaker")
db["files"]["speaker"].set(speakers)

db["files"]["transcription"] = audformat.Column(scheme_id="transcription")
db["files"]["transcription"].set(transcriptions)

db["emotion"] = audformat.Table(index)
db["emotion"]["emotion"] = audformat.Column(
    scheme_id="emotion",
    rater_id="gold",
)
db["emotion"]["emotion"].set(emotions)
db["emotion"]["emotion.confidence"] = audformat.Column(
    scheme_id="confidence",
    rater_id="gold",
)
db["emotion"]["emotion.confidence"].set(confidences / 100.0)

Inspect database header

Before storing the database, we can inspect its header.

db
name: emodb
description: Berlin Database of Emotional Speech. A German database of emotional utterances
  spoken by actors recorded as a part of the DFG funded research project SE462/3-1
  in 1997 and 1999. Recordings took place in the anechoic chamber of the Technical
  University Berlin, department of Technical Acoustics. It contains about 500 utterances
  from ten different actors expressing basic six emotions and neutral.
source: http://emodb.bilderbar.info/download/download.zip
usage: unrestricted
languages: [deu]
media:
  microphone: {type: other, format: wav, channels: 1, sampling_rate: 16000}
raters:
  gold: {type: human}
schemes:
  age: {description: Age of speaker, dtype: int, minimum: 0}
  confidence: {description: Confidence of emotion ratings., dtype: float, minimum: 0,
    maximum: 1}
  emotion:
    description: Six basic emotions and neutral.
    dtype: str
    labels: [anger, boredom, disgust, fear, happiness, sadness, neutral]
  gender:
    description: Gender of speaker
    dtype: str
    labels: [female, male]
  language: {description: Language of speaker, dtype: str}
  speaker: {description: The actors could produce each sentence as often as they liked
      and were asked to remember a real situation from their past when they had felt
      this emotion., dtype: int, labels: speaker}
  transcription:
    description: Sentence produced by actor.
    dtype: str
    labels: {a01: Der Lappen liegt auf dem Eisschrank., a02: Das will sie am Mittwoch
        abgeben., a04: Heute abend könnte ich es ihm sagen., a05: Das schwarze Stück
        Papier befindet sich da oben neben dem Holzstück., a07: In sieben Stunden
        wird es soweit sein., b01: 'Was sind denn das für Tüten, die da unter dem
        Tisch stehen.', b02: Sie haben es gerade hochgetragen und jetzt gehen sie
        wieder runter., b03: An den Wochenenden bin ich jetzt immer nach Hause gefahren
        und habe Agnes besucht., b09: Ich will das eben wegbringen und dann mit Karl
        was trinken gehen., b10: 'Die wird auf dem Platz sein, wo wir sie immer hinlegen.'}
tables:
  emotion:
    type: filewise
    columns:
      emotion: {scheme_id: emotion, rater_id: gold}
      emotion.confidence: {scheme_id: confidence, rater_id: gold}
  files:
    type: filewise
    columns:
      speaker: {scheme_id: speaker}
      transcription: {scheme_id: transcription}
misc_tables:
  speaker:
    levels: {speaker: int}
    columns:
      age: {scheme_id: age}
      gender: {scheme_id: gender}
      language: {scheme_id: language}
pdf: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.130.8506&rep=rep1&type=pdf

Inspect database tables

First check which tables are available.

list(db)
['emotion', 'files', 'speaker']

Then list the first 10 entries of every table.

db["files"].get()[:10]
                speaker transcription
file
wav/03a01Fa.wav       3           a01
wav/03a01Nc.wav       3           a01
wav/03a01Wa.wav       3           a01
wav/03a02Fc.wav       3           a02
wav/03a02Nc.wav       3           a02
wav/03a02Ta.wav       3           a02
wav/03a02Wb.wav       3           a02
wav/03a02Wc.wav       3           a02
wav/03a04Ad.wav       3           a04
wav/03a04Fd.wav       3           a04
db["emotion"].get()[:10]
                  emotion  emotion.confidence
file
wav/03a01Fa.wav happiness                0.90
wav/03a01Nc.wav   neutral                1.00
wav/03a01Wa.wav     anger                0.95
wav/03a02Fc.wav happiness                0.85
wav/03a02Nc.wav   neutral                1.00
wav/03a02Ta.wav   sadness                0.90
wav/03a02Wb.wav     anger                1.00
wav/03a02Wc.wav     anger                1.00
wav/03a04Ad.wav      fear                0.90
wav/03a04Fd.wav happiness                1.00
db["speaker"].get()[:10]
         age  gender language
speaker
3         31    male      deu
8         34  female      deu
9         21  female      deu
10        32    male      deu
11        26    male      deu
12        30    male      deu
13        32  female      deu
14        35  female      deu
15        25    male      deu
16        31  female      deu

Columns might contain labels that provide additional mappings. You can access this additional information with the map argument of audformat.Table.get(); see Map scheme labels for extended documentation.

db["files"].get(map={"speaker": ["speaker", "age", "gender"]})[:10]
                speaker transcription  age gender
file
wav/03a01Fa.wav       3           a01   31   male
wav/03a01Nc.wav       3           a01   31   male
wav/03a01Wa.wav       3           a01   31   male
wav/03a02Fc.wav       3           a02   31   male
wav/03a02Nc.wav       3           a02   31   male
wav/03a02Ta.wav       3           a02   31   male
wav/03a02Wb.wav       3           a02   31   male
wav/03a02Wc.wav       3           a02   31   male
wav/03a04Ad.wav       3           a04   31   male
wav/03a04Fd.wav       3           a04   31   male

Store database to disk

Now we store the database in the folder emodb. Note that we have to make sure ourselves that the media files are located at the correct position.

import shutil


db_dir = audeer.mkdir("emodb")
shutil.copytree(
    os.path.join(src_dir, "wav"),
    os.path.join(db_dir, "wav"),
)
db.save(db_dir)

os.listdir(db_dir)
['db.emotion.parquet',
 'db.files.parquet',
 'wav',
 'db.yaml',
 'db.speaker.parquet']

You can read the database from disk as well.

db = audformat.Database.load(db_dir)
db.name
'emodb'