Map scheme labels

The labels attribute of a scheme can be used to encode additional information about the table data. In the following example, the scheme "transcription" maps IDs to words, and the scheme "speaker" holds gender and age information about the speakers in the database.

import audformat.testing

db = audformat.testing.create_db(minimal=True)
db.schemes["transcription"] = audformat.Scheme(
    labels={
        0: "hello",
        1: "goodbye",
    }
)
db.schemes["speaker"] = audformat.Scheme(
    labels={
        "spk1": {
            "gender": "male",
            "age": 33,
        },
        "spk2": {
            "gender": "female",
            "age": 30,
        },
        "spk3": {
            "gender": "male",
            "age": 37,
        },
    }
)
audformat.testing.add_table(
    db,
    "files",
    audformat.define.IndexType.FILEWISE,
)

If we request the transcription column, we get a pandas.Series with the word IDs:

db["files"]["transcription"].get()
transcription
file
audio/001.wav 1
audio/002.wav 0
audio/003.wav 0
audio/004.wav 0
audio/005.wav 1

But if we are interested in the actual transcribed words, we can use the map argument to request them.

db["files"]["transcription"].get(map="transcription")
transcription
file
audio/001.wav goodbye
audio/002.wav hello
audio/003.wav hello
audio/004.wav hello
audio/005.wav goodbye

Note that for a scheme like this one, whose labels map IDs directly to values, we can pass any string to map. It will be used as the name of the returned pandas.Series.

db["files"]["transcription"].get(map="word")
word
file
audio/001.wav goodbye
audio/002.wav hello
audio/003.wav hello
audio/004.wav hello
audio/005.wav goodbye
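
Conceptually, this mapping is a dictionary lookup: every ID stored in the column is replaced by the label it points to. The following plain-Python sketch illustrates the idea with the data from above. The helper map_labels is hypothetical and is not part of audformat, and the sketch does not claim to reproduce how audformat implements the lookup internally.

```python
# Plain-Python sketch of a label lookup; NOT audformat's implementation.
# Labels as defined for the "transcription" scheme above.
labels = {0: "hello", 1: "goodbye"}

# Word IDs per file, as stored in the "transcription" column.
column = {
    "audio/001.wav": 1,
    "audio/002.wav": 0,
    "audio/003.wav": 0,
    "audio/004.wav": 0,
    "audio/005.wav": 1,
}


def map_labels(values, labels):
    """Replace every stored ID with the label it points to (hypothetical helper)."""
    return {file: labels[value] for file, value in values.items()}


words = map_labels(column, labels)
```

Because the labels are plain values, the string passed to map only decides the name of the result; the lookup itself stays the same.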

Likewise, if we request the speaker column, we get a pandas.Series with the speaker IDs:

db["files"]["speaker"].get()
speaker
file
audio/001.wav spk2
audio/002.wav spk1
audio/003.wav spk1
audio/004.wav spk3
audio/005.wav spk3

If we are interested in the age of the speakers, we can do:

db["files"]["speaker"].get(map="age")
age
file
audio/001.wav 30
audio/002.wav 33
audio/003.wav 33
audio/004.wav 37
audio/005.wav 37
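
When the labels are dictionaries, the lookup takes one extra step: the speaker ID selects the label entry, and the requested field selects a value inside it. A minimal plain-Python sketch of this two-step lookup, using the data from above, could look as follows. The helper map_field is hypothetical and only illustrates the idea; it is not audformat's implementation.

```python
# Plain-Python sketch of a field lookup; NOT audformat's implementation.
# Labels as defined for the "speaker" scheme above.
speakers = {
    "spk1": {"gender": "male", "age": 33},
    "spk2": {"gender": "female", "age": 30},
    "spk3": {"gender": "male", "age": 37},
}

# Speaker IDs per file, as stored in the "speaker" column.
column = {
    "audio/001.wav": "spk2",
    "audio/002.wav": "spk1",
    "audio/003.wav": "spk1",
    "audio/004.wav": "spk3",
    "audio/005.wav": "spk3",
}


def map_field(values, labels, field):
    """Look up the label entry for each ID, then pick one field from it."""
    return {file: labels[value][field] for file, value in values.items()}


ages = map_field(column, speakers, "age")
genders = map_field(column, speakers, "gender")
```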

This also works for tables. Here we pass a dictionary with column names as keys and scheme fields as values.

map = {
    "speaker": "age",
}
db["files"].get(map=map)
transcription age
file
audio/001.wav 1 30
audio/002.wav 0 33
audio/003.wav 0 33
audio/004.wav 0 37
audio/005.wav 1 37

It is possible to map several columns at once and to map the same column to multiple fields.

map = {
    "transcription": "words",
    "speaker": ["age", "gender"],
}
db["files"].get(map=map)
words age gender
file
audio/001.wav goodbye 30 female
audio/002.wav hello 33 male
audio/003.wav hello 33 male
audio/004.wav hello 37 male
audio/005.wav goodbye 37 male

To keep the original column values, we can include the column name in the list.

map = {
    "transcription": ["transcription", "words"],
    "speaker": ["speaker", "age", "gender"],
}
db["files"].get(map=map)
speaker transcription words age gender
file
audio/001.wav spk2 1 goodbye 30 female
audio/002.wav spk1 0 hello 33 male
audio/003.wav spk1 0 hello 33 male
audio/004.wav spk3 0 hello 37 male
audio/005.wav spk3 1 goodbye 37 male
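
The table-level rules described above (a dictionary from column names to one or more fields, with the column name itself standing for the original values) can be sketched in plain Python as follows. The helpers lookup and map_table and the column-oriented table layout are assumptions made for illustration. They are not audformat's implementation, and the column order of the result may differ from what audformat returns.

```python
# Plain-Python sketch of table-level mapping; NOT audformat's implementation.
schemes = {
    "transcription": {0: "hello", 1: "goodbye"},
    "speaker": {
        "spk1": {"gender": "male", "age": 33},
        "spk2": {"gender": "female", "age": 30},
        "spk3": {"gender": "male", "age": 37},
    },
}

# Column-oriented table: one list of values per column, rows in file order.
table = {
    "transcription": [1, 0, 0, 0, 1],
    "speaker": ["spk2", "spk1", "spk1", "spk3", "spk3"],
}


def lookup(label, field):
    """Pick a field from a dictionary label, or return a plain label as is."""
    return label[field] if isinstance(label, dict) else label


def map_table(table, schemes, mapping):
    """Return a new table in which the columns named in mapping are mapped."""
    result = {}
    for column, values in table.items():
        if column not in mapping:
            result[column] = list(values)  # unmapped columns pass through
            continue
        fields = mapping[column]
        if isinstance(fields, str):
            fields = [fields]  # a single field is treated as a one-item list
        for field in fields:
            if field == column:
                result[field] = list(values)  # keep the original values
            else:
                result[field] = [lookup(schemes[column][v], field) for v in values]
    return result


mapping = {
    "transcription": ["transcription", "words"],
    "speaker": ["speaker", "age", "gender"],
}
mapped = map_table(table, schemes, mapping)
```

Treating a single string as a one-item list keeps both forms of the mapping on the same code path.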