Map scheme labelsΒΆ

The labels attribute of schemes can be used to encode additional information about the table data. In the following example we have a scheme 'transcription' that maps IDs to words. And a scheme 'speaker' that holds gender and age information about the speakers in the database.

import audformat.testing

db = audformat.testing.create_db(minimal=True)
db.schemes['transcription'] = audformat.Scheme(
    labels={
        0: 'hello',
        1: 'goodbye',
    }
)
db.schemes['speaker'] = audformat.Scheme(
    labels={
        'spk1': {
            'gender': 'male',
            'age': 33,
        },
        'spk2': {
            'gender': 'female',
            'age': 30,
        },
        'spk3': {
            'gender': 'male',
            'age': 37,
        },
    }
)
audformat.testing.add_table(
    db,
    'files',
    audformat.define.IndexType.FILEWISE,
)

If we request the transcription column, we get a pandas.Series with the word IDs:

db['files']['transcription'].get()
transcription
file
audio/001.wav 0
audio/002.wav 1
audio/003.wav 1
audio/004.wav 1
audio/005.wav 1

But if we are interested in the actual transcribed words, we can use the map argument to request them.

db['files']['transcription'].get(map='transcription')
transcription
file
audio/001.wav hello
audio/002.wav goodbye
audio/003.wav goodbye
audio/004.wav goodbye
audio/005.wav goodbye

Note that we can pass any string to map. It will be used as the name of the returned pandas.Series.

db['files']['transcription'].get(map='word')
word
file
audio/001.wav hello
audio/002.wav goodbye
audio/003.wav goodbye
audio/004.wav goodbye
audio/005.wav goodbye

Likewise, if we request the speaker column, a list of names is returned:

db['files']['speaker'].get()
speaker
file
audio/001.wav spk1
audio/002.wav spk3
audio/003.wav spk2
audio/004.wav spk3
audio/005.wav spk2

If we are interested in the the age of the speakers, we can do:

db['files']['speaker'].get(map='age')
age
file
audio/001.wav 33
audio/002.wav 37
audio/003.wav 30
audio/004.wav 37
audio/005.wav 30

This also works for tables. Here we pass a dictionary with column names as keys and scheme fields as values.

map = {
    'speaker': 'age',
}
db['files'].get(map=map)
transcription age
file
audio/001.wav 0 33
audio/002.wav 1 37
audio/003.wav 1 30
audio/004.wav 1 37
audio/005.wav 1 30

It is possible to map several columns at once and to map the same column to multiple fields.

map = {
    'transcription': 'words',
    'speaker': ['age', 'gender'],
}
db['files'].get(map=map)
words age gender
file
audio/001.wav hello 33 male
audio/002.wav goodbye 37 male
audio/003.wav goodbye 30 female
audio/004.wav goodbye 37 male
audio/005.wav goodbye 30 female

To keep the original columns values, we can include the column name in the list.

map = {
    'transcription': ['transcription', 'words'],
    'speaker': ['speaker', 'age', 'gender'],
}
db['files'].get(map=map)
transcription speaker words age gender
file
audio/001.wav 0 spk1 hello 33 male
audio/002.wav 1 spk3 goodbye 37 male
audio/003.wav 1 spk2 goodbye 30 female
audio/004.wav 1 spk3 goodbye 37 male
audio/005.wav 1 spk2 goodbye 30 female