Map scheme labels
The labels attribute of a scheme can be used to encode additional information about the table data.
In the following example, the scheme "transcription" maps IDs to words,
and the scheme "speaker" holds gender and age information about the speakers in the database.
import audformat.testing

db = audformat.testing.create_db(minimal=True)
db.schemes["transcription"] = audformat.Scheme(
    labels={
        0: "hello",
        1: "goodbye",
    }
)
db.schemes["speaker"] = audformat.Scheme(
    labels={
        "spk1": {
            "gender": "male",
            "age": 33,
        },
        "spk2": {
            "gender": "female",
            "age": 30,
        },
        "spk3": {
            "gender": "male",
            "age": 37,
        },
    }
)
audformat.testing.add_table(
    db,
    "files",
    audformat.define.IndexType.FILEWISE,
)
If we request the transcription column, we get a pandas.Series with the word IDs:
db["files"]["transcription"].get()
file | transcription |
---|---|
audio/001.wav | 0 |
audio/002.wav | 1 |
audio/003.wav | 0 |
audio/004.wav | 0 |
audio/005.wav | 0 |
But if we are interested in the actual transcribed words, we can use the map argument to request them.
db["files"]["transcription"].get(map="transcription")
file | transcription |
---|---|
audio/001.wav | hello |
audio/002.wav | goodbye |
audio/003.wav | hello |
audio/004.wav | hello |
audio/005.wav | hello |
Note that we can pass any string to map. It will be used as the name of the returned pandas.Series.
db["files"]["transcription"].get(map="word")
file | word |
---|---|
audio/001.wav | hello |
audio/002.wav | goodbye |
audio/003.wav | hello |
audio/004.wav | hello |
audio/005.wav | hello |
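The effect of map on a column with simple labels can be approximated in plain pandas. The following is a minimal sketch under that assumption, not audformat's implementation: the variable names are illustrative, and the Series is built by hand to mirror the column shown above.

```python
import pandas as pd

# Labels of the "transcription" scheme (ID -> word)
labels = {0: "hello", 1: "goodbye"}

# Hand-built stand-in for db["files"]["transcription"].get()
files = pd.Index([f"audio/{n:03d}.wav" for n in range(1, 6)], name="file")
ids = pd.Series([0, 1, 0, 0, 0], index=files, name="transcription")

# Replace every ID by its label and name the result after the map string
words = ids.map(labels).rename("word")
print(words)
```

The original Series keeps its IDs and its name; only the returned copy carries the mapped values.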
Likewise, if we request the speaker column, we get a pandas.Series with the speaker IDs:
db["files"]["speaker"].get()
file | speaker |
---|---|
audio/001.wav | spk3 |
audio/002.wav | spk1 |
audio/003.wav | spk3 |
audio/004.wav | spk2 |
audio/005.wav | spk1 |
If we are interested in the age of the speakers, we can do:
db["files"]["speaker"].get(map="age")
file | age |
---|---|
audio/001.wav | 37 |
audio/002.wav | 33 |
audio/003.wav | 37 |
audio/004.wav | 30 |
audio/005.wav | 33 |
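When the labels are dictionaries, the map string selects one field of each label entry. This can be sketched in plain Python as follows. The helper lookup_field and the hand-written speaker_column are hypothetical and only illustrate the lookup; they are not part of audformat.

```python
# Labels of the "speaker" scheme (speaker ID -> fields)
speaker_labels = {
    "spk1": {"gender": "male", "age": 33},
    "spk2": {"gender": "female", "age": 30},
    "spk3": {"gender": "male", "age": 37},
}

# Speaker IDs as shown in the "speaker" column above, keyed by file
speaker_column = {
    "audio/001.wav": "spk3",
    "audio/002.wav": "spk1",
    "audio/003.wav": "spk3",
    "audio/004.wav": "spk2",
    "audio/005.wav": "spk1",
}


def lookup_field(column, labels, field):
    """Replace each label ID by the value of `field` in its label entry."""
    return {file: labels[label_id][field] for file, label_id in column.items()}


ages = lookup_field(speaker_column, speaker_labels, "age")
genders = lookup_field(speaker_column, speaker_labels, "gender")
print(ages)
print(genders)
```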
This also works for tables. Here we pass a dictionary with column names as keys and scheme fields as values.
map = {
    "speaker": "age",
}
db["files"].get(map=map)
file | transcription | age |
---|---|---|
audio/001.wav | 0 | 37 |
audio/002.wav | 1 | 33 |
audio/003.wav | 0 | 37 |
audio/004.wav | 0 | 30 |
audio/005.wav | 0 | 33 |
It is possible to map several columns at once and to map the same column to multiple fields.
map = {
    "transcription": "words",
    "speaker": ["age", "gender"],
}
db["files"].get(map=map)
file | words | age | gender |
---|---|---|---|
audio/001.wav | hello | 37 | male |
audio/002.wav | goodbye | 33 | male |
audio/003.wav | hello | 37 | male |
audio/004.wav | hello | 30 | female |
audio/005.wav | hello | 33 | male |
To keep the original column values, we can include the column name in the list.
map = {
    "transcription": ["transcription", "words"],
    "speaker": ["speaker", "age", "gender"],
}
db["files"].get(map=map)
file | speaker | transcription | words | age | gender |
---|---|---|---|---|---|
audio/001.wav | spk3 | 0 | hello | 37 | male |
audio/002.wav | spk1 | 1 | goodbye | 33 | male |
audio/003.wav | spk3 | 0 | hello | 37 | male |
audio/004.wav | spk2 | 0 | hello | 30 | female |
audio/005.wav | spk1 | 0 | hello | 33 | male |
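The table-level mapping rules described above (one or several targets per column, with the column name itself keeping the original values) can be sketched in plain pandas. This is only an illustration under stated assumptions: map_table is a hypothetical helper, the table and labels are written out by hand, and the column order of the result may differ from what audformat returns.

```python
import pandas as pd

# Hand-written stand-ins for the scheme labels and the "files" table
labels = {
    "transcription": {0: "hello", 1: "goodbye"},
    "speaker": {
        "spk1": {"gender": "male", "age": 33},
        "spk2": {"gender": "female", "age": 30},
        "spk3": {"gender": "male", "age": 37},
    },
}
table = pd.DataFrame(
    {
        "transcription": [0, 1, 0, 0, 0],
        "speaker": ["spk3", "spk1", "spk3", "spk2", "spk1"],
    },
    index=pd.Index([f"audio/{n:03d}.wav" for n in range(1, 6)], name="file"),
)


def map_table(table, labels, mapping):
    """Hypothetical helper: expand each column according to `mapping`."""
    result = {}
    for column in table.columns:
        # Columns without an entry in `mapping` are kept unchanged
        targets = mapping.get(column, [column])
        if isinstance(targets, str):
            targets = [targets]
        for target in targets:
            if target == column:
                # The column's own name keeps the original values
                result[target] = table[column]
            else:
                # Dict labels: pick the requested field; simple labels: use as is
                lookup = {
                    key: value[target] if isinstance(value, dict) else value
                    for key, value in labels[column].items()
                }
                result[target] = table[column].map(lookup)
    return pd.DataFrame(result)


mapped = map_table(
    table,
    labels,
    {"transcription": "words", "speaker": ["age", "gender"]},
)
kept = map_table(
    table,
    labels,
    {
        "transcription": ["transcription", "words"],
        "speaker": ["speaker", "age", "gender"],
    },
)
print(mapped)
print(kept)
```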