Combine tables¶
It can happen that labels in your database are stored in tables of different type as some labels belong to the whole file, others don’t. The following examples highlights this with the labels for age and likability:
import audformat.testing
db = audformat.testing.create_db(minimal=True)
db.schemes["age"] = audformat.Scheme(
audformat.define.DataType.INTEGER,
minimum=20,
maximum=50,
)
db.schemes["likability"] = audformat.Scheme(
audformat.define.DataType.FLOAT,
)
audformat.testing.add_table(
db,
table_id="age",
index_type=audformat.define.IndexType.FILEWISE,
columns="age",
num_files=3,
)
audformat.testing.add_table(
db,
table_id="likability",
index_type=audformat.define.IndexType.SEGMENTED,
columns="likability",
num_files=4,
)
Which results in the following two pandas.DataFrame
:
display(
db["age"].get(),
db["likability"].get(),
)
age | |
---|---|
file | |
audio/001.wav | 42 |
audio/002.wav | 42 |
audio/003.wav | 38 |
likability | |||
---|---|---|---|
file | start | end | |
audio/001.wav | 0 days 00:00:00.076266121 | 0 days 00:00:00.619946911 | 0.435226 |
0 days 00:00:01.148784506 | 0 days 00:00:01.521320981 | 0.841527 | |
0 days 00:00:01.752849085 | 0 days 00:00:01.821368375 | 0.646402 | |
0 days 00:00:01.992789395 | 0 days 00:00:02.011086746 | 0.329573 | |
0 days 00:00:02.462067804 | 0 days 00:00:02.520139007 | 0.283158 | |
audio/002.wav | 0 days 00:00:00.414579600 | 0 days 00:00:00.942075097 | 0.481720 |
0 days 00:00:01.852233207 | 0 days 00:00:02.707899678 | 0.260380 | |
0 days 00:00:03.355573727 | 0 days 00:00:03.438177386 | 0.403139 | |
0 days 00:00:03.577501107 | 0 days 00:00:03.703908381 | 0.822152 | |
0 days 00:00:03.916778363 | 0 days 00:00:04.771936687 | 0.510847 | |
audio/003.wav | 0 days 00:00:01.388510311 | 0 days 00:00:01.639387281 | 0.666027 |
0 days 00:00:02.244054817 | 0 days 00:00:03.013083281 | 0.048390 | |
0 days 00:00:03.311775472 | 0 days 00:00:03.988677214 | 0.115562 | |
0 days 00:00:04.399860268 | 0 days 00:00:04.471247023 | 0.313857 | |
0 days 00:00:04.474487824 | 0 days 00:00:04.695643611 | 0.631005 | |
audio/004.wav | 0 days 00:00:00.588490047 | 0 days 00:00:01.148163469 | 0.529147 |
0 days 00:00:01.406743481 | 0 days 00:00:02.263371657 | 0.038501 | |
0 days 00:00:03.505024941 | 0 days 00:00:03.561733415 | 0.953864 | |
0 days 00:00:03.637172321 | 0 days 00:00:03.896031945 | 0.328594 | |
0 days 00:00:04.152167598 | 0 days 00:00:04.826367184 | 0.867975 |
You can simply combine both tables with:
combined_table = db["likability"] + db["age"]
Which results in the following pandas.DataFrame
:
combined_table.get()
likability | age | |||
---|---|---|---|---|
file | start | end | ||
audio/001.wav | 0 days 00:00:00.076266121 | 0 days 00:00:00.619946911 | 0.435226 | <NA> |
0 days 00:00:01.148784506 | 0 days 00:00:01.521320981 | 0.841527 | <NA> | |
0 days 00:00:01.752849085 | 0 days 00:00:01.821368375 | 0.646402 | <NA> | |
0 days 00:00:01.992789395 | 0 days 00:00:02.011086746 | 0.329573 | <NA> | |
0 days 00:00:02.462067804 | 0 days 00:00:02.520139007 | 0.283158 | <NA> | |
audio/002.wav | 0 days 00:00:00.414579600 | 0 days 00:00:00.942075097 | 0.481720 | <NA> |
0 days 00:00:01.852233207 | 0 days 00:00:02.707899678 | 0.260380 | <NA> | |
0 days 00:00:03.355573727 | 0 days 00:00:03.438177386 | 0.403139 | <NA> | |
0 days 00:00:03.577501107 | 0 days 00:00:03.703908381 | 0.822152 | <NA> | |
0 days 00:00:03.916778363 | 0 days 00:00:04.771936687 | 0.510847 | <NA> | |
audio/003.wav | 0 days 00:00:01.388510311 | 0 days 00:00:01.639387281 | 0.666027 | <NA> |
0 days 00:00:02.244054817 | 0 days 00:00:03.013083281 | 0.048390 | <NA> | |
0 days 00:00:03.311775472 | 0 days 00:00:03.988677214 | 0.115562 | <NA> | |
0 days 00:00:04.399860268 | 0 days 00:00:04.471247023 | 0.313857 | <NA> | |
0 days 00:00:04.474487824 | 0 days 00:00:04.695643611 | 0.631005 | <NA> | |
audio/004.wav | 0 days 00:00:00.588490047 | 0 days 00:00:01.148163469 | 0.529147 | <NA> |
0 days 00:00:01.406743481 | 0 days 00:00:02.263371657 | 0.038501 | <NA> | |
0 days 00:00:03.505024941 | 0 days 00:00:03.561733415 | 0.953864 | <NA> | |
0 days 00:00:03.637172321 | 0 days 00:00:03.896031945 | 0.328594 | <NA> | |
0 days 00:00:04.152167598 | 0 days 00:00:04.826367184 | 0.867975 | <NA> | |
audio/001.wav | 0 days 00:00:00 | NaT | NaN | 42 |
audio/002.wav | 0 days 00:00:00 | NaT | NaN | 42 |
audio/003.wav | 0 days 00:00:00 | NaT | NaN | 38 |
Or, if you just want to have the likability information for all segments, for which age information is available:
df_likability = db["likability"].get(
db["age"].files,
)
Which results in the following pandas.DataFrame
:
df_likability
likability | |||
---|---|---|---|
file | start | end | |
audio/001.wav | 0 days 00:00:00.076266121 | 0 days 00:00:00.619946911 | 0.435226 |
0 days 00:00:01.148784506 | 0 days 00:00:01.521320981 | 0.841527 | |
0 days 00:00:01.752849085 | 0 days 00:00:01.821368375 | 0.646402 | |
0 days 00:00:01.992789395 | 0 days 00:00:02.011086746 | 0.329573 | |
0 days 00:00:02.462067804 | 0 days 00:00:02.520139007 | 0.283158 | |
audio/002.wav | 0 days 00:00:00.414579600 | 0 days 00:00:00.942075097 | 0.481720 |
0 days 00:00:01.852233207 | 0 days 00:00:02.707899678 | 0.260380 | |
0 days 00:00:03.355573727 | 0 days 00:00:03.438177386 | 0.403139 | |
0 days 00:00:03.577501107 | 0 days 00:00:03.703908381 | 0.822152 | |
0 days 00:00:03.916778363 | 0 days 00:00:04.771936687 | 0.510847 | |
audio/003.wav | 0 days 00:00:01.388510311 | 0 days 00:00:01.639387281 | 0.666027 |
0 days 00:00:02.244054817 | 0 days 00:00:03.013083281 | 0.048390 | |
0 days 00:00:03.311775472 | 0 days 00:00:03.988677214 | 0.115562 | |
0 days 00:00:04.399860268 | 0 days 00:00:04.471247023 | 0.313857 | |
0 days 00:00:04.474487824 | 0 days 00:00:04.695643611 | 0.631005 |
Or, if you want to have the age information for segments in the likeability table:
df_age = db["age"].get(df_likability.index)
Which results in the following pandas.DataFrame
:
df_age
age | |||
---|---|---|---|
file | start | end | |
audio/001.wav | 0 days 00:00:00.076266121 | 0 days 00:00:00.619946911 | 42 |
0 days 00:00:01.148784506 | 0 days 00:00:01.521320981 | 42 | |
0 days 00:00:01.752849085 | 0 days 00:00:01.821368375 | 42 | |
0 days 00:00:01.992789395 | 0 days 00:00:02.011086746 | 42 | |
0 days 00:00:02.462067804 | 0 days 00:00:02.520139007 | 42 | |
audio/002.wav | 0 days 00:00:00.414579600 | 0 days 00:00:00.942075097 | 42 |
0 days 00:00:01.852233207 | 0 days 00:00:02.707899678 | 42 | |
0 days 00:00:03.355573727 | 0 days 00:00:03.438177386 | 42 | |
0 days 00:00:03.577501107 | 0 days 00:00:03.703908381 | 42 | |
0 days 00:00:03.916778363 | 0 days 00:00:04.771936687 | 42 | |
audio/003.wav | 0 days 00:00:01.388510311 | 0 days 00:00:01.639387281 | 38 |
0 days 00:00:02.244054817 | 0 days 00:00:03.013083281 | 38 | |
0 days 00:00:03.311775472 | 0 days 00:00:03.988677214 | 38 | |
0 days 00:00:04.399860268 | 0 days 00:00:04.471247023 | 38 | |
0 days 00:00:04.474487824 | 0 days 00:00:04.695643611 | 38 |
So far we have combined tables using the +
operator.
The result is a table that is no longer attached to a database.
That means that meta information about the media
or referenced schemes is discarded.
If you want to keep this information,
you can use audformat.Table.update()
,
which also works across databases,
as we will demonstrate with the following example.
First we create a second database and add a gender scheme:
db2 = audformat.testing.create_db(minimal=True)
db2.schemes["gender"] = audformat.Scheme(
labels=["female", "male"],
)
db2.schemes
gender:
dtype: str
labels: [female, male]
Next, we add a table and fill in some gender information:
audformat.testing.add_table(
db2,
table_id="gender_and_age",
index_type=audformat.define.IndexType.FILEWISE,
columns="gender",
num_files=[2, 3, 4],
).get()
gender | |
---|---|
file | |
audio/002.wav | female |
audio/003.wav | female |
audio/004.wav | male |
Now, we update the table with age values from the other database.
db2["gender_and_age"].update(db["age"]).get()
gender | age | |
---|---|---|
file | ||
audio/002.wav | female | 42 |
audio/003.wav | female | 38 |
audio/004.wav | male | <NA> |
audio/001.wav | NaN | 42 |
And also copies the according scheme to the database:
db2.schemes
age:
{dtype: int, minimum: 20, maximum: 50}
gender:
dtype: str
labels: [female, male]