Combine tables

It can happen that labels in your database are stored in tables of different type as some labels belong to the whole file, others don’t. The following examples highlights this with the labels for age and likability:

import audformat.testing


db = audformat.testing.create_db(minimal=True)
db.schemes["age"] = audformat.Scheme(
    audformat.define.DataType.INTEGER,
    minimum=20,
    maximum=50,
)
db.schemes["likability"] = audformat.Scheme(
    audformat.define.DataType.FLOAT,
)
audformat.testing.add_table(
    db,
    table_id="age",
    index_type=audformat.define.IndexType.FILEWISE,
    columns="age",
    num_files=3,
)
audformat.testing.add_table(
    db,
    table_id="likability",
    index_type=audformat.define.IndexType.SEGMENTED,
    columns="likability",
    num_files=4,
)

Which results in the following two pandas.DataFrame:

display(
    db["age"].get(),
    db["likability"].get(),
)
age
file
audio/001.wav 36
audio/002.wav 40
audio/003.wav 30
likability
file start end
audio/001.wav 0 days 00:00:00.337746390 0 days 00:00:00.410186556 0.227862
0 days 00:00:00.545479130 0 days 00:00:01.314332376 0.035046
0 days 00:00:01.512813203 0 days 00:00:01.535342387 0.525103
0 days 00:00:02.943708820 0 days 00:00:03.981677970 0.828315
0 days 00:00:04.134205982 0 days 00:00:04.301267746 0.165376
audio/002.wav 0 days 00:00:00.091123109 0 days 00:00:00.453853725 0.962539
0 days 00:00:00.578407697 0 days 00:00:00.907991479 0.846079
0 days 00:00:00.959012205 0 days 00:00:01.989334879 0.893106
0 days 00:00:02.567033249 0 days 00:00:03.277662717 0.405539
0 days 00:00:04.142488626 0 days 00:00:04.404059534 0.614375
audio/003.wav 0 days 00:00:00.791016100 0 days 00:00:00.881004749 0.275519
0 days 00:00:01.218236263 0 days 00:00:01.610193716 0.461150
0 days 00:00:01.754876925 0 days 00:00:03.119232243 0.085179
0 days 00:00:04.126523482 0 days 00:00:04.529768315 0.955684
0 days 00:00:04.541465861 0 days 00:00:04.935519102 0.144458
audio/004.wav 0 days 00:00:00.045465117 0 days 00:00:00.143033897 0.604324
0 days 00:00:01.111629009 0 days 00:00:01.226279413 0.702993
0 days 00:00:01.694471491 0 days 00:00:03.056140873 0.830441
0 days 00:00:03.171696339 0 days 00:00:03.276904559 0.408652
0 days 00:00:03.551005838 0 days 00:00:03.766852166 0.141132

You can simply combine both tables with:

combined_table = db["likability"] + db["age"]

Which results in the following pandas.DataFrame:

combined_table.get()
likability age
file start end
audio/001.wav 0 days 00:00:00.337746390 0 days 00:00:00.410186556 0.227862 <NA>
0 days 00:00:00.545479130 0 days 00:00:01.314332376 0.035046 <NA>
0 days 00:00:01.512813203 0 days 00:00:01.535342387 0.525103 <NA>
0 days 00:00:02.943708820 0 days 00:00:03.981677970 0.828315 <NA>
0 days 00:00:04.134205982 0 days 00:00:04.301267746 0.165376 <NA>
audio/002.wav 0 days 00:00:00.091123109 0 days 00:00:00.453853725 0.962539 <NA>
0 days 00:00:00.578407697 0 days 00:00:00.907991479 0.846079 <NA>
0 days 00:00:00.959012205 0 days 00:00:01.989334879 0.893106 <NA>
0 days 00:00:02.567033249 0 days 00:00:03.277662717 0.405539 <NA>
0 days 00:00:04.142488626 0 days 00:00:04.404059534 0.614375 <NA>
audio/003.wav 0 days 00:00:00.791016100 0 days 00:00:00.881004749 0.275519 <NA>
0 days 00:00:01.218236263 0 days 00:00:01.610193716 0.461150 <NA>
0 days 00:00:01.754876925 0 days 00:00:03.119232243 0.085179 <NA>
0 days 00:00:04.126523482 0 days 00:00:04.529768315 0.955684 <NA>
0 days 00:00:04.541465861 0 days 00:00:04.935519102 0.144458 <NA>
audio/004.wav 0 days 00:00:00.045465117 0 days 00:00:00.143033897 0.604324 <NA>
0 days 00:00:01.111629009 0 days 00:00:01.226279413 0.702993 <NA>
0 days 00:00:01.694471491 0 days 00:00:03.056140873 0.830441 <NA>
0 days 00:00:03.171696339 0 days 00:00:03.276904559 0.408652 <NA>
0 days 00:00:03.551005838 0 days 00:00:03.766852166 0.141132 <NA>
audio/001.wav 0 days 00:00:00 NaT NaN 36
audio/002.wav 0 days 00:00:00 NaT NaN 40
audio/003.wav 0 days 00:00:00 NaT NaN 30

Or, if you just want to have the likability information for all segments, for which age information is available:

df_likability = db["likability"].get(
    db["age"].files,
)

Which results in the following pandas.DataFrame:

df_likability
likability
file start end
audio/001.wav 0 days 00:00:00.337746390 0 days 00:00:00.410186556 0.227862
0 days 00:00:00.545479130 0 days 00:00:01.314332376 0.035046
0 days 00:00:01.512813203 0 days 00:00:01.535342387 0.525103
0 days 00:00:02.943708820 0 days 00:00:03.981677970 0.828315
0 days 00:00:04.134205982 0 days 00:00:04.301267746 0.165376
audio/002.wav 0 days 00:00:00.091123109 0 days 00:00:00.453853725 0.962539
0 days 00:00:00.578407697 0 days 00:00:00.907991479 0.846079
0 days 00:00:00.959012205 0 days 00:00:01.989334879 0.893106
0 days 00:00:02.567033249 0 days 00:00:03.277662717 0.405539
0 days 00:00:04.142488626 0 days 00:00:04.404059534 0.614375
audio/003.wav 0 days 00:00:00.791016100 0 days 00:00:00.881004749 0.275519
0 days 00:00:01.218236263 0 days 00:00:01.610193716 0.461150
0 days 00:00:01.754876925 0 days 00:00:03.119232243 0.085179
0 days 00:00:04.126523482 0 days 00:00:04.529768315 0.955684
0 days 00:00:04.541465861 0 days 00:00:04.935519102 0.144458

Or, if you want to have the age information for segments in the likeability table:

df_age = db["age"].get(df_likability.index)

Which results in the following pandas.DataFrame:

df_age
age
file start end
audio/001.wav 0 days 00:00:00.337746390 0 days 00:00:00.410186556 36
0 days 00:00:00.545479130 0 days 00:00:01.314332376 36
0 days 00:00:01.512813203 0 days 00:00:01.535342387 36
0 days 00:00:02.943708820 0 days 00:00:03.981677970 36
0 days 00:00:04.134205982 0 days 00:00:04.301267746 36
audio/002.wav 0 days 00:00:00.091123109 0 days 00:00:00.453853725 40
0 days 00:00:00.578407697 0 days 00:00:00.907991479 40
0 days 00:00:00.959012205 0 days 00:00:01.989334879 40
0 days 00:00:02.567033249 0 days 00:00:03.277662717 40
0 days 00:00:04.142488626 0 days 00:00:04.404059534 40
audio/003.wav 0 days 00:00:00.791016100 0 days 00:00:00.881004749 30
0 days 00:00:01.218236263 0 days 00:00:01.610193716 30
0 days 00:00:01.754876925 0 days 00:00:03.119232243 30
0 days 00:00:04.126523482 0 days 00:00:04.529768315 30
0 days 00:00:04.541465861 0 days 00:00:04.935519102 30

So far we have combined tables using the + operator. The result is a table that is no longer attached to a database. That means that meta information about the media or referenced schemes is discarded. If you want to keep this information, you can use audformat.Table.update(), which also works across databases, as we will demonstrate with the following example.

First we create a second database and add a gender scheme:

db2 = audformat.testing.create_db(minimal=True)
db2.schemes["gender"] = audformat.Scheme(
    labels=["female", "male"],
)
db2.schemes
gender:
  dtype: str
  labels: [female, male]

Next, we add a table and fill in some gender information:

audformat.testing.add_table(
    db2,
    table_id="gender_and_age",
    index_type=audformat.define.IndexType.FILEWISE,
    columns="gender",
    num_files=[2, 3, 4],
).get()
gender
file
audio/002.wav male
audio/003.wav male
audio/004.wav male

Now, we update the table with age values from the other database.

db2["gender_and_age"].update(db["age"]).get()
gender age
file
audio/002.wav male 40
audio/003.wav male 30
audio/004.wav male <NA>
audio/001.wav NaN 36

And also copies the according scheme to the database:

db2.schemes
age:
  {dtype: int, minimum: 20, maximum: 50}
gender:
  dtype: str
  labels: [female, male]