Combine tables

It can happen that labels in your database are stored in tables of different type as some labels belong to the whole file, others don’t. The following examples highlights this with the labels for age and likability:

import audformat.testing


db = audformat.testing.create_db(minimal=True)
db.schemes['age'] = audformat.Scheme(
    audformat.define.DataType.INTEGER,
    minimum=20,
    maximum=50,
)
db.schemes['likability'] = audformat.Scheme(
    audformat.define.DataType.FLOAT,
)
audformat.testing.add_table(
    db,
    table_id='age',
    index_type=audformat.define.IndexType.FILEWISE,
    columns='age',
    num_files=3,
)
audformat.testing.add_table(
    db,
    table_id='likability',
    index_type=audformat.define.IndexType.SEGMENTED,
    columns='likability',
    num_files=4,
)

Which results in the following two pandas.DataFrame:

display(
    db['age'].get(),
    db['likability'].get(),
)
age
file
audio/001.wav 46
audio/002.wav 37
audio/003.wav 38
likability
file start end
audio/001.wav 0 days 00:00:00.397213280 0 days 00:00:00.596936965 0.914114
0 days 00:00:02.327927923 0 days 00:00:02.622891145 0.726737
0 days 00:00:02.983916819 0 days 00:00:03.272334268 0.106575
0 days 00:00:03.374808379 0 days 00:00:03.568238552 0.377431
0 days 00:00:03.572420733 0 days 00:00:04.662056847 0.529905
audio/002.wav 0 days 00:00:00.072218798 0 days 00:00:00.839045920 0.030434
0 days 00:00:02.089791008 0 days 00:00:02.648382075 0.904820
0 days 00:00:02.960316741 0 days 00:00:03.702123706 0.800830
0 days 00:00:03.968990568 0 days 00:00:03.988358850 0.998612
0 days 00:00:04.121821040 0 days 00:00:04.526335545 0.654083
audio/003.wav 0 days 00:00:00.507197205 0 days 00:00:00.679092989 0.607141
0 days 00:00:00.954117076 0 days 00:00:01.011094852 0.294764
0 days 00:00:01.066596831 0 days 00:00:01.491808778 0.458248
0 days 00:00:02.772785231 0 days 00:00:02.832978753 0.858138
0 days 00:00:03.559892297 0 days 00:00:03.815023972 0.978649
audio/004.wav 0 days 00:00:00.038268510 0 days 00:00:00.083987694 0.325549
0 days 00:00:00.831216573 0 days 00:00:00.939269567 0.479070
0 days 00:00:01.167002008 0 days 00:00:02.176834738 0.230441
0 days 00:00:03.235194973 0 days 00:00:03.488282216 0.594809
0 days 00:00:03.894691370 0 days 00:00:04.485285859 0.378801

You can simply combine both tables with:

combined_table = db['likability'] + db['age']

Which results in the following pandas.DataFrame:

combined_table.get()
likability age
file start end
audio/001.wav 0 days 00:00:00 NaT NaN 46
0 days 00:00:00.397213280 0 days 00:00:00.596936965 0.914114 <NA>
0 days 00:00:02.327927923 0 days 00:00:02.622891145 0.726737 <NA>
0 days 00:00:02.983916819 0 days 00:00:03.272334268 0.106575 <NA>
0 days 00:00:03.374808379 0 days 00:00:03.568238552 0.377431 <NA>
0 days 00:00:03.572420733 0 days 00:00:04.662056847 0.529905 <NA>
audio/002.wav 0 days 00:00:00 NaT NaN 37
0 days 00:00:00.072218798 0 days 00:00:00.839045920 0.030434 <NA>
0 days 00:00:02.089791008 0 days 00:00:02.648382075 0.904820 <NA>
0 days 00:00:02.960316741 0 days 00:00:03.702123706 0.800830 <NA>
0 days 00:00:03.968990568 0 days 00:00:03.988358850 0.998612 <NA>
0 days 00:00:04.121821040 0 days 00:00:04.526335545 0.654083 <NA>
audio/003.wav 0 days 00:00:00 NaT NaN 38
0 days 00:00:00.507197205 0 days 00:00:00.679092989 0.607141 <NA>
0 days 00:00:00.954117076 0 days 00:00:01.011094852 0.294764 <NA>
0 days 00:00:01.066596831 0 days 00:00:01.491808778 0.458248 <NA>
0 days 00:00:02.772785231 0 days 00:00:02.832978753 0.858138 <NA>
0 days 00:00:03.559892297 0 days 00:00:03.815023972 0.978649 <NA>
audio/004.wav 0 days 00:00:00.038268510 0 days 00:00:00.083987694 0.325549 <NA>
0 days 00:00:00.831216573 0 days 00:00:00.939269567 0.479070 <NA>
0 days 00:00:01.167002008 0 days 00:00:02.176834738 0.230441 <NA>
0 days 00:00:03.235194973 0 days 00:00:03.488282216 0.594809 <NA>
0 days 00:00:03.894691370 0 days 00:00:04.485285859 0.378801 <NA>

Or, if you just want to have the likability information for all segments, for which age information is available:

df_likability = db['likability'].get(
    db['age'].files,
)

Which results in the following pandas.DataFrame:

df_likability
likability
file start end
audio/001.wav 0 days 00:00:00.397213280 0 days 00:00:00.596936965 0.914114
0 days 00:00:02.327927923 0 days 00:00:02.622891145 0.726737
0 days 00:00:02.983916819 0 days 00:00:03.272334268 0.106575
0 days 00:00:03.374808379 0 days 00:00:03.568238552 0.377431
0 days 00:00:03.572420733 0 days 00:00:04.662056847 0.529905
audio/002.wav 0 days 00:00:00.072218798 0 days 00:00:00.839045920 0.030434
0 days 00:00:02.089791008 0 days 00:00:02.648382075 0.904820
0 days 00:00:02.960316741 0 days 00:00:03.702123706 0.800830
0 days 00:00:03.968990568 0 days 00:00:03.988358850 0.998612
0 days 00:00:04.121821040 0 days 00:00:04.526335545 0.654083
audio/003.wav 0 days 00:00:00.507197205 0 days 00:00:00.679092989 0.607141
0 days 00:00:00.954117076 0 days 00:00:01.011094852 0.294764
0 days 00:00:01.066596831 0 days 00:00:01.491808778 0.458248
0 days 00:00:02.772785231 0 days 00:00:02.832978753 0.858138
0 days 00:00:03.559892297 0 days 00:00:03.815023972 0.978649

Or, if you want to have the age information for segments in the likeability table:

df_age = db['age'].get(df_likability.index)

Which results in the following pandas.DataFrame:

df_age
age
file start end
audio/001.wav 0 days 00:00:00.397213280 0 days 00:00:00.596936965 46
0 days 00:00:02.327927923 0 days 00:00:02.622891145 46
0 days 00:00:02.983916819 0 days 00:00:03.272334268 46
0 days 00:00:03.374808379 0 days 00:00:03.568238552 46
0 days 00:00:03.572420733 0 days 00:00:04.662056847 46
audio/002.wav 0 days 00:00:00.072218798 0 days 00:00:00.839045920 37
0 days 00:00:02.089791008 0 days 00:00:02.648382075 37
0 days 00:00:02.960316741 0 days 00:00:03.702123706 37
0 days 00:00:03.968990568 0 days 00:00:03.988358850 37
0 days 00:00:04.121821040 0 days 00:00:04.526335545 37
audio/003.wav 0 days 00:00:00.507197205 0 days 00:00:00.679092989 38
0 days 00:00:00.954117076 0 days 00:00:01.011094852 38
0 days 00:00:01.066596831 0 days 00:00:01.491808778 38
0 days 00:00:02.772785231 0 days 00:00:02.832978753 38
0 days 00:00:03.559892297 0 days 00:00:03.815023972 38

So far we have combined tables using the + operator. The result is a table that is no longer attached to a database. That means that meta information about the media or referenced schemes is discarded. If you want to keep this information, you can use audformat.Table.update(), which also works across databases, as we will demonstrate with the following example.

First we create a second database and add a gender scheme:

db2 = audformat.testing.create_db(minimal=True)
db2.schemes['gender'] = audformat.Scheme(
    labels=['female', 'male'],
)
db2.schemes
gender:
  dtype: str
  labels: [female, male]

Next, we add a table and fill in some gender information:

audformat.testing.add_table(
    db2,
    table_id='gender_and_age',
    index_type=audformat.define.IndexType.FILEWISE,
    columns='gender',
    num_files=[2, 3, 4],
).get()
gender
file
audio/002.wav female
audio/003.wav female
audio/004.wav male

Now, we update the table with age values from the other database.

db2['gender_and_age'].update(db['age']).get()
gender age
file
audio/001.wav NaN 46
audio/002.wav female 37
audio/003.wav female 38
audio/004.wav male <NA>

And also copies the according scheme to the database:

db2.schemes
gender:
  dtype: str
  labels: [female, male]
age:
  {dtype: int, minimum: 20, maximum: 50}