Combine tables

It can happen that labels in your database are stored in tables of different type as some labels belong to the whole file, others don’t. The following examples highlights this with the labels for age and likability:

import audformat.testing

db = audformat.testing.create_db(minimal=True)
db.schemes["age"] = audformat.Scheme(
    audformat.define.DataType.INTEGER,
    minimum=20,
    maximum=50,
)
db.schemes["likability"] = audformat.Scheme(
    audformat.define.DataType.FLOAT,
)
audformat.testing.add_table(
    db,
    table_id="age",
    index_type=audformat.define.IndexType.FILEWISE,
    columns="age",
    num_files=3,
)
audformat.testing.add_table(
    db,
    table_id="likability",
    index_type=audformat.define.IndexType.SEGMENTED,
    columns="likability",
    num_files=4,
)

Which results in the following two pandas.DataFrame:

>>> db["age"].get()
               age
file
audio/001.wav   40
audio/002.wav   33
audio/003.wav   38
>>> db["likability"].get()
                                                                   likability
file          start                     end
audio/001.wav 0 days 00:00:00.063022023 0 days 00:00:00.670806417    0.310583
              0 days 00:00:02.176912810 0 days 00:00:02.570855787    0.047821
              0 days 00:00:03.254253920 0 days 00:00:03.748281529    0.288177
              0 days 00:00:03.959315192 0 days 00:00:04.110064420    0.848677
              0 days 00:00:04.541308146 0 days 00:00:04.962982764    0.787870
audio/002.wav 0 days 00:00:00.003443562 0 days 00:00:00.037705840    0.122311
              0 days 00:00:00.796582868 0 days 00:00:01.075538128    0.646814
              0 days 00:00:01.110602019 0 days 00:00:01.439743769    0.837227
              0 days 00:00:02.194859538 0 days 00:00:03.675142916    0.430492
              0 days 00:00:04.187563347 0 days 00:00:04.738401164    0.944455
audio/003.wav 0 days 00:00:00.574500931 0 days 00:00:00.929483764    0.422895
              0 days 00:00:01.450303730 0 days 00:00:01.953211769    0.559592
              0 days 00:00:02.808842477 0 days 00:00:03.099035455    0.586085
              0 days 00:00:03.225926686 0 days 00:00:04.007331084    0.701001
              0 days 00:00:04.015322811 0 days 00:00:04.430922264    0.492587
audio/004.wav 0 days 00:00:00.276361409 0 days 00:00:00.625730112    0.717525
              0 days 00:00:01.335888842 0 days 00:00:01.920178620    0.705011
              0 days 00:00:02.320023510 0 days 00:00:02.419307606    0.607638
              0 days 00:00:02.798096347 0 days 00:00:02.892323558    0.918130
              0 days 00:00:03.617486299 0 days 00:00:03.943024074    0.414640

You can simply combine both tables with:

>>> combined_table = db["likability"] + db["age"]
>>> combined_table.get()
                                                                   likability   age
file          start                     end
audio/001.wav 0 days 00:00:00.063022023 0 days 00:00:00.670806417    0.310583  <NA>
              0 days 00:00:02.176912810 0 days 00:00:02.570855787    0.047821  <NA>
              0 days 00:00:03.254253920 0 days 00:00:03.748281529    0.288177  <NA>
              0 days 00:00:03.959315192 0 days 00:00:04.110064420    0.848677  <NA>
              0 days 00:00:04.541308146 0 days 00:00:04.962982764    0.787870  <NA>
audio/002.wav 0 days 00:00:00.003443562 0 days 00:00:00.037705840    0.122311  <NA>
              0 days 00:00:00.796582868 0 days 00:00:01.075538128    0.646814  <NA>
              0 days 00:00:01.110602019 0 days 00:00:01.439743769    0.837227  <NA>
              0 days 00:00:02.194859538 0 days 00:00:03.675142916    0.430492  <NA>
              0 days 00:00:04.187563347 0 days 00:00:04.738401164    0.944455  <NA>
audio/003.wav 0 days 00:00:00.574500931 0 days 00:00:00.929483764    0.422895  <NA>
              0 days 00:00:01.450303730 0 days 00:00:01.953211769    0.559592  <NA>
              0 days 00:00:02.808842477 0 days 00:00:03.099035455    0.586085  <NA>
              0 days 00:00:03.225926686 0 days 00:00:04.007331084    0.701001  <NA>
              0 days 00:00:04.015322811 0 days 00:00:04.430922264    0.492587  <NA>
audio/004.wav 0 days 00:00:00.276361409 0 days 00:00:00.625730112    0.717525  <NA>
              0 days 00:00:01.335888842 0 days 00:00:01.920178620    0.705011  <NA>
              0 days 00:00:02.320023510 0 days 00:00:02.419307606    0.607638  <NA>
              0 days 00:00:02.798096347 0 days 00:00:02.892323558    0.918130  <NA>
              0 days 00:00:03.617486299 0 days 00:00:03.943024074    0.414640  <NA>
audio/001.wav 0 days 00:00:00           NaT                               NaN    40
audio/002.wav 0 days 00:00:00           NaT                               NaN    33
audio/003.wav 0 days 00:00:00           NaT                               NaN    38

Or, if you just want to have the likability information for all segments, for which age information is available:

>>> df_likability = db["likability"].get(db["age"].files)
>>> df_likability
                                                                   likability
file          start                     end
audio/001.wav 0 days 00:00:00.063022023 0 days 00:00:00.670806417    0.310583
              0 days 00:00:02.176912810 0 days 00:00:02.570855787    0.047821
              0 days 00:00:03.254253920 0 days 00:00:03.748281529    0.288177
              0 days 00:00:03.959315192 0 days 00:00:04.110064420    0.848677
              0 days 00:00:04.541308146 0 days 00:00:04.962982764    0.787870
audio/002.wav 0 days 00:00:00.003443562 0 days 00:00:00.037705840    0.122311
              0 days 00:00:00.796582868 0 days 00:00:01.075538128    0.646814
              0 days 00:00:01.110602019 0 days 00:00:01.439743769    0.837227
              0 days 00:00:02.194859538 0 days 00:00:03.675142916    0.430492
              0 days 00:00:04.187563347 0 days 00:00:04.738401164    0.944455
audio/003.wav 0 days 00:00:00.574500931 0 days 00:00:00.929483764    0.422895
              0 days 00:00:01.450303730 0 days 00:00:01.953211769    0.559592
              0 days 00:00:02.808842477 0 days 00:00:03.099035455    0.586085
              0 days 00:00:03.225926686 0 days 00:00:04.007331084    0.701001
              0 days 00:00:04.015322811 0 days 00:00:04.430922264    0.492587

Or, if you want to have the age information for segments in the likeability table:

>>> db["age"].get(df_likability.index)
                                                                   age
file          start                     end
audio/001.wav 0 days 00:00:00.063022023 0 days 00:00:00.670806417   40
              0 days 00:00:02.176912810 0 days 00:00:02.570855787   40
              0 days 00:00:03.254253920 0 days 00:00:03.748281529   40
              0 days 00:00:03.959315192 0 days 00:00:04.110064420   40
              0 days 00:00:04.541308146 0 days 00:00:04.962982764   40
audio/002.wav 0 days 00:00:00.003443562 0 days 00:00:00.037705840   33
              0 days 00:00:00.796582868 0 days 00:00:01.075538128   33
              0 days 00:00:01.110602019 0 days 00:00:01.439743769   33
              0 days 00:00:02.194859538 0 days 00:00:03.675142916   33
              0 days 00:00:04.187563347 0 days 00:00:04.738401164   33
audio/003.wav 0 days 00:00:00.574500931 0 days 00:00:00.929483764   38
              0 days 00:00:01.450303730 0 days 00:00:01.953211769   38
              0 days 00:00:02.808842477 0 days 00:00:03.099035455   38
              0 days 00:00:03.225926686 0 days 00:00:04.007331084   38
              0 days 00:00:04.015322811 0 days 00:00:04.430922264   38

So far we have combined tables using the + operator. The result is a table that is no longer attached to a database. That means that meta information about the media or referenced schemes is discarded. If you want to keep this information, you can use audformat.Table.update(), which also works across databases, as we will demonstrate with the following example.

First we create a second database and add a gender scheme:

>>> db2 = audformat.testing.create_db(minimal=True)
>>> db2.schemes["gender"] = audformat.Scheme(labels=["female", "male"])
>>> db2.schemes
gender:
  dtype: str
  labels: [female, male]

Next, we add a table and fill in some gender information:

>>> audformat.testing.add_table(
...     db2,
...     table_id="gender_and_age",
...     index_type=audformat.define.IndexType.FILEWISE,
...     columns="gender",
...     num_files=[2, 3, 4],
... ).get()
              gender
file
audio/002.wav   male
audio/003.wav   male
audio/004.wav   male

Now, we update the table with age values from the other database.

>>> db2["gender_and_age"].update(db["age"]).get()
              gender   age
file
audio/002.wav   male    33
audio/003.wav   male    38
audio/004.wav   male  <NA>
audio/001.wav    NaN    40

And also copies the according scheme to the database:

>>> db2.schemes
age:
  {dtype: int, minimum: 20, maximum: 50}
gender:
  dtype: str
  labels: [female, male]