Create a database

The following examples show you how to create a audformat.Database and store it to the disk.

Use existing audio and CSV files

Assume you have three audio files in a sub-folder audio:

audio/utt01.wav
audio/utt02.wav
audio/utt03.wav

Let’s assume in addition you have three raters annotated the files for the emotion anger in the range 0 to 5.

import io


# Create dummy CSV data.
# In a real example this would be a file on your disk
CSV = io.StringIO("""
file,R1,R2,R3
audio/utt01.wav,1,0,1
audio/utt02.wav,1,3,2
audio/utt03.wav,,5,4
""")

From the CSV file you create a audformat.Database. The database will contain one audformat.Table with the ID anger, split ID train, and media ID microphone. The table contains three audformat.Column objects, one per rater. Each column has the scheme ID anger for the defined scheme covering the emotion anger.

import audformat


df = audformat.utils.read_csv(CSV)

# Create database
db = audformat.Database(
    name="foo",
    source="https://github.com/audeering/audformat/",
    usage=audformat.define.Usage.UNRESTRICTED,
)

# Add media, split and scheme
db.media["microphone"] = audformat.Media(
    audformat.define.MediaType.AUDIO,
    format="wav",
)
db.splits["train"] = audformat.Split(
    audformat.define.SplitType.TRAIN,
)
db.schemes["anger"] = audformat.Scheme(
    audformat.define.DataType.INTEGER,
    minimum=0,
    maximum=5,
)

# Create table and fill with data
db["anger"] = audformat.Table(
    index=df.index,
    media_id="microphone",
    split_id="train",
)
for rater in df.columns:
    db.raters[rater] = audformat.Rater()
    db["anger"][rater] = audformat.Column(scheme_id="anger", rater_id=rater)
    db["anger"][rater].set(df[rater])

The resulting audformat.Database will then contain:

db
name: foo
source: https://github.com/audeering/audformat/
usage: unrestricted
languages: []
media:
  microphone: {type: audio, format: wav}
raters:
  R1: {type: human}
  R2: {type: human}
  R3: {type: human}
schemes:
  anger: {dtype: int, minimum: 0, maximum: 5}
splits:
  train: {type: train}
tables:
  anger:
    type: filewise
    split_id: train
    media_id: microphone
    columns:
      R1: {scheme_id: anger, rater_id: R1}
      R2: {scheme_id: anger, rater_id: R2}
      R3: {scheme_id: anger, rater_id: R3}

For more information on how to define a database, have a look at the code examples in the database specification.

Create a test database

If you want to write unit tests using a audformat.Database, or you just want to play around with a database without creating one, you can use audformat.testing. It provides you with a command to create a database, containing all possible tables types:

import audformat.testing


db = audformat.testing.create_db()

Which results in the following audformat.Table objects:

db.tables
files:
  type: filewise
  split_id: train
  media_id: microphone
  columns:
    bool: {scheme_id: bool, rater_id: gold}
    date: {scheme_id: date, rater_id: gold}
    float: {scheme_id: float, rater_id: gold}
    int: {scheme_id: int, rater_id: gold}
    label: {scheme_id: label, rater_id: gold}
    label_map_int: {scheme_id: label_map_int, rater_id: gold}
    label_map_misc: {scheme_id: label_map_misc, rater_id: gold}
    label_map_str: {scheme_id: label_map_str, rater_id: gold}
    string: {scheme_id: string, rater_id: gold}
    time: {scheme_id: time, rater_id: gold}
    no_scheme: {}
segments:
  type: segmented
  split_id: dev
  media_id: microphone
  columns:
    bool: {scheme_id: bool, rater_id: gold}
    date: {scheme_id: date, rater_id: gold}
    float: {scheme_id: float, rater_id: gold}
    int: {scheme_id: int, rater_id: gold}
    label: {scheme_id: label, rater_id: gold}
    label_map_int: {scheme_id: label_map_int, rater_id: gold}
    label_map_misc: {scheme_id: label_map_misc, rater_id: gold}
    label_map_str: {scheme_id: label_map_str, rater_id: gold}
    string: {scheme_id: string, rater_id: gold}
    time: {scheme_id: time, rater_id: gold}
    no_scheme: {}

Or you can create a database, containing only the minimum entries, required by the database specification:

db_minimal = audformat.testing.create_db(minimal=True)

Which results in the following audformat.Database:

db_minimal
name: unittest
source: internal
usage: unrestricted
languages: [deu, eng]