Publish a database

To publish a database, we first create it and store it in audformat. Afterwards, we publish it to an audb.Repository. Finally, we add more files and release a new version.

Create a database

We can create an example database with the audformat.testing module.

import random

import audeer
import audformat.testing

random.seed(1)
build_dir = audeer.mkdir("./age-test-1.0.0")

db = audformat.testing.create_db(minimal=True)
db.name = "age-test"
db.license = "CC0-1.0"
db.schemes["age"] = audformat.Scheme("int", minimum=20, maximum=90)
audformat.testing.add_table(
    db,
    table_id="age",
    index_type="filewise",
    columns="age",
    num_files=3,
)
db.save(build_dir)
audformat.testing.create_audio_files(db)

This results in the following database, stored under build_dir.

>>> db
name: age-test
source: internal
usage: unrestricted
languages: [deu, eng]
license: CC0-1.0
schemes:
  age: {dtype: int, minimum: 20, maximum: 90}
tables:
  age:
    type: filewise
    columns:
      age: {scheme_id: age}

It contains a few random annotations.

>>> db["age"].get()
               age
file
audio/001.wav   37
audio/002.wav   28
audio/003.wav   52

Publish the first version

We define a repository on the local file system to publish the database to.

import audb

audeer.mkdir("./data", "data-local")
repository = audb.Repository(
    name="data-local",
    host="./data",
    backend="file-system",
)

Then we select the folder where the database is stored and pick a version under which to publish it.

deps = audb.publish(build_dir, "1.0.0", repository, verbose=False)

It returns an audb.Dependencies object that specifies which files are part of the database, in which archives they are stored, and metadata about the audio files.

>>> deps()
                                             archive  bit_depth  ...  type version
db.age.parquet                                                0  ...     0   1.0.0
audio/001.wav   436c65ec-1e42-f9de-2708-ecafe07e827e         16  ...     1   1.0.0
audio/002.wav   fda7e4d6-f2b2-4cff-cab5-906ef5d57607         16  ...     1   1.0.0
audio/003.wav   e26ef45d-bdc1-6153-bdc4-852d83806e4a         16  ...     1   1.0.0

[4 rows x 10 columns]

We can compare this with the files stored in the repository.

import os

def list_files(path):
    for root, _, files in sorted(os.walk(path)):
        level = root.replace(path, "").count(os.sep)
        indent = " " * 2 * (level)
        print(f"{indent}{os.path.basename(root)}/")
        subindent = " " * 2 * (level + 1)
        for f in sorted(files):
            print(f"{subindent}{f}")

>>> list_files(repository.host)
data/
  data-local/
    age-test/
      1.0.0/
        db.parquet
        db.yaml
      media/
        1.0.0/
          436c65ec-1e42-f9de-2708-ecafe07e827e.zip
          e26ef45d-bdc1-6153-bdc4-852d83806e4a.zip
          fda7e4d6-f2b2-4cff-cab5-906ef5d57607.zip
      meta/
        1.0.0/
          age.parquet

As you can see, all media files are stored inside the media/ folder, all tables inside the meta/ folder, the database header in the file db.yaml, and the database dependencies in the file db.parquet. Note that the structure of the folders used for versioning depends on the backend and differs slightly for an Artifactory backend.
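To make the layout concrete, the following minimal sketch maps one row of the dependency table to the path of the corresponding file inside the repository. It only mirrors the file-system listing above. `repository_path()` and the `TYPE_META` / `TYPE_MEDIA` constants are hypothetical names chosen for illustration and are not part of audb, and the meaning of the `type` values is an assumption read off the table output.

```python
import posixpath

# Values of the "type" column as they appear in the dependency table above
# (assumption: 0 marks a table, 1 marks a media file).
TYPE_META = 0
TYPE_MEDIA = 1


def repository_path(db_name, file, archive, file_type, version):
    """Return the path of a dependency inside a file-system repository.

    Hypothetical helper that mirrors the folder listing shown above;
    it is not part of audb.
    """
    if file_type == TYPE_MEDIA:
        # Media files are zipped and stored under their archive name
        return posixpath.join(db_name, "media", version, f"{archive}.zip")
    if file_type == TYPE_META:
        # "db.age.parquet" is stored as "age.parquet" in the meta/ folder
        table_file = file[len("db."):] if file.startswith("db.") else file
        return posixpath.join(db_name, "meta", version, table_file)
    raise ValueError(f"Unknown dependency type: {file_type}")


media_path = repository_path(
    "age-test",
    "audio/001.wav",
    "436c65ec-1e42-f9de-2708-ecafe07e827e",
    TYPE_MEDIA,
    "1.0.0",
)
table_path = repository_path("age-test", "db.age.parquet", "", TYPE_META, "1.0.0")
print(media_path)
print(table_path)
```

The two printed paths correspond to entries under data/data-local/ in the listing. For another backend the folder structure would have to be adjusted.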

To load the database, or see which databases are available in your repository, we need to tell audb that it should use our repository instead of its default ones.

>>> audb.config.REPOSITORIES = [repository]
>>> audb.available()
              backend    host  repository version
name
age-test  file-system  ./data  data-local   1.0.0

Update a database

Next, we add another file with an age annotation to the database. To do so, we first load the metadata of the previous version of the database to a new folder.

build_dir = audeer.mkdir("./age-test-1.1.0")
db = audb.load_to(
    build_dir,
    "age-test",
    version="1.0.0",
    only_metadata=True,
    verbose=False,
)

Then we extend the age table with another file (audio/004.wav) and assign it an age of 22.

index = audformat.filewise_index(["audio/004.wav"])
db["age"].extend_index(index, inplace=True)
db["age"]["age"].set([22], index=index)

>>> db["age"].get()
               age
file
audio/001.wav   37
audio/002.wav   28
audio/003.wav   52
audio/004.wav   22

We save the database to the build folder, which overwrites the old table, and create the new audio file.

db.save(build_dir)
audformat.testing.create_audio_files(db)

Publishing works as before, but this time we have to specify the version our update is based on. audb.publish() will then automatically figure out which files have changed and publish only those.

deps = audb.publish(
    build_dir,
    "1.1.0",
    repository,
    previous_version="1.0.0",
    verbose=False,
)

>>> deps()
                                             archive  bit_depth  ...  type version
db.age.parquet                                                0  ...     0   1.1.0
audio/001.wav   436c65ec-1e42-f9de-2708-ecafe07e827e         16  ...     1   1.0.0
audio/002.wav   fda7e4d6-f2b2-4cff-cab5-906ef5d57607         16  ...     1   1.0.0
audio/003.wav   e26ef45d-bdc1-6153-bdc4-852d83806e4a         16  ...     1   1.0.0
audio/004.wav   ef4d1e81-6488-95cf-a165-604d1e47d575         16  ...     1   1.1.0

[5 rows x 10 columns]

audb.publish() has uploaded only a new version of the table and the new media file. For the other media files, the new version depends on the previously published version. We can again inspect the repository.

>>> list_files(repository.host)
data/
  data-local/
    age-test/
      1.0.0/
        db.parquet
        db.yaml
      1.1.0/
        db.parquet
        db.yaml
      media/
        1.0.0/
          436c65ec-1e42-f9de-2708-ecafe07e827e.zip
          e26ef45d-bdc1-6153-bdc4-852d83806e4a.zip
          fda7e4d6-f2b2-4cff-cab5-906ef5d57607.zip
        1.1.0/
          ef4d1e81-6488-95cf-a165-604d1e47d575.zip
      meta/
        1.0.0/
          age.parquet
        1.1.0/
          age.parquet

We can also check which databases are available.

>>> audb.available()
              backend    host  repository version
name
age-test  file-system  ./data  data-local   1.0.0
age-test  file-system  ./data  data-local   1.1.0
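The incremental behaviour of audb.publish() described above can be pictured as a checksum comparison between the build folder and the previous version. The following is a minimal sketch of that idea using only the standard library, not audb's actual implementation. `checksum()` and `changed_files()` are hypothetical helpers, the use of SHA-256 is an arbitrary choice, and the `previous` dictionary stands in for the dependency table of the previous version, assuming it records one checksum per file.

```python
import hashlib
import os
import tempfile


def checksum(path):
    """Return a checksum of a file (SHA-256 chosen for illustration)."""
    with open(path, "rb") as fp:
        return hashlib.sha256(fp.read()).hexdigest()


def changed_files(build_dir, previous_checksums):
    """List files in build_dir that are new or differ from the previous version.

    previous_checksums maps relative file paths to checksums
    and stands in for the dependency table of the previous version.
    """
    changed = []
    for root, _, files in os.walk(build_dir):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, build_dir).replace(os.sep, "/")
            if previous_checksums.get(rel) != checksum(path):
                changed.append(rel)
    return sorted(changed)


with tempfile.TemporaryDirectory() as build_dir:
    os.mkdir(os.path.join(build_dir, "audio"))
    for name, content in [("audio/001.wav", b"one"), ("audio/002.wav", b"two")]:
        with open(os.path.join(build_dir, name), "wb") as fp:
            fp.write(content)
    # Pretend the previous version contained exactly these two files
    previous = {
        name: checksum(os.path.join(build_dir, name))
        for name in ["audio/001.wav", "audio/002.wav"]
    }
    # Modify one file and add a new one
    with open(os.path.join(build_dir, "audio/002.wav"), "wb") as fp:
        fp.write(b"two, edited")
    with open(os.path.join(build_dir, "audio/004.wav"), "wb") as fp:
        fp.write(b"four")
    result = changed_files(build_dir, previous)

print(result)
```

Only the modified and the newly added file are reported, while the untouched file would keep pointing to the version in which it was first published.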

Since a database can even be updated from another database, you could automate the update step and let a database grow every day.
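An automated update needs a new version string for every release. The following sketch shows one possible way to derive it, assuming plain MAJOR.MINOR.PATCH version strings as used in this tutorial. `next_version()` is a hypothetical helper and not part of audb.

```python
def next_version(version, part="minor"):
    """Return the version that follows the given MAJOR.MINOR.PATCH string.

    Hypothetical helper for illustration; it is not part of audb.
    """
    major, minor, patch = (int(number) for number in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown version part: {part}")


previous_version = "1.0.0"
version = next_version(previous_version)
print(version)
```

The resulting string could then be passed as the version to audb.publish(), with the old one given as previous_version, as in the update step above.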

Real world example

We published a version of a small German acted emotional speech database called emodb in the default Artifactory repository of audb. You can find the example code at https://github.com/audeering/emodb, and continue with Load a database to see how to load and use a database.