Publish a database

To publish a database, we first need to create a database and store it in audformat. Afterwards, we publish the database to an audb.Repository. Finally, we add more files and release a new version.

Create a database

We can create an example database with the audformat.testing module.

import audformat.testing

build_dir = "./age-test-1.0.0"

db = audformat.testing.create_db(minimal=True)
db.name = "age-test"
db.license = "CC0-1.0"
db.schemes["age"] = audformat.Scheme("int", minimum=20, maximum=90)
audformat.testing.add_table(
    db,
    table_id="age",
    index_type="filewise",
    columns="age",
    num_files=3,
)
db.save(build_dir)
audformat.testing.create_audio_files(db)

This results in the following database, stored under build_dir.

db
name: age-test
source: internal
usage: unrestricted
languages: [deu, eng]
license: CC0-1.0
schemes:
  age: {dtype: int, minimum: 20, maximum: 90}
tables:
  age:
    type: filewise
    columns:
      age: {scheme_id: age}

It contains a few random annotations.

db["age"].get()
               age
file
audio/001.wav   21
audio/002.wav   74
audio/003.wav   68

Publish the first version

We define a repository on the local file system to publish the database to.

import audb

repository = audb.Repository(
    name="data-local",
    host="./data",
    backend="file-system",
)

Then we select the folder where the database is stored and pick a version under which to publish it.

deps = audb.publish(build_dir, "1.0.0", repository, verbose=False)

It returns an audb.Dependencies object that specifies which files are part of the database, in which archives they are stored, and what their audio metadata is.

deps()
                                             archive  bit_depth  channels  ...  sampling_rate  type  version
db.age.parquet                                                0         0  ...              0     0    1.0.0
audio/001.wav   436c65ec-1e42-f9de-2708-ecafe07e827e         16         1  ...          16000     1    1.0.0
audio/002.wav   fda7e4d6-f2b2-4cff-cab5-906ef5d57607         16         1  ...          16000     1    1.0.0
audio/003.wav   e26ef45d-bdc1-6153-bdc4-852d83806e4a         16         1  ...          16000     1    1.0.0

4 rows × 10 columns
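To make the output easier to read, here is a minimal sketch in plain Python (it does not use audb) that splits the rows shown above into tables and media files. It assumes, based only on this output, that type 0 marks a table and type 1 a media file; `rows` and `split_by_type` are illustrative names, not part of audb.

```python
# Rows copied from the output above: (file, archive, type, version).
# Assumption read off the output: type 0 marks a table, type 1 a media file.
rows = [
    ("db.age.parquet", "", 0, "1.0.0"),
    ("audio/001.wav", "436c65ec-1e42-f9de-2708-ecafe07e827e", 1, "1.0.0"),
    ("audio/002.wav", "fda7e4d6-f2b2-4cff-cab5-906ef5d57607", 1, "1.0.0"),
    ("audio/003.wav", "e26ef45d-bdc1-6153-bdc4-852d83806e4a", 1, "1.0.0"),
]


def split_by_type(rows):
    """Return (tables, media) as lists of file names."""
    tables = [file for file, _, kind, _ in rows if kind == 0]
    media = [file for file, _, kind, _ in rows if kind == 1]
    return tables, media


tables, media = split_by_type(rows)
print(tables)
print(media)
```

In this example each of the three media files is stored in its own archive, while the table has no archive entry.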

We can compare this with the files stored in the repository.

import os

def list_files(path):
    for root, dirs, files in os.walk(path):
        level = root.replace(path, "").count(os.sep)
        indent = " " * 2 * (level)
        print(f"{indent}{os.path.basename(root)}/")
        subindent = " " * 2 * (level + 1)
        for f in files:
            print(f"{subindent}{f}")

list_files(repository.host)
data/
  data-local/
    age-test/
      meta/
        1.0.0/
          age.parquet
      1.0.0/
        db.yaml
        db.parquet
      media/
        1.0.0/
          fda7e4d6-f2b2-4cff-cab5-906ef5d57607.zip
          e26ef45d-bdc1-6153-bdc4-852d83806e4a.zip
          436c65ec-1e42-f9de-2708-ecafe07e827e.zip

As you can see, all media files are stored inside the media/ folder, all tables inside the meta/ folder, the database header in the file db.yaml, and the database dependencies in the file db.parquet. Note that the structure of the folders used for versioning depends on the backend and differs slightly for an Artifactory backend.
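The layout in the listing can be summarised as a small path builder. This is only a sketch that mirrors the file-system listing above; `repository_paths` is a hypothetical helper, not an audb function, and the paths are relative to the repository folder (data/data-local in this example).

```python
import posixpath


def repository_paths(name, version, table_ids, archives):
    """Relative paths of one published version, following the listing above.

    Hypothetical helper: the layout is copied from the file-system
    listing and may differ for other backends.
    """
    paths = [
        posixpath.join(name, version, "db.yaml"),     # database header
        posixpath.join(name, version, "db.parquet"),  # database dependencies
    ]
    # One parquet file per table under meta/<version>/
    paths += [
        posixpath.join(name, "meta", version, f"{table_id}.parquet")
        for table_id in table_ids
    ]
    # One zip file per archive under media/<version>/
    paths += [
        posixpath.join(name, "media", version, f"{archive}.zip")
        for archive in archives
    ]
    return paths


paths = repository_paths(
    "age-test",
    "1.0.0",
    table_ids=["age"],
    archives=["436c65ec-1e42-f9de-2708-ecafe07e827e"],
)
for path in paths:
    print(path)
```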

To load the database, or see which databases are available in your repository, we need to tell audb that it should use our repository instead of its default ones.

audb.config.REPOSITORIES = [repository]
audb.available()
              backend    host  repository  version
name
age-test  file-system  ./data  data-local    1.0.0

Update a database

In the next step, we add another file with an age annotation to the database. First, we load the metadata of the previous version of the database to a new folder.

build_dir = "./age-test-1.1.0"
db = audb.load_to(
    build_dir,
    "age-test",
    version="1.0.0",
    only_metadata=True,
    verbose=False,
)

Then we extend the age table by another file (audio/004.wav) and assign it an age annotation of 22.

index = audformat.filewise_index(["audio/004.wav"])
db["age"].extend_index(index, inplace=True)
db["age"]["age"].set([22], index=index)

db["age"].get()
               age
file
audio/001.wav   21
audio/002.wav   74
audio/003.wav   68
audio/004.wav   22

We save the database to the build folder, which overwrites the old table, and add a new audio file.

db.save(build_dir)
audformat.testing.create_audio_files(db)

Publishing works as before, but this time we have to specify the version on which our update is based. audb.publish() then automatically figures out which files have changed and publishes only those.

deps = audb.publish(
    build_dir,
    "1.1.0",
    repository,
    previous_version="1.0.0",
    verbose=False,
)
deps()
                                             archive  bit_depth  channels  ...  sampling_rate  type  version
db.age.parquet                                                0         0  ...              0     0    1.1.0
audio/001.wav   436c65ec-1e42-f9de-2708-ecafe07e827e         16         1  ...          16000     1    1.0.0
audio/002.wav   fda7e4d6-f2b2-4cff-cab5-906ef5d57607         16         1  ...          16000     1    1.0.0
audio/003.wav   e26ef45d-bdc1-6153-bdc4-852d83806e4a         16         1  ...          16000     1    1.0.0
audio/004.wav   ef4d1e81-6488-95cf-a165-604d1e47d575         16         1  ...          16000     1    1.1.0

5 rows × 10 columns

It has uploaded only a new version of the table and the new media file. For the other media files, it depends on the previously published version. We can again inspect the repository.

list_files(repository.host)
data/
  data-local/
    age-test/
      meta/
        1.0.0/
          age.parquet
        1.1.0/
          age.parquet
      1.0.0/
        db.yaml
        db.parquet
      media/
        1.0.0/
          fda7e4d6-f2b2-4cff-cab5-906ef5d57607.zip
          e26ef45d-bdc1-6153-bdc4-852d83806e4a.zip
          436c65ec-1e42-f9de-2708-ecafe07e827e.zip
        1.1.0/
          ef4d1e81-6488-95cf-a165-604d1e47d575.zip
      1.1.0/
        db.yaml
        db.parquet
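The listing shows that only the new archive was added under media/1.1.0/. How audb.publish() detects changed files is not described on this page; one common approach is to compare file checksums against those of the previous version. The following plain-Python sketch illustrates that idea only. The `checksum` and `changed_files` helpers, the choice of SHA-256, and the placeholder file contents are all assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def checksum(path):
    """SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(build_dir, previous_checksums):
    """Files under build_dir that are new or differ from the previous version."""
    changed = []
    for root, _, files in os.walk(build_dir):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, build_dir).replace(os.sep, "/")
            # A missing entry (new file) or a different checksum counts as changed
            if previous_checksums.get(rel) != checksum(path):
                changed.append(rel)
    return sorted(changed)


with tempfile.TemporaryDirectory() as build_dir:
    os.makedirs(os.path.join(build_dir, "audio"))
    for name in ["audio/001.wav", "audio/002.wav"]:
        with open(os.path.join(build_dir, name), "wb") as fp:
            fp.write(name.encode())  # placeholder bytes, not real audio
    # Pretend only audio/001.wav was part of the previous version
    previous = {
        "audio/001.wav": checksum(os.path.join(build_dir, "audio/001.wav")),
    }
    result = changed_files(build_dir, previous)
    print(result)
```

Only the file without a matching checksum in the previous version is reported, which corresponds to the single new archive in the listing.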

We can also check which databases are available.

audb.available()
              backend    host  repository  version
name
age-test  file-system  ./data  data-local    1.0.0
age-test  file-system  ./data  data-local    1.1.0

Because each new version can build on a previously published one, you could automate the update step and let a database grow every day.
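An automated job would have to choose the new version and the version it is based on for every run. A minimal sketch, assuming "major.minor.patch" version strings like the ones used on this page; `next_version` is a hypothetical helper, not part of audb.

```python
def next_version(version, part="minor"):
    """Bump a 'major.minor.patch' version string (hypothetical helper)."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown part: {part!r}")


previous_version = "1.0.0"
version = next_version(previous_version)
print(previous_version, "->", version)
```

The two values could then be passed as the version and previous_version arguments of audb.publish(), as in the update step above.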

Real world example

We published a version of a small German acted emotional speech database called emodb in the default Artifactory repository of audb. You can find the example code at https://github.com/audeering/emodb, and you can continue at Load a database to see how to load and use a database.