Publish a database¶
To publish a database, we first need to create and store it in audformat. Afterwards, we publish the database to an audb.Repository. Finally, we add more files and release a new version.
Create a database¶
We can create an example database with the audformat.testing module.
import audformat.testing
build_dir = "./age-test-1.0.0"
db = audformat.testing.create_db(minimal=True)
db.name = "age-test"
db.license = "CC0-1.0"
db.schemes["age"] = audformat.Scheme("int", minimum=20, maximum=90)
audformat.testing.add_table(
    db,
    table_id="age",
    index_type="filewise",
    columns="age",
    num_files=3,
)
db.save(build_dir)
audformat.testing.create_audio_files(db)
This results in the following database, stored under build_dir.
db
name: age-test
source: internal
usage: unrestricted
languages: [deu, eng]
license: CC0-1.0
schemes:
  age: {dtype: int, minimum: 20, maximum: 90}
tables:
  age:
    type: filewise
    columns:
      age: {scheme_id: age}
It contains a few random annotations.
db["age"].get()
| file | age |
|---|---|
| audio/001.wav | 42 |
| audio/002.wav | 76 |
| audio/003.wav | 21 |
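As a small illustration of what the scheme limits mean, the following sketch checks the annotations from the table above against the range 20 to 90 defined for the age scheme. It uses only plain Python, and the helper name `files_out_of_range` is hypothetical; it is not part of audformat and says nothing about how audformat itself handles scheme limits.

```python
def files_out_of_range(annotations, minimum, maximum):
    """Return the files whose value lies outside [minimum, maximum]."""
    return [
        file
        for file, value in annotations.items()
        if not (minimum <= value <= maximum)
    ]


# Values copied from the table above
annotations = {
    "audio/001.wav": 42,
    "audio/002.wav": 76,
    "audio/003.wav": 21,
}

# Limits copied from the "age" scheme definition
out_of_range = files_out_of_range(annotations, minimum=20, maximum=90)
print(out_of_range)  # [] as all three values lie between 20 and 90
```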
Publish the first version¶
We define a repository on the local file system to publish the database to.
import audb
repository = audb.Repository(
    name="data-local",
    host="./data",
    backend="file-system",
)
Then we select the folder where the database is stored and pick a version under which to publish it.
deps = audb.publish(build_dir, "1.0.0", repository, verbose=False)
It returns an audb.Dependencies object that specifies which files are part of the database, in which archives they are stored, and metadata about the audio files.
deps()
| | archive | bit_depth | channels | ... | sampling_rate | type | version |
|---|---|---|---|---|---|---|---|
| db.age.parquet | | 0 | 0 | ... | 0 | 0 | 1.0.0 |
| audio/001.wav | 436c65ec-1e42-f9de-2708-ecafe07e827e | 16 | 1 | ... | 16000 | 1 | 1.0.0 |
| audio/002.wav | fda7e4d6-f2b2-4cff-cab5-906ef5d57607 | 16 | 1 | ... | 16000 | 1 | 1.0.0 |
| audio/003.wav | e26ef45d-bdc1-6153-bdc4-852d83806e4a | 16 | 1 | ... | 16000 | 1 | 1.0.0 |
4 rows × 10 columns
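To make the relation between files and archives more concrete, the sketch below copies a few columns of the table above into plain Python dictionaries and maps every media file to its archive. The key `"file"` for the index and the helper `media_archives` are assumptions made for this illustration. The code does not use the audb.Dependencies API and relies only on the observation that the table file has type 0 and the audio files have type 1 in the output above.

```python
# Rows copied from the table above; only a few columns are kept.
# The key "file" for the index column is an assumption of this sketch.
rows = [
    {"file": "db.age.parquet", "archive": "", "type": 0, "version": "1.0.0"},
    {
        "file": "audio/001.wav",
        "archive": "436c65ec-1e42-f9de-2708-ecafe07e827e",
        "type": 1,
        "version": "1.0.0",
    },
    {
        "file": "audio/002.wav",
        "archive": "fda7e4d6-f2b2-4cff-cab5-906ef5d57607",
        "type": 1,
        "version": "1.0.0",
    },
    {
        "file": "audio/003.wav",
        "archive": "e26ef45d-bdc1-6153-bdc4-852d83806e4a",
        "type": 1,
        "version": "1.0.0",
    },
]


def media_archives(rows):
    """Map each media file (type 1 in the table above) to its archive name."""
    return {row["file"]: row["archive"] for row in rows if row["type"] == 1}


archives = media_archives(rows)
for file, archive in archives.items():
    print(f"{file} -> {archive}.zip")
```

The archive names printed here match the ZIP files that appear in the media/ folder of the repository.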
We can compare this with the files stored in the repository.
import os
def list_files(path):
    # Print the folder tree below path, indenting by two spaces per level
    for root, dirs, files in os.walk(path):
        level = root.replace(path, "").count(os.sep)
        indent = " " * 2 * (level)
        print(f"{indent}{os.path.basename(root)}/")
        subindent = " " * 2 * (level + 1)
        for f in files:
            print(f"{subindent}{f}")
list_files(repository.host)
data/
  data-local/
    age-test/
      meta/
        1.0.0/
          age.parquet
      1.0.0/
        db.yaml
        db.parquet
      media/
        1.0.0/
          fda7e4d6-f2b2-4cff-cab5-906ef5d57607.zip
          e26ef45d-bdc1-6153-bdc4-852d83806e4a.zip
          436c65ec-1e42-f9de-2708-ecafe07e827e.zip
As you can see, all media files are stored inside the media/ folder, all tables inside the meta/ folder, the database header in the file db.yaml, and the database dependencies in the file db.parquet.
Note that the folder structure used for versioning depends on the backend and differs slightly for an Artifactory backend.
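The layout shown above for the file-system backend can be summarized in a few lines of Python. The helper `repository_paths` is hypothetical and only reproduces the folder structure visible in the listing; it is not an audb function, and other backends may arrange files differently.

```python
import posixpath


def repository_paths(root, repository, database, version, tables, archives):
    """Build the paths seen in the file-system listing for one version.

    Hypothetical helper for illustration only, not part of audb.
    """
    base = posixpath.join(root, repository, database)
    # Header and dependency file live directly under the version folder
    paths = [
        posixpath.join(base, version, "db.yaml"),
        posixpath.join(base, version, "db.parquet"),
    ]
    # Tables are stored under meta/<version>/
    paths += [
        posixpath.join(base, "meta", version, f"{table}.parquet")
        for table in tables
    ]
    # Media archives are stored under media/<version>/
    paths += [
        posixpath.join(base, "media", version, f"{archive}.zip")
        for archive in archives
    ]
    return paths


paths = repository_paths(
    "data",
    "data-local",
    "age-test",
    "1.0.0",
    tables=["age"],
    archives=["436c65ec-1e42-f9de-2708-ecafe07e827e"],
)
for path in paths:
    print(path)
```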
To load the database, or see which databases are available in your repository, we need to tell audb that it should use our repository instead of its default ones.
audb.config.REPOSITORIES = [repository]
audb.available()
| name | backend | host | repository | version |
|---|---|---|---|---|
| age-test | file-system | ./data | data-local | 1.0.0 |
Update a database¶
Next, we add another file with an age annotation to the database. As a first step, we load the metadata of the previous version of the database into a new folder.
build_dir = "./age-test-1.1.0"
db = audb.load_to(
    build_dir,
    "age-test",
    version="1.0.0",
    only_metadata=True,
    verbose=False,
)
Then we extend the age table by another file (audio/004.wav) and assign it the age annotation 22.
index = audformat.filewise_index(["audio/004.wav"])
db["age"].extend_index(index, inplace=True)
db["age"]["age"].set([22], index=index)
db["age"].get()
| file | age |
|---|---|
| audio/001.wav | 42 |
| audio/002.wav | 76 |
| audio/003.wav | 21 |
| audio/004.wav | 22 |
We save the database to the build folder, which overwrites the old table, and create the new audio file.
db.save(build_dir)
audformat.testing.create_audio_files(db)
Publishing works as before, but this time we have to specify the version our update is based on. audb.publish() will then automatically figure out which files have changed and will only publish those.
deps = audb.publish(
    build_dir,
    "1.1.0",
    repository,
    previous_version="1.0.0",
    verbose=False,
)
deps()
| | archive | bit_depth | channels | ... | sampling_rate | type | version |
|---|---|---|---|---|---|---|---|
| db.age.parquet | | 0 | 0 | ... | 0 | 0 | 1.1.0 |
| audio/001.wav | 436c65ec-1e42-f9de-2708-ecafe07e827e | 16 | 1 | ... | 16000 | 1 | 1.0.0 |
| audio/002.wav | fda7e4d6-f2b2-4cff-cab5-906ef5d57607 | 16 | 1 | ... | 16000 | 1 | 1.0.0 |
| audio/003.wav | e26ef45d-bdc1-6153-bdc4-852d83806e4a | 16 | 1 | ... | 16000 | 1 | 1.0.0 |
| audio/004.wav | ef4d1e81-6488-95cf-a165-604d1e47d575 | 16 | 1 | ... | 16000 | 1 | 1.1.0 |
5 rows × 10 columns
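The general idea of publishing only what has changed can be sketched with checksums. The code below is a simplified illustration under the assumption that each file is identified by a SHA-256 digest of its content; the helpers `checksum` and `changed_files` are hypothetical, and the change detection inside audb.publish() may work differently.

```python
import hashlib


def checksum(content: bytes) -> str:
    """Return a hex digest identifying the given content."""
    return hashlib.sha256(content).hexdigest()


def changed_files(previous, current):
    """Return files that are new or whose checksum differs from before."""
    return sorted(
        file
        for file, digest in current.items()
        if previous.get(file) != digest
    )


# Stand-in contents for the files of the previous version
previous = {
    "db.age.parquet": checksum(b"table with three rows"),
    "audio/001.wav": checksum(b"audio 1"),
    "audio/002.wav": checksum(b"audio 2"),
    "audio/003.wav": checksum(b"audio 3"),
}

# The new version changes the table and adds one audio file
current = dict(previous)
current["db.age.parquet"] = checksum(b"table with four rows")
current["audio/004.wav"] = checksum(b"audio 4")

to_publish = changed_files(previous, current)
print(to_publish)  # ['audio/004.wav', 'db.age.parquet']
```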
This time, audb.publish() uploaded only the new version of the table and the new media file. For the other media files, the database depends on the previously published version. We can again inspect the repository.
list_files(repository.host)
data/
  data-local/
    age-test/
      meta/
        1.0.0/
          age.parquet
        1.1.0/
          age.parquet
      1.0.0/
        db.yaml
        db.parquet
      media/
        1.0.0/
          fda7e4d6-f2b2-4cff-cab5-906ef5d57607.zip
          e26ef45d-bdc1-6153-bdc4-852d83806e4a.zip
          436c65ec-1e42-f9de-2708-ecafe07e827e.zip
        1.1.0/
          ef4d1e81-6488-95cf-a165-604d1e47d575.zip
      1.1.0/
        db.yaml
        db.parquet
And check which databases are available.
audb.available()
| name | backend | host | repository | version |
|---|---|---|---|---|
| age-test | file-system | ./data | data-local | 1.0.0 |
| age-test | file-system | ./data | data-local | 1.1.0 |
Since each new version can build on a previously published one, you could automate the update step and let a database grow every day.
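For such an automated update, one building block is choosing the version numbers. The sketch below assumes purely numeric versions of the form MAJOR.MINOR.PATCH, as used in this example; the helpers `latest` and `bump_minor` are hypothetical and not part of audb.

```python
def parse(version):
    """Split a version like "1.1.0" into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def latest(versions):
    """Return the highest version, comparing numerically, not as strings."""
    return max(versions, key=parse)


def bump_minor(version):
    """Increase the minor part and reset the patch part."""
    major, minor, _patch = parse(version)
    return f"{major}.{minor + 1}.0"


# Versions as listed for age-test in the table above
published = ["1.0.0", "1.1.0"]

previous_version = latest(published)
next_version = bump_minor(previous_version)
print(previous_version, next_version)  # 1.1.0 1.2.0
```

The two resulting strings could then serve as the previous and the new version in the next publishing step.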
Real world example¶
We published a version of a small German acted emotional speech database called emodb in the default Artifactory repository of audb. You can find the example code at https://github.com/audeering/emodb and can continue at Load a database to see how to load and use a database.