publish()

audb.publish(db_root, version, repository, *, archives=None, previous_version='latest', cache_root=None, num_workers=1, verbose=True)

Publish database.

Publishes a database that conforms to audformat and is stored in the db_root folder.

A database can depend on media files and tables of an older version. For example, you might alter an existing table by adding labels for new media files and publish it as a new version. audb.publish() will then upload new and altered files and update the dependencies accordingly.

To update a database, first use audb.load_to() to load the version that the new version should depend on into db_root. Media files that are not altered can be omitted, so it is recommended to set only_metadata=True in audb.load_to(). Afterwards, make your changes in that folder and run audb.publish(). To remove media files from a database, make sure they are no longer referenced in the tables.
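The update workflow described above can be sketched roughly as follows. This is a minimal illustration, not a definitive recipe: publish_update and edit are hypothetical names, and the sketch assumes that the object returned by audb.load_to() is an audformat database that can be written back with save().

```python
def publish_update(db_root, name, old_version, new_version, repository, edit):
    """Load metadata of old_version, apply edit(db), publish as new_version.

    publish_update and edit are hypothetical names used for illustration.
    """
    import audb  # imported lazily, so the sketch can be read without audb

    # Fetch only header and tables; unaltered media files can be omitted.
    db = audb.load_to(db_root, name, version=old_version, only_metadata=True)

    # Caller-supplied changes, e.g. adding labels for new media files.
    edit(db)
    db.save(db_root)

    # Upload new and altered files and update the dependencies.
    return audb.publish(
        db_root,
        new_version,
        repository,
        previous_version=old_version,
    )
```

Passing previous_version explicitly makes the intended base version visible at the call site; the default 'latest' would work as well if old_version is the latest published version.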

Setting previous_version=None allows you to start from scratch and upload all files, even if older versions exist. In this case you do not call audb.load_to() before running audb.publish().

audb handles audio formats based on the file extension. This means the file extension must be lowercase and should match the audio format of the file, e.g. .wav.
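Before publishing, it can help to scan the media file list for extensions that are not lowercase. The following is a minimal sketch using only the standard library; files_with_uppercase_extension is a hypothetical helper and not part of audb.

```python
import os


def files_with_uppercase_extension(files):
    """Return all paths whose file extension is not entirely lowercase."""
    bad = []
    for file in files:
        _, ext = os.path.splitext(file)
        if ext != ext.lower():
            bad.append(file)
    return bad


print(files_with_uppercase_extension(["audio/a.wav", "audio/b.WAV", "audio/c.Flac"]))
# ['audio/b.WAV', 'audio/c.Flac']
```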

If you cancel audb.publish() during publication, you can restart it afterwards. It will continue from the current state, but you might need overwrite permissions in addition to write permissions on the backend.

audb uses md5 hashes of the database files to check if they have changed. Be aware that for certain file formats, like parquet, md5 hashes might differ for files with identical content. Reasons include the library that wrote the file, the compression codecs involved, or additional metadata written by the library. For files stored in parquet format, audb.publish() will first look for a hash stored in the file's metadata under the b"hash" key. For parquet tables, this deterministic hash is automatically added by audformat.
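The effect is not specific to parquet. As an analogy, the sketch below uses gzip from the standard library: the same content compressed with two different header timestamps yields different bytes and therefore different md5 hashes. It only illustrates why a byte-level hash can differ for identical content and does not reproduce how audb computes its hashes; md5_hex is a hypothetical helper.

```python
import gzip
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the md5 hex digest of a byte string."""
    return hashlib.md5(data).hexdigest()


content = b"file,label\na.wav,happy\n"

# Same content, but a different timestamp is stored in the gzip header.
first = gzip.compress(content, mtime=0)
second = gzip.compress(content, mtime=1)

same_content = gzip.decompress(first) == gzip.decompress(second)
same_hash = md5_hex(first) == md5_hex(second)
print(same_content, same_hash)  # True False
```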

Tables stored only as pickle files are converted to parquet files before publication. If a table is stored as both a parquet and a csv file, the csv file is ignored and the parquet file is published.

Parameters
  • db_root (str) – root directory of database

  • version (str) – version string

  • repository (Repository) – repository the database is published to

  • archives (Optional[Mapping[str, str]]) – dictionary mapping files to archive names. Can be used to bundle files into archives, which will speed up communication with the server if the database contains many small files. Archive name must not include an extension

  • previous_version (Optional[str]) – specifies the version this publication should be based on. If 'latest', the latest published version is used automatically, or None if no version has been published. If None, it assumes you start from scratch

  • cache_root (Optional[str]) – cache folder where databases are stored. If not set, audb.default_cache_root() is used. Only used to read the dependencies of the previous version

  • num_workers (Optional[int]) – number of parallel jobs or 1 for sequential processing. If None, it is set to the number of processors on the machine multiplied by 5

  • verbose (bool) – show debug messages
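As an illustration of the archives parameter, the following sketch builds a dictionary that bundles all files of one folder into one archive. archives_by_folder and the fallback name "root" are hypothetical, and the sketch assumes that the keys are the file paths as they appear in the database and that folder names contain no dots, since an archive name must not include an extension.

```python
import posixpath


def archives_by_folder(files):
    """Map each file to an archive named after its innermost folder."""
    archives = {}
    for file in files:
        folder = posixpath.dirname(file)
        # Files without a parent folder share the hypothetical archive "root".
        archives[file] = posixpath.basename(folder) or "root"
    return archives


archives = archives_by_folder(
    ["audio/spk1/a.wav", "audio/spk1/b.wav", "audio/spk2/c.wav"]
)
print(archives)
# {'audio/spk1/a.wav': 'spk1', 'audio/spk1/b.wav': 'spk1', 'audio/spk2/c.wav': 'spk2'}
```

The resulting dictionary could then be passed as archives=archives to audb.publish().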

Return type

Dependencies

Returns

dependency object

Raises
  • RuntimeError – if version already exists

  • RuntimeError – if database tables reference non-existing files

  • RuntimeError – if database attachment path does not exist, is a symlink, is empty, or contains an empty sub-folder

  • RuntimeError – if database in db_root depends on a version other than the one indicated by previous_version

  • RuntimeError – if database is not portable, see audformat.Database.is_portable()

  • RuntimeError – if non-standard formats like MP3 and MP4 are published, but sox and/or mediafile is not installed

  • RuntimeError – if the type of a database file changes, e.g. from media to attachment

  • RuntimeError – if a new media file has an uppercase letter in its file extension

  • RuntimeError – if database contains tables, misc tables, or attachments that are stored under an ID using a char not in '[A-Za-z0-9._-]'

  • ValueError – if version or previous_version cannot be parsed by audeer.StrictVersion

  • ValueError – if previous_version >= version
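Some of these errors can be anticipated before calling audb.publish(). As a minimal sketch, the check below flags table, misc table, or attachment IDs that contain a character outside '[A-Za-z0-9._-]'. invalid_ids is a hypothetical helper; whether audb performs the check in exactly this way is an assumption.

```python
import re

# Character class taken from the error description above.
ALLOWED_ID = re.compile(r"[A-Za-z0-9._-]+")


def invalid_ids(ids):
    """Return all IDs that contain a character outside the allowed set."""
    return [id_ for id_ in ids if not ALLOWED_ID.fullmatch(id_)]


print(invalid_ids(["emotion", "speaker.age", "files/dev", "my table"]))
# ['files/dev', 'my table']
```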