Dataset

class audbcards.Dataset(name, version, *, cache_root=None, load_tables=True)

Dataset representation.

Dataset object that represents a dataset that can be loaded with audb.load().

Parameters
  • name (str) – name of dataset

  • version (str) – version of dataset

  • cache_root (Optional[str]) – cache folder. If None, the environment variable AUDBCARDS_CACHE_ROOT or audbcards.config.CACHE_ROOT is used

  • load_tables (bool) – if True, values extracted from the tables are cached. Set this to False if loading the tables takes too long or the tables do not fit into memory
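
The fallback described for cache_root can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper resolve_cache_root and the default path are hypothetical, and the precedence (explicit argument, then environment variable, then configuration value) is an assumption based on the order in which the parameter description lists them.

```python
import os

# Hypothetical stand-in for audbcards.config.CACHE_ROOT
DEFAULT_CACHE_ROOT = "~/.cache/audbcards"


def resolve_cache_root(cache_root=None, config_default=DEFAULT_CACHE_ROOT):
    """Return the cache folder to use (illustrative helper only)."""
    # An explicitly passed folder always wins
    if cache_root is not None:
        return cache_root
    # Assumed precedence: environment variable before configuration value
    return os.environ.get("AUDBCARDS_CACHE_ROOT") or config_default
```

Under these assumptions, resolve_cache_root("/data/cache") returns the given folder unchanged, while resolve_cache_root() consults AUDBCARDS_CACHE_ROOT before falling back to the configured default.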

archives

Dataset.archives

Number of archives of media files in dataset.

author

Dataset.author

Authors of the database.

backend

Dataset.backend

Dataset backend object.

bit_depths

Dataset.bit_depths

Bit depths of media files in dataset.

cache_root

Dataset.cache_root

Cache root folder.

channels

Dataset.channels

Channels of media files in dataset.

deps

Dataset.deps

Dataset dependency table.

description

Dataset.description

Description of the database.

duration

Dataset.duration

Total duration of media files in dataset.

example_media

Dataset.example_media

Example media file.

The media file with the median duration is selected from all files in the dataset with a duration between 0.5 s and 300 s. In addition, the media file needs to be stored in an archive with fewer than 100 media files. If no media file meets these criteria, None is returned instead.
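
The selection rule can be sketched in plain Python. This is only an illustration under stated assumptions: the function select_example and its inputs are hypothetical, the duration bounds are treated as inclusive, the archive filter is applied before the median is computed, and the file closest to the median is returned. The library may order or break ties differently.

```python
import statistics
from collections import Counter


def select_example(durations, archives, min_dur=0.5, max_dur=300.0,
                   max_archive_files=100):
    """Pick an example file (illustrative helper only).

    durations: dict mapping file name -> duration in seconds
    archives: dict mapping file name -> archive identifier
    """
    # Number of media files stored in each archive
    archive_sizes = Counter(archives.values())
    # Keep files within the duration range that sit in small archives
    candidates = {
        file: duration
        for file, duration in durations.items()
        if min_dur <= duration <= max_dur
        and archive_sizes[archives[file]] < max_archive_files
    }
    if not candidates:
        return None
    median = statistics.median(candidates.values())
    # File closest to the median; ties are broken by file name
    return min(candidates, key=lambda f: (abs(candidates[f] - median), f))
```

With durations of 0.2 s, 1 s, 2 s, 4 s, and 400 s, each file in its own archive, the first and last files fall outside the range and the 2 s file is chosen.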

file_durations

Dataset.file_durations

File durations in dataset in seconds.

files

Dataset.files

Number of media files in dataset.

formats

Dataset.formats

File formats of media files in dataset.

iso_languages

Dataset.iso_languages

Languages of the database as ISO 639-3 codes where possible.

languages

Dataset.languages

Languages of the database.

license

Dataset.license

License of dataset.

If no license is given, 'Unknown' is returned.

name

Dataset.name

Name of dataset.

publication_date

Dataset.publication_date

Date dataset was uploaded to repository.

publication_owner

Dataset.publication_owner

User who uploaded dataset to repository.

repository

Dataset.repository

Repository containing the dataset.

repository_object

Dataset.repository_object

Repository object containing dataset.

sampling_rates

Dataset.sampling_rates

Sampling rates of media files in dataset.

schemes

Dataset.schemes

Schemes of dataset.

schemes_summary

Dataset.schemes_summary

Summary of dataset schemes.

It lists all schemes in a string, showing additional information on schemes named 'emotion' and 'speaker', e.g. 'speaker: [age, gender, language]'.
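
The summary format can be illustrated with a small sketch. The helper summarize_schemes and its input layout are hypothetical, and the comma separator between schemes is an assumption; only the 'speaker: [age, gender, language]' pattern is taken from the description above.

```python
def summarize_schemes(schemes):
    """Build a summary string (illustrative helper only).

    schemes: dict mapping scheme name -> list of extra details or None
    """
    parts = []
    for name, details in schemes.items():
        if name in ("emotion", "speaker") and details:
            # Only these two schemes show additional information
            parts.append(f"{name}: [{', '.join(details)}]")
        else:
            parts.append(name)
    # Assumption: schemes are joined with a comma and a space
    return ", ".join(parts)
```

For example, a dataset with an 'age' scheme and a 'speaker' scheme carrying age, gender, and language would be summarized as 'age, speaker: [age, gender, language]'.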

schemes_table

Dataset.schemes_table

Schemes table with name, type, min, max, labels, mappings.

The table is represented as a dictionary with column names as keys.

segment_durations

Dataset.segment_durations

Segment durations in dataset.

segments

Dataset.segments

Number of segments in dataset.

short_description

Dataset.short_description

Description of dataset shortened to 150 characters.
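
A shortening step of this kind might look like the sketch below. The helper shorten is hypothetical, and both the trailing marker and the choice to count it towards the 150-character limit are assumptions, not documented behavior.

```python
def shorten(text, max_len=150, marker="..."):
    """Limit text to max_len characters (illustrative helper only)."""
    if len(text) <= max_len:
        return text
    # Assumption: a marker signals the cut and counts towards the limit
    return text[: max_len - len(marker)].rstrip() + marker
```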

source

Dataset.source

Source of the database.

tables

Dataset.tables

Tables of the dataset.

tables_columns

Dataset.tables_columns

Number of columns for each table of the dataset.

Returns

dictionary with table IDs as keys and number of columns as values

Examples

>>> from audbcards import Dataset
>>> ds = Dataset("emodb", "1.4.1")
>>> ds.tables_columns["speaker"]
3

tables_preview

Dataset.tables_preview

Table preview for each table of the dataset.

Shows the header and the first 5 rows of each table as a list of lists. All table values are converted to strings, stripped of HTML tags and newlines, and limited to a maximum length of 100 characters.

Returns

dictionary with table IDs as keys and table previews as values

Examples

>>> from audbcards import Dataset
>>> from tabulate import tabulate
>>> ds = Dataset("emodb", "1.4.1")
>>> preview = ds.tables_preview["speaker"]
>>> print(tabulate(preview, headers="firstrow", tablefmt="github"))
|   speaker |   age | gender   | language   |
|-----------|-------|----------|------------|
|         3 |    31 | male     | deu        |
|         8 |    34 | female   | deu        |
|         9 |    21 | female   | deu        |
|        10 |    32 | male     | deu        |
|        11 |    26 | male     | deu        |
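
The cleaning applied to each preview value can be sketched as follows. The helpers clean_cell and preview are hypothetical and use only the standard library. Replacing newlines with a space and stripping tags with a simple regular expression are assumptions about how the cleaning could be done, not the library's implementation.

```python
import re


def clean_cell(value, max_len=100):
    """Convert a table value to a short plain string (illustrative only)."""
    text = str(value)
    # Drop anything that looks like an HTML tag
    text = re.sub(r"<[^>]+>", "", text)
    # Assumption: newlines are replaced by a single space
    text = re.sub(r"[\r\n]+", " ", text)
    return text[:max_len]


def preview(header, rows, n=5):
    """Return the header plus the first n cleaned rows as a list of lists."""
    return [list(header)] + [[clean_cell(v) for v in row] for row in rows[:n]]
```

A list of lists built this way has the header as its first entry, which is the layout tabulate expects when called with headers="firstrow".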

tables_rows

Dataset.tables_rows

Number of rows for each table of the dataset.

Returns

dictionary with table IDs as keys and number of rows as values

Examples

>>> from audbcards import Dataset
>>> ds = Dataset("emodb", "1.4.1")
>>> ds.tables_rows["speaker"]
10

tables_table

Dataset.tables_table

Tables of the dataset.

usage

Dataset.usage

Usage of the database.

version

Dataset.version

Version of dataset.