Dataset¶

class audbcards.Dataset(name, version, *, cache_root=None, load_tables=True)¶

Dataset representation.

Dataset object that represents a dataset that can be loaded with audb.load().

Parameters

name (str) – name of dataset
version (str) – version of dataset
cache_root (Optional[str]) – cache folder. If None, the environment variable AUDBCARDS_CACHE_ROOT or audbcards.config.CACHE_ROOT is used
load_tables (bool) – if True, values extracted from the tables are cached. Set this to False if loading the tables takes too long or does not fit into memory
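
The fallback described for cache_root can be pictured with a minimal sketch. It assumes the explicit argument takes precedence, followed by the environment variable, followed by the configured default. The helper resolve_cache_root and its default value are hypothetical stand-ins and not part of audbcards.

```python
import os
from typing import Optional


def resolve_cache_root(
    cache_root: Optional[str] = None,
    default: str = "~/.cache/audbcards",  # placeholder for audbcards.config.CACHE_ROOT
) -> str:
    """Return the cache folder, following the assumed precedence."""
    if cache_root is None:
        # Use the environment variable if set, otherwise the configured default
        cache_root = os.environ.get("AUDBCARDS_CACHE_ROOT") or default
    return os.path.expanduser(cache_root)
```

Passing cache_root explicitly bypasses both fallbacks, which is convenient when several cache folders are used side by side.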

archives¶

Dataset.archives¶: Number of archives of media files in dataset.

author¶

Dataset.author¶: Authors of the database.

backend¶

Dataset.backend¶: Dataset backend object.

bit_depths¶

Dataset.bit_depths¶: Bit depths of media files in dataset.

cache_root¶

Dataset.cache_root¶: Cache root folder.

channels¶

Dataset.channels¶: Channels of media files in dataset.

deps¶

Dataset.deps¶: Dataset dependency table.

description¶

Dataset.description¶: Description of the database.

duration¶

Dataset.duration¶: Total duration of media files in dataset.

example_media¶

Dataset.example_media¶

Example media file.

The media file with the median duration is selected from all files in the dataset with a duration between 0.5 s and 300 s. In addition, the media file needs to be stored in an archive with fewer than 100 media files. If no media file meets these criteria, None is returned instead.
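
The selection rule can be illustrated with a small sketch. It is not the audbcards implementation: the function pick_example_media, its inputs, the inclusive duration bounds, and the choice to pick the file closest to the median are all assumptions made for illustration.

```python
import statistics
from typing import Dict, Optional


def pick_example_media(
    durations: Dict[str, float],    # file name -> duration in seconds
    archive_sizes: Dict[str, int],  # file name -> number of files in its archive
    min_dur: float = 0.5,
    max_dur: float = 300.0,
    max_archive_files: int = 100,
) -> Optional[str]:
    """Return the file closest to the median duration, or None."""
    # Keep only files whose duration lies in the allowed range (assumed inclusive)
    in_range = {f: d for f, d in durations.items() if min_dur <= d <= max_dur}
    if not in_range:
        return None
    median = statistics.median(in_range.values())
    # Only files stored in sufficiently small archives are eligible
    candidates = [f for f in in_range if archive_sizes[f] < max_archive_files]
    if not candidates:
        return None
    return min(candidates, key=lambda f: abs(in_range[f] - median))
```

Restricting the choice to small archives keeps the download needed for a single example file small.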

file_durations¶

Dataset.file_durations¶

File durations in dataset in seconds.

Non-media files and media files containing 0 samples are excluded from this list.

files¶

Dataset.files¶: Number of media files in dataset.

formats¶

Dataset.formats¶: File formats of media files in dataset.

iso_languages¶

Dataset.iso_languages¶: Languages of the database as ISO 639-3 if possible.

languages¶

Dataset.languages¶: Languages of the database.

license¶

Dataset.license¶

License of dataset.

If no license is given, 'Unknown' is returned.

license_link¶

Dataset.license_link¶

Link to license of dataset.

If no link is available, None is returned.

name¶

Dataset.name¶: Name of dataset.

publication_date¶

Dataset.publication_date¶: Date dataset was uploaded to repository.

publication_owner¶

Dataset.publication_owner¶: User who uploaded dataset to repository.

repository¶

Dataset.repository¶: Repository containing the dataset.

repository_link¶

Dataset.repository_link¶: Link to repository in Artifactory web UI.

repository_object¶

Dataset.repository_object¶: Repository object containing dataset.

sampling_rates¶

Dataset.sampling_rates¶: Sampling rates of media files in dataset.

schemes¶

Dataset.schemes¶: Schemes of dataset.

schemes_summary¶

Dataset.schemes_summary¶

Summary of dataset schemes.

It lists all schemes in a string, showing additional information on schemes named 'emotion' and 'speaker', e.g. 'speaker: [age, gender, language]'.
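
A rough sketch of how such a summary string could be assembled is shown here. The function summarize_schemes, its input mapping, and the comma separator are assumptions for illustration only; the real property derives its information from the dataset header.

```python
from typing import Dict, List


def summarize_schemes(schemes: Dict[str, List[str]]) -> str:
    """Join scheme names, expanding only 'emotion' and 'speaker'."""
    detailed = {"emotion", "speaker"}
    parts = []
    for name, details in schemes.items():
        if name in detailed and details:
            # Show the additional information in square brackets
            parts.append(f"{name}: [{', '.join(details)}]")
        else:
            parts.append(name)
    return ", ".join(parts)
```

With this sketch, a mapping that holds an 'age' scheme without details and a 'speaker' scheme with the entries age, gender, and language produces 'age, speaker: [age, gender, language]'.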

schemes_table¶

Dataset.schemes_table¶

Schemes table with name, type, min, max, labels, mappings.

The table is represented as a dictionary with column names as keys.

segment_durations¶

Dataset.segment_durations¶: Segment durations in dataset.

segments¶

Dataset.segments¶: Number of segments in dataset.

short_description¶

Dataset.short_description¶: Description of dataset shortened to 150 characters.

source¶

Dataset.source¶: Source of the database.

tables¶

Dataset.tables¶: Tables of the dataset.

tables_columns¶

Dataset.tables_columns¶

Number of columns for each table of the dataset.

Returns: dictionary with table IDs as keys and number of columns as values

Examples

>>> from audbcards import Dataset
>>> ds = Dataset("emodb", "1.4.1")
>>> ds.tables_columns["speaker"]
3

tables_preview¶

Dataset.tables_preview¶

Table preview for each table of the dataset.

Shows the header and the first 5 rows of each table as a list of lists. All table values are converted to strings, stripped of HTML tags and newlines, and limited to a maximum length of 100 characters.

Returns: dictionary with table IDs as keys and table previews as values

Examples

>>> from audbcards import Dataset
>>> from tabulate import tabulate
>>> ds = Dataset("emodb", "1.4.1")
>>> preview = ds.tables_preview["speaker"]
>>> print(tabulate(preview, headers="firstrow", tablefmt="github"))
|   speaker |   age | gender   | language   |
|-----------|-------|----------|------------|
|         3 |    31 | male     | deu        |
|         8 |    34 | female   | deu        |
|         9 |    21 | female   | deu        |
|        10 |    32 | male     | deu        |
|        11 |    26 | male     | deu        |
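
The cleaning rules described for the preview (string conversion, removal of HTML tags and newlines, 100-character limit) could be sketched as follows. The helpers clean_cell and build_preview are hypothetical, and details such as replacing newlines with a space or truncating without an ellipsis are assumptions rather than documented behaviour.

```python
import re
from typing import Any, List, Sequence


def clean_cell(value: Any, max_len: int = 100) -> str:
    """Convert a table value to a short, single-line string."""
    text = str(value)
    text = re.sub(r"<[^>]+>", "", text)  # drop HTML tags
    text = " ".join(text.split())        # replace newlines, collapse whitespace
    return text[:max_len]                # limit the length


def build_preview(
    header: Sequence[str],
    rows: Sequence[Sequence[Any]],
    n: int = 5,
) -> List[List[str]]:
    """Return the header followed by the first n cleaned rows."""
    return [list(header)] + [[clean_cell(v) for v in row] for row in rows[:n]]
```

The result is a list of lists with the header as first entry, which is the shape expected by tabulate(..., headers="firstrow").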

tables_rows¶

Dataset.tables_rows¶

Number of rows for each table of the dataset.

Returns: dictionary with table IDs as keys and number of rows as values

Examples

>>> from audbcards import Dataset
>>> ds = Dataset("emodb", "1.4.1")
>>> ds.tables_rows["speaker"]
10
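
Because tables_rows and tables_columns are both keyed by table ID, they can be combined into a (rows, columns) shape per table. The helper table_shapes below is a hypothetical convenience function that operates on plain dictionaries, so it runs without loading a dataset.

```python
from typing import Dict, Tuple


def table_shapes(
    tables_rows: Dict[str, int],
    tables_columns: Dict[str, int],
) -> Dict[str, Tuple[int, int]]:
    """Map each table ID to its (rows, columns) shape."""
    return {
        table: (tables_rows[table], tables_columns[table])
        for table in tables_rows
    }
```

For a Dataset object ds, the call would be table_shapes(ds.tables_rows, ds.tables_columns). With the emodb values shown in the examples, the 'speaker' table has the shape (10, 3).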

tables_table¶

Dataset.tables_table¶: Tables of the dataset.

usage¶

Dataset.usage¶: Usage of the database.

version¶

Dataset.version¶: Version of dataset.
