Dataset¶
- class audbcards.Dataset(name, version, *, cache_root=None, load_tables=True)¶
Dataset representation.
Dataset object that represents a dataset that can be loaded with audb.load().
- Parameters:
name (str) – name of dataset
version (str) – version of dataset
cache_root (Optional[str]) – cache folder. If None, the environment variable AUDBCARDS_CACHE_ROOT or audbcards.config.CACHE_ROOT is used
load_tables (bool) – if True, values extracted from tables are cached. Set this to False if loading the tables takes too long or the tables do not fit into memory
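The fallback for cache_root described above can be sketched in a few lines. This is only an illustration, not the library's implementation: the function name resolve_cache_root and the constant DEFAULT_CACHE_ROOT are hypothetical, the constant's value is a placeholder standing in for audbcards.config.CACHE_ROOT, and the order (argument, then environment variable, then config value) is an assumption based on the order in which the parameter description lists them.

```python
import os

# Placeholder standing in for audbcards.config.CACHE_ROOT (value is an assumption)
DEFAULT_CACHE_ROOT = "~/.cache/audbcards"


def resolve_cache_root(cache_root=None):
    """Return the cache folder using the assumed fallback order."""
    # 1. An explicitly passed folder always wins
    if cache_root is not None:
        return cache_root
    # 2. Otherwise use the environment variable if it is set and non-empty,
    # 3. and fall back to the configured default as a last resort
    return os.environ.get("AUDBCARDS_CACHE_ROOT") or DEFAULT_CACHE_ROOT
```

With this sketch, resolve_cache_root("/data/cache") returns the given folder unchanged, while resolve_cache_root() consults the environment first.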
archives¶
- Dataset.archives¶
Number of archives of media files in dataset.
backend¶
- Dataset.backend¶
Dataset backend object.
bit_depths¶
- Dataset.bit_depths¶
Bit depths of media files in dataset.
cache_root¶
- Dataset.cache_root¶
Cache root folder.
channels¶
- Dataset.channels¶
Channels of media files in dataset.
deps¶
- Dataset.deps¶
Dataset dependency table.
description¶
- Dataset.description¶
Description of the database.
duration¶
- Dataset.duration¶
Total duration of media files in dataset.
example_json¶
- Dataset.example_json¶
Example JSON file.
Path to an example JSON file from the dataset. The JSON file needs to be stored in an archive with fewer than 100 files. If the JSON file does not meet this criterion, or if no JSON file is part of the dataset, None is returned instead.
example_media¶
- Dataset.example_media¶
Example media file.
The media file is the one with the median duration among all files in the dataset that have a duration between 0.5 s and 300 s. In addition, the media file needs to be stored in an archive with fewer than 100 media files. If no media file meets these criteria, None is returned instead.
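One way to read the duration part of this selection rule is sketched below. The helper pick_example is hypothetical and not part of audbcards. It assumes that the duration limits are inclusive and that the file closest to the median of the remaining durations is chosen; it leaves out the archive size constraint entirely.

```python
import statistics


def pick_example(durations, min_dur=0.5, max_dur=300.0):
    """Return the file whose duration is closest to the median duration.

    durations: dict mapping file name to duration in seconds.
    Returns None if no file lies within the duration limits.
    """
    # Keep only files within the limits (inclusive bounds are an assumption)
    candidates = {f: d for f, d in durations.items() if min_dur <= d <= max_dur}
    if not candidates:
        return None
    median = statistics.median(candidates.values())
    # min() returns the first file with the smallest distance to the median
    return min(candidates, key=lambda f: abs(candidates[f] - median))


durations = {"a.wav": 0.1, "b.wav": 1.0, "c.wav": 2.0, "d.wav": 10.0, "e.wav": 400.0}
example = pick_example(durations)
```

Here a.wav and e.wav fall outside the limits, so the median is taken over the three remaining files.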
file_durations¶
- Dataset.file_durations¶
File durations in dataset in seconds.
Non-media files and media files containing 0 samples are excluded from this list.
files¶
- Dataset.files¶
Number of media files in dataset.
formats¶
- Dataset.formats¶
File formats of media files in dataset.
header¶
- Dataset.header¶
Dataset header.
iso_languages¶
- Dataset.iso_languages¶
Languages of the database as ISO 639-3 codes, where possible.
languages¶
- Dataset.languages¶
Languages of the database.
license¶
- Dataset.license¶
License of dataset.
If no license is given, 'Unknown' is returned.
license_link¶
- Dataset.license_link¶
Link to license of dataset.
If no link is available, None is returned.
name¶
- Dataset.name¶
Name of dataset.
publication_date¶
- Dataset.publication_date¶
Date dataset was uploaded to repository.
publication_owner¶
- Dataset.publication_owner¶
User who uploaded dataset to repository.
repository¶
- Dataset.repository¶
Repository containing the dataset.
repository_link¶
- Dataset.repository_link¶
Link to repository in Artifactory web UI.
repository_object¶
- Dataset.repository_object¶
Repository object containing dataset.
sampling_rates¶
- Dataset.sampling_rates¶
Sampling rates of media files in dataset.
schemes¶
- Dataset.schemes¶
Schemes of dataset.
schemes_summary¶
- Dataset.schemes_summary¶
Summary of dataset schemes.
It lists all schemes in a string, showing additional information on schemes named 'emotion' and 'speaker', e.g. 'speaker: [age, gender, language]'.
schemes_table¶
- Dataset.schemes_table¶
Schemes table with name, type, min, max, labels, mappings.
The table is represented as a dictionary with column names as keys.
segment_durations¶
- Dataset.segment_durations¶
Segment durations in dataset.
segments¶
- Dataset.segments¶
Number of segments in dataset.
short_description¶
- Dataset.short_description¶
Description of dataset shortened to 150 characters.
source¶
- Dataset.source¶
Source of the database.
tables¶
- Dataset.tables¶
Tables of the dataset.
tables_columns¶
- Dataset.tables_columns¶
Number of columns for each table of the dataset.
- Returns:
dictionary with table IDs as keys and number of columns as values
Examples
>>> ds = Dataset("emodb", "1.4.1")
>>> ds.tables_columns["speaker"]
3
tables_preview¶
- Dataset.tables_preview¶
Table preview for each table of the dataset.
Shows the header and the first 5 rows of each table as a list of lists. All table values are converted to strings, stripped of HTML tags and newlines, and limited to a maximum length of 100 characters.
- Returns:
dictionary with table IDs as keys and table previews as values
Examples
>>> from tabulate import tabulate
>>> ds = Dataset("emodb", "1.4.1")
>>> preview = ds.tables_preview["speaker"]
>>> print(tabulate(preview, headers="firstrow", tablefmt="github"))
|   speaker |   age | gender   | language   |
|-----------|-------|----------|------------|
|         3 |    31 | male     | deu        |
|         8 |    34 | female   | deu        |
|         9 |    21 | female   | deu        |
|        10 |    32 | male     | deu        |
|        11 |    26 | male     | deu        |
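The clean-up applied to each preview cell (string conversion, removal of HTML tags and newlines, length limit of 100 characters) could look roughly like the sketch below. The helper clean_cell is hypothetical and not taken from audbcards. Replacing newlines with a space and cutting long values without an ellipsis are assumptions, since the description does not say how either case is handled.

```python
import re


def clean_cell(value, max_len=100):
    """Convert a table value to a short, single-line string."""
    text = str(value)
    # Drop anything that looks like an HTML tag
    text = re.sub(r"<[^>]+>", "", text)
    # Replace runs of newlines with one space (assumption)
    text = re.sub(r"[\r\n]+", " ", text)
    # Plain cut at max_len characters, no ellipsis (assumption)
    return text[:max_len]


cleaned = clean_cell("<b>hi</b>\nthere")
```

Non-string values such as integers pass through str() first, so clean_cell(31) yields the string "31".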
tables_rows¶
- Dataset.tables_rows¶
Number of rows for each table of the dataset.
- Returns:
dictionary with table IDs as keys and number of rows as values
Examples
>>> ds = Dataset("emodb", "1.4.1")
>>> ds.tables_rows["speaker"]
10
tables_table¶
- Dataset.tables_table¶
Tables of the dataset.
usage¶
- Dataset.usage¶
Usage of the database.
version¶
- Dataset.version¶
Version of dataset.