Datasets
Datasets available with audb in the audb-public repository, as of Dec 09, 2024. For each dataset, the latest version is shown.
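This overview can also be reproduced programmatically. As a minimal sketch (assuming a recent audb version; the exact columns of the returned DataFrame may differ), `audb.available()` lists all published datasets:

```python
import audb

# One row per dataset; with only_latest=True each dataset
# appears once with its latest version, as in the table below.
df = audb.available(only_latest=True)

# Keep only datasets from the audb-public repository.
df = df[df["repository"] == "audb-public"]

print(df["version"])
```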
| description | version | schemes |
|---|---|---|
| The Aachen Impulse Response (AIR) database is a set of impulse responses that were measured in a wide variety of rooms. The initial aim of the AIR … | 1.4.2 | azimuth, distance, mode, reverberation-time, room |
| The Crowdsourced Language Assessment Corpus (CLAC) consists of audio recordings and automatically-generated transcripts from 1,832 speakers for sev… | 1.1.0 | speaker: [age, gender, country, region, city, education(years), symptoms], age, city, country, education(years), gender, region, symptoms, task-name, transcript |
| Multimodal Opinion Sentiment and Emotion Intensity. Sentiment and emotion annotated multimodal data automatically collected from YouTube. The datase… | 1.2.4 | emotion.intensity, emotion.presence, sentiment, sentiment.binarized, sentiment.binary, sentiment.binary.old, transcription |
| Opinion-level annotated corpus of sentiment and subjectivity analysis in online videos. The dataset is annotated with labels for subjectivity, sent… | 1.1.1 | gender, phoneme, sentiment, sentiment.binarized, sentiment.binary, sentiment.binary.old, transcription |
| Cochl Acoustic Scene Dataset (CochlScene) is an acoustic scene dataset whose recordings are fully collected from crowdsourcing participants. Most … | 1.0.0 | scene |
| Cough-speech-sneeze: a data set of human sounds. This dataset was collected by Dr. Shahin Amiriparian. It contains samples of human speech, coughing… | 2.0.1 | category |
| CREMA-D: Crowd-sourced Emotional Multimodal Actors Dataset. CREMA-D is a data set of 7,442 original clips from 91 actors. These clips were from 48 m… | 1.3.0 | emotion: [anger, disgust, fear, happiness, neutral, no_agreement, sadness], speaker: [age, sex, race, ethnicity], corrupted, emotion.agreement, emotion.intensity, emotion.level, sentence, votes |
| CSS10 is a collection of single speaker speech data for 10 languages. Each of them consists of audio files recorded by a single volunteer and their… | 1.0.0 | speaker: […], language, normalized-transcription, transcription |
| The establishment of the Estonian Emotional Speech Corpus (EESC) began in 2006 within the framework of the National Programme for Estonian Language… | 1.0.1 | emotion: [anger, happiness, neutral, sadness], speaker: [gender, language], emotion.agreement, gender, language, text-matches-emotion, transcription |
| Berlin Database of Emotional Speech. A German database of emotional utterances spoken by actors, recorded as part of the DFG-funded research proje… | 1.4.1 | emotion: [anger, boredom, disgust, fear, happiness, sadness, neutral], speaker: [age, gender, language], age, confidence, gender, language, transcription |
| emoUERJ contains recordings of 10 Portuguese sentences pronounced by 8 speakers in 4 emotions: happiness, anger, sadness, neutral. | 1.0.0 | emotion: [happiness, anger, sadness, neutral], speaker: [gender], gender, take |
| Emozionalmente is an extensive, crowdsourced Italian emotional speech corpus. The dataset consists of 6,902 labeled samples acted out by 431 amateur… | 1.0.0 | emotion: [anger, disgust, fear, happiness, neutral, no_agreement, sadness, surprise], speaker: [age, gender, mother_tongue], age, emotion.agreement, gender, mother_tongue, transcription, votes |
| The ESC-50 dataset is a labeled collection of 2,000 environmental audio recordings suitable for benchmarking methods of environmental sound classifi… | 1.0.1 | category, clip_id, esc10, fold, major, take |
| Expresso is a dataset of expressive speech recordings. It contains read speech and singing in various styles including default, confused, enunciate… | 1.0.0 | speaker: […], channel, corpus, id, speaking_style, style |
| FSDnoisy18k is an audio dataset collected with the aim of fostering the investigation of label noise in sound event classification. It contains 42… | 1.0.0 | categories, license, manually_verified, noisy_small |
| This collection of room impulse responses was measured in the Great Hall, the Octagon, and a classroom at the Mile End campus of Queen Mary, Univer… | 1.0.0 | room, x, y |
| This database contains six different sentences, pronounced by thirteen people (four male and nine female) in six emotions (anger, sadness, surprise… | 1.0.1 | emotion: [anger, sadness, surprise, happiness, fear, neutral], speaker: [gender, age], age, gender, sentence |
| LJSpeech consists of 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books. A transcription is provided for each c… | 1.0.0 | speaker: [gender], gender, normalized-transcription, transcription |
| MESD (Mexican Emotional Speech Database) contains single-word utterances for different emotions like anger, disgust, fear, happiness, neutral, and … | 1.0.1 | emotion: [anger, disgust, fear, happiness, neutral, sadness], gender, word, word_corpus |
| The Microphone Impulse Response Project (MicIRP) contains impulse response data for vintage microphones. The impulse response files were created us… | 1.0.0 | manufacturer |
| The goal of this corpus is to provide data for music/speech discrimination, speech/nonspeech detection, and voice activity detection. The corpus is… | 1.0.0 | artist, background_noise, composer, gender, genre, language, vocals |
| NEMO is a Polish dataset of emotional speech. It contains over 3 hours of emotional speech in 6 categories: anger, fear, happiness, sadness, surp… | 1.0.1 | emotion: [anger, sadness, surprise, happiness, fear, neutral], speaker: [gender, age], age, gender, normalized_text, raw_text, sentence |
| Impulse Responses (IR) and Reverberation Time (RT60) for different rooms. RT is given in seconds at 500 Hz. OpenAIR also has BRIRs and RT60s for 31… | 1.0.0 | reverberation-time |
| Quechua contains 12,420 recordings of emotional speech in Quechua Collao. Six actors were asked to read words and sentences with nine emotional cate… | 1.0.2 | emotion: [anger, boredom, happiness, sleepiness, sadness, calmness, fear, excitement, neutral], speaker: [gender], arousal, arousal.agreement, arousal.original, dominance, …, is_single_word, sentence, transcription, translation, valence, valence.agreement, valence.original |
| The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 73… | 1.1.3 | emotion: [anger, calm, disgust, fear, happiness, neutral, sadness, surprise], speaker: [gender, language], emotional intensity, transcription, vocal channel |
| The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 73… | 1.0.3 | emotion: [neutral, calm, happy, sad, angry, fearful, disgust, suprised], speaker: [gender, language], emotional intensity, transcription, vocal channel |
| This dataset contains 2,138 speech samples, each from a different talker reading the same reading passage. Talkers come from 177 countries and have … | 2.2.0 | speaker, age, age_onset, birthplace, content, country, native_language, sex, tone |
| SUBESCO is an audio-only emotional speech corpus of 7,000 sentence-level utterances of the Bangla language. The corpus contains 7:40:40 h of audio an… | 1.0.0 | emotion: [anger, disgust, fear, happiness, neutral, sadness, surprise], speaker: [gender], gender, sentence_number, speaker_name, take_number |
| The UrbanSound8k dataset contains 8,732 labeled sound excerpts (<=4s) of urban sounds from 10 classes. All excerpts are taken from field recordings … | 1.0.0 | category, clip_id, fold, salience |
| VAD Toolkit: a database for voice activity detection. In each environment, conversational speech by two Korean male speakers was recorded. The groun… | 1.1.0 | noise |
| The noise audio was collected at various urban locations throughout the San Francisco Bay Area in late 2018. The environments primarily consist of … | 1.0.0 | day, file-id, l-to-r-width, location, noise-band, reverberation |
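Any dataset from the table above can be loaded by name and version with `audb.load()`, which returns an `audformat.Database` whose tables hold the schemes listed in the last column. A minimal sketch, using emodb as an example and `only_metadata=True` so that no audio is downloaded (the table name `emotion` is specific to emodb; other datasets use table names implied by their schemes):

```python
import audb

# Fetch only the header and annotation tables, not the audio files.
db = audb.load("emodb", version="1.4.1", only_metadata=True)

# The schemes shown in the table above.
print(list(db.schemes))

# Read an annotation table as a pandas DataFrame,
# here the categorical emotion labels of emodb.
df = db["emotion"].get()
print(df["emotion"].value_counts())
```

Passing `format`, `sampling_rate`, or `mixdown` to `audb.load()` additionally converts the audio to a common format on the fly; see the audb documentation for details.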