Datasets

Datasets available via audb in the repository audb-public, as of Dec 09, 2024. For each dataset, the latest version is shown, together with its license and annotation schemes.

air (v1.4.2, MIT)
  The Aachen Impulse Response (AIR) database is a set of impulse responses that were measured in a wide variety of rooms. The initial aim of the AIR …
  Schemes: azimuth, distance, mode, reverberation-time, room

clac (v1.1.0, CC-BY-SA-4.0)
  The Crowdsourced Language Assessment Corpus (CLAC) consists of audio recordings and automatically generated transcripts from 1,832 speakers for sev…
  Schemes: speaker: [age, gender, country, region, city, education(years), symptoms], age, city, country, education(years), gender, region, symptoms, task-name, transcript

cmu-mosei (v1.2.4, CC-BY-NC-4.0)
  Multimodal Opinion Sentiment and Emotion Intensity. Sentiment- and emotion-annotated multimodal data automatically collected from YouTube. The datase…
  Schemes: emotion.intensity, emotion.presence, sentiment, sentiment.binarized, sentiment.binary, sentiment.binary.old, transcription

cmu-mosi (v1.1.1, CC-BY-NC-4.0)
  Opinion-level annotated corpus of sentiment and subjectivity analysis in online videos. The dataset is annotated with labels for subjectivity, sent…
  Schemes: gender, phoneme, sentiment, sentiment.binarized, sentiment.binary, sentiment.binary.old, transcription

cochlscene (v1.0.0, CC-BY-SA-3.0)
  The Cochl Acoustic Scene Dataset (CochlScene) is an acoustic scene dataset whose recordings are fully collected from crowdsourcing participants. Most …
  Schemes: scene

cough-speech-sneeze (v2.0.1, CC-BY-4.0)
  Cough-speech-sneeze: a data set of human sounds. This dataset was collected by Dr. Shahin Amiriparian. It contains samples of human speech, coughing…
  Schemes: category

crema-d (v1.3.0, Open Data Commons Open Database License (ODbL) v1.0)
  CREMA-D: Crowd-sourced Emotional Multimodal Actors Dataset. CREMA-D is a data set of 7,442 original clips from 91 actors. These clips were from 48 m…
  Schemes: emotion: [anger, disgust, fear, happiness, neutral, no_agreement, sadness], speaker: [age, sex, race, ethnicity], corrupted, emotion.agreement, emotion.intensity, emotion.level, sentence, votes

css10 (v1.0.0, CC0-1.0)
  CSS10 is a collection of single-speaker speech data for 10 languages. Each of them consists of audio files recorded by a single volunteer and their…
  Schemes: speaker: […], language, normalized-transcription, transcription

eesc (v1.0.1, CC-BY-3.0)
  The establishment of the Estonian Emotional Speech Corpus (EESC) began in 2006 within the framework of the National Programme for Estonian Language…
  Schemes: emotion: [anger, happiness, neutral, sadness], speaker: [gender, language], emotion.agreement, gender, language, text-matches-emotion, transcription

emodb (v1.4.1, CC0-1.0)
  Berlin Database of Emotional Speech. A German database of emotional utterances spoken by actors, recorded as part of the DFG-funded research proje…
  Schemes: emotion: [anger, boredom, disgust, fear, happiness, sadness, neutral], speaker: [age, gender, language], age, confidence, gender, language, transcription

emouerj (v1.0.0, CC-BY-4.0)
  emoUERJ contains recordings of 10 Portuguese sentences pronounced by 8 speakers in 4 emotions: happiness, anger, sadness, neutral.
  Schemes: emotion: [happiness, anger, sadness, neutral], speaker: [gender], gender, take

emozionalmente (v1.0.0, CC-BY-4.0)
  Emozionalmente is an extensive, crowdsourced Italian emotional speech corpus. The dataset consists of 6902 labeled samples acted out by 431 amateur…
  Schemes: emotion: [anger, disgust, fear, happiness, neutral, no_agreement, sadness, surprise], speaker: [age, gender, mother_tongue], age, emotion.agreement, gender, mother_tongue, transcription, votes

esc-50 (v1.0.1, CC-BY-NC-3.0)
  The ESC-50 dataset is a labeled collection of 2000 environmental audio recordings suitable for benchmarking methods of environmental sound classifi…
  Schemes: category, clip_id, esc10, fold, major, take

expresso (v1.0.0, CC-BY-NC-4.0)
  Expresso is a dataset of expressive speech recordings. It contains read speech and singing in various styles, including default, confused, enunciate…
  Schemes: speaker: […], channel, corpus, id, speaking_style, style

fsdnoisy18k (v1.0.0, CC-BY-3.0)
  FSDnoisy18k is an audio dataset collected with the aim of fostering the investigation of label noise in sound event classification. It contains 42….
  Schemes: categories, license, manually_verified, noisy_small

ir-c4dm (v1.0.0, CC-BY-NC-4.0)
  This collection of room impulse responses was measured in the Great Hall, the Octagon, and a classroom at the Mile End campus of Queen Mary, Univer…
  Schemes: room, x, y

kannada (v1.0.1, CC-BY-4.0)
  This database contains six different sentences, pronounced by thirteen people (four male and nine female) in six emotions (anger, sadness, surprise…
  Schemes: emotion: [anger, sadness, surprise, happiness, fear, neutral], speaker: [gender, age], age, gender, sentence

ljspeech (v1.0.0, CC0-1.0)
  LJSpeech consists of 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books. A transcription is provided for each c…
  Schemes: speaker: [gender], gender, normalized-transcription, transcription

mesd (v1.0.1, CC-BY-4.0)
  MESD (Mexican Emotional Speech Database) contains single-word utterances for different emotions like anger, disgust, fear, happiness, neutral, and …
  Schemes: emotion: [anger, disgust, fear, happiness, neutral, sadness], gender, word, word_corpus

micirp (v1.0.0, CC-BY-SA-4.0)
  The Microphone Impulse Response Project (MicIRP) contains impulse response data for vintage microphones. The impulse response files were created us…
  Schemes: manufacturer

musan (v1.0.0, CC-BY-4.0)
  The goal of this corpus is to provide data for music/speech discrimination, speech/nonspeech detection, and voice activity detection. The corpus is…
  Schemes: artist, background_noise, composer, gender, genre, language, vocals

nemo (v1.0.1, CC-BY-NC-SA-4.0)
  NEMO is a Polish dataset of emotional speech. It contains over 3 hours of emotional speech in 6 categories: anger, fear, happiness, sadness, surp…
  Schemes: emotion: [anger, sadness, surprise, happiness, fear, neutral], speaker: [gender, age], age, gender, normalized_text, raw_text, sentence

openair (v1.0.0, CC-BY-4.0)
  Impulse Responses (IR) and Reverberation Time (RT60) for different rooms. RT is given in seconds at 500 Hz. Openair also has BRIRs and RT60s for 31….
  Schemes: reverberation-time

quechua (v1.0.2, CC-BY-4.0)
  Quechua contains 12420 recordings of emotional speech in Quechua Collao. Six actors were asked to read words and sentences with nine emotional cate…
  Schemes: emotion: [anger, boredom, happiness, sleepiness, sadness, calmness, fear, exitement, neutral], speaker: [gender], arousal, arousal.agreement, arousal.original, dominance, …, is_single_word, sentence, transcription, translation, valence, valence.agreement, valence.original

ravdess (v1.1.3, CC-BY-NC-SA-4.0)
  The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 73…
  Schemes: emotion: [anger, calm, disgust, fear, happiness, neutral, sadness, surprise], speaker: [gender, language], emotional intensity, transcription, vocal channel

ravdess-videos (v1.0.3, CC-BY-NC-SA-4.0)
  The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 73…
  Schemes: emotion: [neutral, calm, happy, sad, angry, fearful, disgust, suprised], speaker: [gender, language], emotional intensity, transcription, vocal channel

speech-accent-archive (v2.2.0, CC-BY-NC-SA-4.0)
  This dataset contains 2138 speech samples, each from a different talker reading the same reading passage. Talkers come from 177 countries and have …
  Schemes: speaker, age, age_onset, birthplace, content, country, native_language, sex, tone

subesco (v1.0.0, CC-BY-4.0)
  SUBESCO is an audio-only emotional speech corpus of 7000 sentence-level utterances of the Bangla language. The corpus contains 7:40:40 h of audio an…
  Schemes: emotion: [anger, disgust, fear, happiness, neutral, sadness, surprise], speaker: [gender], gender, sentence_number, speaker_name, take_number

urbansound8k (v1.0.0, CC-BY-NC-3.0)
  The UrbanSound8k dataset contains 8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes. All excerpts are taken from field recordings …
  Schemes: category, clip_id, fold, salience

vadtoolkit (v1.1.0, GPLv3)
  VAD Toolkit: A Database for Voice Activity Detection. In each environment, conversational speech by two Korean male speakers was recorded. The groun…
  Schemes: noise

wham (v1.0.0, CC-BY-NC-4.0)
  The noise audio was collected at various urban locations throughout the San Francisco Bay Area in late 2018. The environments primarily consist of …
  Schemes: day, file-id, l-to-r-width, location, noise-band, reverberation