Datasets

Datasets available via audb in the repository audb-public, as of Dec 09, 2024. For each dataset, the latest version is shown, together with its license and annotation schemes.

air (v1.4.2, MIT)
  The Aachen Impulse Response (AIR) database is a set of impulse responses that were measured in a wide variety of rooms. The initial aim of the AIR …
  Schemes: azimuth, distance, mode, reverberation-time, room

clac (v1.1.0, CC-BY-SA-4.0)
  The Crowdsourced Language Assessment Corpus (CLAC) consists of audio recordings and automatically generated transcripts from 1,832 speakers for sev…
  Schemes: speaker: [age, gender, country, region, city, education(years), symptoms], age, city, country, education(years), gender, region, symptoms, task-name, transcript

cmu-mosei (v1.2.4, CC-BY-NC-4.0)
  Multimodal Opinion Sentiment and Emotion Intensity. Sentiment- and emotion-annotated multimodal data automatically collected from YouTube. The datase…
  Schemes: emotion.intensity, emotion.presence, sentiment, sentiment.binarized, sentiment.binary, sentiment.binary.old, transcription

cmu-mosi (v1.1.1, CC-BY-NC-4.0)
  Opinion-level annotated corpus of sentiment and subjectivity analysis in online videos. The dataset is annotated with labels for subjectivity, sent…
  Schemes: gender, phoneme, sentiment, sentiment.binarized, sentiment.binary, sentiment.binary.old, transcription

cochlscene (v1.0.0, CC-BY-SA-3.0)
  The Cochl Acoustic Scene Dataset (CochlScene) is an acoustic scene dataset whose recordings are fully collected from crowdsourcing participants. Most …
  Schemes: scene

cough-speech-sneeze (v2.0.1, CC-BY-4.0)
  Cough-speech-sneeze: a data set of human sounds. This dataset was collected by Dr. Shahin Amiriparian. It contains samples of human speech, coughing…
  Schemes: category

crema-d (v1.3.0, Open Data Commons Open Database License (ODbL) v1.0)
  CREMA-D: Crowd-sourced Emotional Multimodal Actors Dataset. CREMA-D is a data set of 7,442 original clips from 91 actors. These clips were from 48 m…
  Schemes: emotion: [anger, disgust, fear, happiness, neutral, no_agreement, sadness], speaker: [age, sex, race, ethnicity], corrupted, emotion.agreement, emotion.intensity, emotion.level, sentence, votes

css10 (v1.0.0, CC0-1.0)
  CSS10 is a collection of single-speaker speech data for 10 languages. Each of them consists of audio files recorded by a single volunteer and their…
  Schemes: speaker: […], language, normalized-transcription, transcription

eesc (v1.0.1, CC-BY-3.0)
  The establishment of the Estonian Emotional Speech Corpus (EESC) began in 2006 within the framework of the National Programme for Estonian Language…
  Schemes: emotion: [anger, happiness, neutral, sadness], speaker: [gender, language], emotion.agreement, gender, language, text-matches-emotion, transcription

emodb (v1.4.1, CC0-1.0)
  Berlin Database of Emotional Speech. A German database of emotional utterances spoken by actors, recorded as part of the DFG-funded research proje…
  Schemes: emotion: [anger, boredom, disgust, fear, happiness, sadness, neutral], speaker: [age, gender, language], age, confidence, gender, language, transcription

emouerj (v1.0.0, CC-BY-4.0)
  emoUERJ contains recordings of 10 Portuguese sentences pronounced by 8 speakers in 4 emotions: happiness, anger, sadness, neutral.
  Schemes: emotion: [happiness, anger, sadness, neutral], speaker: [gender], gender, take

emozionalmente (v1.0.0, CC-BY-4.0)
  Emozionalmente is an extensive, crowdsourced Italian emotional speech corpus. The dataset consists of 6902 labeled samples acted out by 431 amateur…
  Schemes: emotion: [anger, disgust, fear, happiness, neutral, no_agreement, sadness, surprise], speaker: [age, gender, mother_tongue], age, emotion.agreement, gender, mother_tongue, transcription, votes

esc-50 (v1.0.1, CC-BY-NC-3.0)
  The ESC-50 dataset is a labeled collection of 2000 environmental audio recordings suitable for benchmarking methods of environmental sound classifi…
  Schemes: category, clip_id, esc10, fold, major, take

expresso (v1.0.0, CC-BY-NC-4.0)
  Expresso is a dataset of expressive speech recordings. It contains read speech and singing in various styles, including default, confused, enunciate…
  Schemes: speaker: […], channel, corpus, id, speaking_style, style

fsdnoisy18k (v1.0.0, CC-BY-3.0)
  FSDnoisy18k is an audio dataset collected with the aim of fostering the investigation of label noise in sound event classification. It contains 42….
  Schemes: categories, license, manually_verified, noisy_small

ir-c4dm (v1.0.0, CC-BY-NC-4.0)
  This collection of room impulse responses was measured in the Great Hall, the Octagon, and a classroom at the Mile End campus of Queen Mary, Univer…
  Schemes: room, x, y

kannada (v1.0.1, CC-BY-4.0)
  This database contains six different sentences, pronounced by thirteen people (four male and nine female) in six emotions (anger, sadness, surprise…
  Schemes: emotion: [anger, sadness, surprise, happiness, fear, neutral], speaker: [gender, age], age, gender, sentence

ljspeech (v1.0.0, CC0-1.0)
  LJSpeech consists of 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books. A transcription is provided for each c…
  Schemes: speaker: [gender], gender, normalized-transcription, transcription

mesd (v1.0.1, CC-BY-4.0)
  MESD (Mexican Emotional Speech Database) contains single-word utterances for different emotions like anger, disgust, fear, happiness, neutral, and …
  Schemes: emotion: [anger, disgust, fear, happiness, neutral, sadness], gender, word, word_corpus

micirp (v1.0.0, CC-BY-SA-4.0)
  The Microphone Impulse Response Project (MicIRP) contains impulse response data for vintage microphones. The impulse response files were created us…
  Schemes: manufacturer

musan (v1.0.0, CC-BY-4.0)
  The goal of this corpus is to provide data for music/speech discrimination, speech/nonspeech detection, and voice activity detection. The corpus is…
  Schemes: artist, background_noise, composer, gender, genre, language, vocals

nemo (v1.0.1, CC-BY-NC-SA-4.0)
  NEMO is a Polish dataset of emotional speech. It contains over 3 hours of emotional speech in 6 categories: anger, fear, happiness, sadness, surp…
  Schemes: emotion: [anger, sadness, surprise, happiness, fear, neutral], speaker: [gender, age], age, gender, normalized_text, raw_text, sentence

openair (v1.0.0, CC-BY-4.0)
  Impulse Responses (IR) and Reverberation Time (RT60) for different rooms. RT is given in seconds at 500 Hz. Openair also has BRIRs and RT60s for 31….
  Schemes: reverberation-time

quechua (v1.0.2, CC-BY-4.0)
  Quechua contains 12420 recordings of emotional speech in Quechua Collao. Six actors were asked to read words and sentences with nine emotional cate…
  Schemes: emotion: [anger, boredom, happiness, sleepiness, sadness, calmness, fear, exitement, neutral], speaker: [gender], arousal, arousal.agreement, arousal.original, dominance, …, is_single_word, sentence, transcription, translation, valence, valence.agreement, valence.original

ravdess (v1.1.3, CC-BY-NC-SA-4.0)
  The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 73…
  Schemes: emotion: [anger, calm, disgust, fear, happiness, neutral, sadness, surprise], speaker: [gender, language], emotional intensity, transcription, vocal channel

ravdess-videos (v1.0.3, CC-BY-NC-SA-4.0)
  The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 73…
  Schemes: emotion: [neutral, calm, happy, sad, angry, fearful, disgust, suprised], speaker: [gender, language], emotional intensity, transcription, vocal channel

speech-accent-archive (v2.2.0, CC-BY-NC-SA-4.0)
  This dataset contains 2138 speech samples, each from a different talker reading the same reading passage. Talkers come from 177 countries and have …
  Schemes: speaker, age, age_onset, birthplace, content, country, native_language, sex, tone

subesco (v1.0.0, CC-BY-4.0)
  SUBESCO is an audio-only emotional speech corpus of 7000 sentence-level utterances of the Bangla language. The corpus contains 7:40:40 h of audio an…
  Schemes: emotion: [anger, disgust, fear, happiness, neutral, sadness, surprise], speaker: [gender], gender, sentence_number, speaker_name, take_number

urbansound8k (v1.0.0, CC-BY-NC-3.0)
  The UrbanSound8k dataset contains 8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes. All excerpts are taken from field recordings …
  Schemes: category, clip_id, fold, salience

vadtoolkit (v1.1.0, GPLv3)
  VAD Toolkit: A Database for Voice Activity Detection. In each environment, conversational speech by two Korean male speakers was recorded. The groun…
  Schemes: noise

wham (v1.0.0, CC-BY-NC-4.0)
  The noise audio was collected at various urban locations throughout the San Francisco Bay Area in late 2018. The environments primarily consist of …
  Schemes: day, file-id, l-to-r-width, location, noise-band, reverberation