expresso
Created by TA Nguyen, W-N Hsu, A D’Avirro, B Shi, I Gat, M Fazel-Zarandi, T Remez, J Copet, G Synnaeve, M Hassid, F Kreuk, Y Adi, E Dupoux
| Property | Value |
|---|---|
| version | 1.0.0 |
| license | CC-BY-NC-4.0 |
| usage | research |
| languages | eng |
| format | wav |
| channel | 1, 2 |
| sampling rate | 48000 |
| bit depth | 24 |
| duration | 1 days 18:23:42.549854165 |
| files | 11954, duration distribution: 0.6 s (min) to 1346.7 s (max) |
| segments | 27545, duration distribution: not available (nan) |

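To put the header figures in perspective, the following sketch derives the mean file duration and converts a time offset into a sample index at the stated 48000 Hz sampling rate. It is a minimal illustration that uses only the numbers listed above; the helper name `seconds_to_sample` is hypothetical and not part of any dataset tooling.

```python
from decimal import Decimal

SAMPLING_RATE = 48000  # Hz, from the header
N_FILES = 11954        # number of files, from the header

# Total duration "1 days 18:23:42.549854165" expressed in seconds.
# Decimal keeps the nanosecond digits that datetime.timedelta would round away.
total_seconds = Decimal(1 * 86400 + 18 * 3600 + 23 * 60) + Decimal("42.549854165")

# Mean duration per file (an average only; files range from 0.6 s to 1346.7 s).
mean_file_seconds = total_seconds / N_FILES


def seconds_to_sample(seconds: str) -> int:
    """Convert a time offset in seconds (given as text) to a sample index."""
    return int(Decimal(seconds) * SAMPLING_RATE)


print(f"total duration: {total_seconds} s")
print(f"mean file duration: {mean_file_seconds:.2f} s")
print(f"an offset of 16.49 s falls on sample {seconds_to_sample('16.49')}")
```

Passing the offset as text avoids binary floating-point rounding before the multiplication, which matters when the result is used to slice audio at an exact sample.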
Description
Expresso is a dataset of expressive speech recordings. It contains read speech, conversational speech, and singing in a range of styles, including default, confused, enunciated, happy, laughing, narration, sad, singing, and whisper. The dataset is part of the TextlessLib project.
Example
audio_48khz/read/ex02/happy/base/ex02_happy_00339.wav
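The file paths on this page appear to encode the speaking style, speaker, style, and corpus as directory names. The sketch below splits a path into those parts. The two layouts it accepts are an assumption inferred from the paths shown here, and `parse_expresso_path` is a hypothetical helper, not a function shipped with the dataset.

```python
from pathlib import PurePosixPath


def parse_expresso_path(path: str) -> dict:
    """Split a relative file path into its labelled components.

    Assumed layouts (inferred from the paths shown on this page):
      audio_48khz/read/<speaker>/<style>/<corpus>/<name>.wav
      audio_48khz/conversational/<speaker pair>/<style>/<name>.wav
    """
    p = PurePosixPath(path)
    parts = p.parts
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected path layout: {path}")
    return {
        "speaking_style": parts[1],
        "speaker": parts[2],
        "style": parts[3],
        # Only the six-part (read) layout carries a corpus directory.
        "corpus": parts[4] if len(parts) == 6 else None,
        # Trailing number of the file name, kept as text to preserve leading zeros.
        "id": p.stem.rsplit("_", 1)[-1],
    }


print(parse_expresso_path("audio_48khz/read/ex02/happy/base/ex02_happy_00339.wav"))
```

The style is taken from the directory rather than the file name because some style labels, such as animal_directed, contain underscores themselves.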
Tables
dev.channel0 (segmented; columns: speaker, style, corpus)

| file | start | end | speaker | style | corpus |
|---|---|---|---|---|---|
| audio_48khz/read/ex01/default/longform/ex01_default_longform_00001.wav | 0 days | 0 days 00:00:16.490000 | ex01 | default | longform |
| audio_48khz/read/ex01/narration/longform/ex01_narration_longform_00001.wav | 0 days | 0 days 00:00:16.990000 | ex01 | narration | longform |
| audio_48khz/read/ex02/default/longform/ex02_default_longform_00001.wav | 0 days | 0 days 00:00:14.070000 | ex02 | default | longform |
| audio_48khz/read/ex02/narration/longform/ex02_narration_longform_00001.wav | 0 days | 0 days 00:00:18.590000 | ex02 | narration | longform |
| audio_48khz/read/ex03/default/longform/ex03_default_longform_00001.wav | 0 days | 0 days 00:00:13.390000 | ex03 | default | longform |

688 rows x 3 columns
dev.channel1 (segmented; columns: speaker, style, corpus)

| file | start | end | speaker | style | corpus |
|---|---|---|---|---|---|
| audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_007.wav | 0 days | 0 days 00:01:00 | ex02 | default | |
| audio_48khz/conversational/ex01-ex02/enunciated/ex01-ex02_enunciated_001.wav | 0 days | 0 days 00:00:50.270000 | ex02 | enunciated | |
| audio_48khz/conversational/ex01-ex02/fast/ex01-ex02_fast_004.wav | 0 days | 0 days 00:01:00 | ex02 | fast | |
| audio_48khz/conversational/ex01-ex02/projected/ex01-ex02_projected_006.wav | 0 days | 0 days 00:01:00 | ex02 | projected | |
| audio_48khz/conversational/ex01-ex02/whisper/ex01-ex02_whisper_001.wav | 0 days | 0 days 00:01:00 | ex02 | whisper | |

52 rows x 3 columns
files (filewise; columns: id, speaking_style)

| file | id | speaking_style |
|---|---|---|
| audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_001.wav | 001 | conversational |
| audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_002.wav | 002 | conversational |
| audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_003.wav | 003 | conversational |
| audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_004.wav | 004 | conversational |
| audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_005.wav | 005 | conversational |

11954 rows x 2 columns
test.channel0 (segmented; columns: speaker, style, corpus)

| file | start | end | speaker | style | corpus |
|---|---|---|---|---|---|
| audio_48khz/read/ex01/default/longform/ex01_default_longform_00001.wav | 0 days 00:00:16.490000 | 0 days 00:00:32.980000 | ex01 | default | longform |
| audio_48khz/read/ex01/narration/longform/ex01_narration_longform_00001.wav | 0 days 00:00:16.990000 | 0 days 00:00:33.980000 | ex01 | narration | longform |
| audio_48khz/read/ex02/default/longform/ex02_default_longform_00001.wav | 0 days 00:00:14.070000 | 0 days 00:00:28.140000 | ex02 | default | longform |
| audio_48khz/read/ex02/narration/longform/ex02_narration_longform_00001.wav | 0 days 00:00:18.590000 | 0 days 00:00:37.180000 | ex02 | narration | longform |
| audio_48khz/read/ex03/default/longform/ex03_default_longform_00001.wav | 0 days 00:00:13.390000 | 0 days 00:00:26.780000 | ex03 | default | longform |

648 rows x 3 columns
test.channel1 (segmented; columns: speaker, style, corpus)

| file | start | end | speaker | style | corpus |
|---|---|---|---|---|---|
| audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_008.wav | 0 days 00:00:00 | 0 days 00:01:00 | ex02 | default | |
| audio_48khz/conversational/ex01-ex02/enunciated/ex01-ex02_enunciated_001.wav | 0 days 00:00:50.270000 | 0 days 00:01:40.540000 | ex02 | enunciated | |
| audio_48khz/conversational/ex01-ex02/fast/ex01-ex02_fast_003.wav | 0 days 00:00:00 | 0 days 00:01:00 | ex02 | fast | |
| audio_48khz/conversational/ex01-ex02/projected/ex01-ex02_projected_005.wav | 0 days 00:00:00 | 0 days 00:01:00 | ex02 | projected | |
| audio_48khz/conversational/ex01-ex02/whisper/ex01-ex02_whisper_004.wav | 0 days 00:00:00 | 0 days 00:01:00 | ex02 | whisper | |

52 rows x 3 columns
train.channel0 (segmented; columns: speaker, style, corpus)

The preview lists a single time value per row. It is placed under start because it matches the end of the corresponding test.channel0 segment; the end cell is left empty as in the preview.

| file | start | end | speaker | style | corpus |
|---|---|---|---|---|---|
| audio_48khz/read/ex01/default/longform/ex01_default_longform_00001.wav | 0 days 00:00:32.980000 | | ex01 | default | longform |
| audio_48khz/read/ex01/narration/longform/ex01_narration_longform_00001.wav | 0 days 00:00:33.980000 | | ex01 | narration | longform |
| audio_48khz/read/ex02/default/longform/ex02_default_longform_00001.wav | 0 days 00:00:28.140000 | | ex02 | default | longform |
| audio_48khz/read/ex02/narration/longform/ex02_narration_longform_00001.wav | 0 days 00:00:37.180000 | | ex02 | narration | longform |
| audio_48khz/read/ex03/default/longform/ex03_default_longform_00001.wav | 0 days 00:00:26.780000 | | ex03 | default | longform |

10727 rows x 3 columns
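The dev, test, and train previews for channel 0 list the same long-form files with adjacent time ranges. The sketch below parses the offsets shown for ex01_default_longform_00001.wav and checks that the ranges share their boundaries. Reading the missing end value in the train row as "until the end of the file" is an assumption, and `parse_offset` is a hypothetical helper written for this illustration.

```python
import re
from datetime import timedelta

_OFFSET = re.compile(r"^(\d+) days(?: (\d+):(\d+):(\d+(?:\.\d+)?))?$")


def parse_offset(text):
    """Parse offsets such as '0 days' or '0 days 00:00:16.490000'."""
    match = _OFFSET.match(text.strip())
    if match is None:
        raise ValueError(f"cannot parse offset: {text!r}")
    days, hours, minutes, seconds = match.groups()
    return timedelta(
        days=int(days),
        hours=int(hours or 0),
        minutes=int(minutes or 0),
        seconds=float(seconds or 0),
    )


# (start, end) of ex01_default_longform_00001.wav in each split preview.
# Assumption: the missing end in the train row means "until end of file".
segments = {
    "dev": ("0 days", "0 days 00:00:16.490000"),
    "test": ("0 days 00:00:16.490000", "0 days 00:00:32.980000"),
    "train": ("0 days 00:00:32.980000", None),
}

parsed = {
    split: (parse_offset(start), parse_offset(end) if end is not None else None)
    for split, (start, end) in segments.items()
}

# Each range begins where the previous one ends, so the three do not overlap.
assert parsed["dev"][1] == parsed["test"][0]
assert parsed["test"][1] == parsed["train"][0]

for split, (start, end) in parsed.items():
    length = "open-ended" if end is None else f"{(end - start).total_seconds():.2f} s"
    print(f"{split}: starts at {start.total_seconds():.2f} s, length {length}")
```

For this file the dev and test portions are both 16.49 s long, and everything after 32.98 s is left to train under the stated assumption.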
train.channel1 (segmented; columns: speaker, style, corpus)

The preview lists a single time value per row. It is placed under start; the end and corpus cells are left empty as in the preview.

| file | start | end | speaker | style | corpus |
|---|---|---|---|---|---|
| audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_001.wav | 0 days | | ex02 | default | |
| audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_002.wav | 0 days | | ex02 | default | |
| audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_003.wav | 0 days | | ex02 | default | |
| audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_004.wav | 0 days | | ex02 | default | |
| audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_005.wav | 0 days | | ex02 | default | |

339 rows x 3 columns
vad.channel0 (segmented; columns: channel)

| file | start | end | channel |
|---|---|---|---|
| audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_001.wav | 0 days 00:00:23.880000 | 0 days 00:00:28.140000 | 0 |
| audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_001.wav | 0 days 00:00:55.120000 | 0 days 00:01:01.830000 | 0 |
| audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_001.wav | 0 days 00:01:21.760000 | 0 days 00:01:34.580000 | 0 |
| audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_001.wav | 0 days 00:01:36.140000 | 0 days 00:01:39.490000 | 0 |
| audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_001.wav | 0 days 00:02:10.750000 | 0 days 00:02:21.900000 | 0 |

7790 rows x 1 column
vad.channel1 (segmented; columns: channel)

| file | start | end | channel |
|---|---|---|---|
| audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_001.wav | 0 days 00:00:00 | 0 days 00:00:23.760000 | 1 |
| audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_001.wav | 0 days 00:00:28.560000 | 0 days 00:00:55.040000 | 1 |
| audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_001.wav | 0 days 00:01:01.930000 | 0 days 00:01:21.330000 | 1 |
| audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_001.wav | 0 days 00:01:38.890000 | 0 days 00:01:50.970000 | 1 |
| audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_001.wav | 0 days 00:01:51.880000 | 0 days 00:02:06.950000 | 1 |

7694 rows x 1 column
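The two vad tables give per-channel voice activity segments for the same conversational files. As a rough sketch of how they can be combined, the code below sums the speech time per channel and the time during which both channels are active, using only the five preview rows shown for ex01-ex02_default_001.wav. The results therefore describe the preview, not the whole file, and the helper names are hypothetical.

```python
from datetime import timedelta


def seg(start_s, end_s):
    """Build a (start, end) pair of timedeltas from offsets in seconds."""
    return timedelta(seconds=start_s), timedelta(seconds=end_s)


# Preview rows for ex01-ex02_default_001.wav, copied from the two vad tables.
vad = {
    0: [seg(23.88, 28.14), seg(55.12, 61.83), seg(81.76, 94.58),
        seg(96.14, 99.49), seg(130.75, 141.90)],
    1: [seg(0.0, 23.76), seg(28.56, 55.04), seg(61.93, 81.33),
        seg(98.89, 110.97), seg(111.88, 126.95)],
}


def total_duration(segments):
    """Sum the lengths of all segments."""
    return sum((end - start for start, end in segments), timedelta())


def overlap(segments_a, segments_b):
    """Sum the time during which a segment from each list is active."""
    total = timedelta()
    for start_a, end_a in segments_a:
        for start_b, end_b in segments_b:
            shared = min(end_a, end_b) - max(start_a, start_b)
            if shared > timedelta():
                total += shared
    return total


speech = {channel: total_duration(rows) for channel, rows in vad.items()}
both = overlap(vad[0], vad[1])

print(speech[0].total_seconds())  # 38.29
print(speech[1].total_seconds())  # 96.79
print(both.total_seconds())       # 0.6
```

The pairwise comparison is quadratic in the number of segments, which is fine for a handful of rows; for the full tables a sorted sweep over both lists would scale better.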
Schemes
| ID | Dtype | Labels |
|---|---|---|
| channel | int | 0, 1 |
| corpus | str | base, longform |
| id | str | 00001, 00002, 00003, 00004, 00005, 00006, 00007, […], 014, 015, 016, 017, 018, 019, 020, 021 |
| speaker | str | ex01, ex02, ex03, ex04 |
| speaking_style | str | conversational, read |
| style | str | angry, animal, animal_directed, awe, bored, calm, child, […], narration, non_verbal, projected, sad, sarcastic, sleepy, sympathetic, whisper |