expresso

Created by TA Nguyen, W-N Hsu, A D’Avirro, B Shi, I Gat, M Fazel-Zarani, T Remez, J Copet, G Synnaeve, M Hassid, F Kreuk, Y Adi, E Dupoux

version: 1.0.0
license: CC-BY-NC-4.0
usage: research
languages: eng (English)
format: wav
channels: 1, 2 (mono and stereo files)
sampling rate: 48000 Hz
bit depth: 24 bit
duration: 1 day 18:23:42.55 (about 42.4 hours)
files: 11954 (file durations range from 0.6 s to 1346.7 s)
segments: 27545 (no segment duration distribution available)
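The header figures can be turned into a few derived numbers with plain arithmetic. The sketch below, a minimal example using only the values listed in the header, converts the total duration to hours, divides it by the number of files, and computes the raw PCM data rate for 48 kHz / 24-bit audio. `pcm_bytes_per_second` is a hypothetical helper, and the WAV header bytes are ignored.

```python
from datetime import timedelta

# Figures copied from the header above.
TOTAL_DURATION = timedelta(days=1, hours=18, minutes=23, seconds=42.549854165)
NUM_FILES = 11954
SAMPLING_RATE_HZ = 48_000
BIT_DEPTH = 24


def pcm_bytes_per_second(channels: int) -> int:
    """Raw PCM data rate: samples/s x channels x bytes per sample (WAV header ignored)."""
    return SAMPLING_RATE_HZ * channels * BIT_DEPTH // 8


total_hours = TOTAL_DURATION.total_seconds() / 3600
mean_file_seconds = TOTAL_DURATION.total_seconds() / NUM_FILES

print(f"total duration: {total_hours:.2f} h")
print(f"mean file duration: {mean_file_seconds:.2f} s")
# 144000 B/s mono, 288000 B/s stereo
print(f"data rate: {pcm_bytes_per_second(1)} B/s mono, {pcm_bytes_per_second(2)} B/s stereo")
```

This gives about 42.40 h in total and roughly 12.8 s per file on average. Because the dataset mixes mono and stereo files, the total storage size cannot be derived from the header alone.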

Description

Expresso is a dataset of expressive speech recordings. It contains read and conversational speech in a range of styles, including default, confused, enunciated, happy, laughing, narration, sad, singing, and whisper. The dataset is part of the TextlessLib project.

Example

audio_48khz/read/ex02/happy/base/ex02_happy_00339.wav
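The file paths in this card follow a consistent directory layout: read speech sits under `audio_48khz/read/<speaker>/<style>/<corpus>/`, conversational speech under `audio_48khz/conversational/<speaker pair>/<style>/`. The sketch below splits a path into those parts. `parse_expresso_path` and `FileInfo` are hypothetical names, and the layout is inferred only from the paths shown on this page.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class FileInfo(NamedTuple):
    speaking_style: str      # "read" or "conversational"
    speakers: str            # "ex02", or a pair such as "ex01-ex02"
    style: str               # e.g. "happy", "whisper"
    corpus: Optional[str]    # "base" or "longform" for read speech, else None
    id: str                  # trailing number of the file name


def parse_expresso_path(path: str) -> FileInfo:
    """Split a file path using the directory layout seen in this card.

    The style is taken from the directory name rather than the file name,
    because some style labels (e.g. "animal_directed") contain underscores.
    """
    p = PurePosixPath(path)
    file_id = p.stem.rsplit("_", 1)[1]  # "ex02_happy_00339" -> "00339"
    parts = p.parts
    if len(parts) == 6 and parts[1] == "read":
        # audio_48khz/read/<speaker>/<style>/<corpus>/<file>.wav
        _, kind, speaker, style, corpus, _ = parts
        return FileInfo(kind, speaker, style, corpus, file_id)
    if len(parts) == 5 and parts[1] == "conversational":
        # audio_48khz/conversational/<speaker pair>/<style>/<file>.wav
        _, kind, pair, style, _ = parts
        return FileInfo(kind, pair, style, None, file_id)
    raise ValueError(f"unexpected path layout: {path}")


print(parse_expresso_path("audio_48khz/read/ex02/happy/base/ex02_happy_00339.wav"))
# FileInfo(speaking_style='read', speakers='ex02', style='happy', corpus='base', id='00339')
```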


Tables


dev.channel0 (segmented; columns: speaker, style, corpus)

file | start | end | speaker | style | corpus
audio_48khz/read/ex01/default/longform/ex01_default_longform_00001.wav | 00:00:00 | 00:00:16.49 | ex01 | default | longform
audio_48khz/read/ex01/narration/longform/ex01_narration_longform_00001.wav | 00:00:00 | 00:00:16.99 | ex01 | narration | longform
audio_48khz/read/ex02/default/longform/ex02_default_longform_00001.wav | 00:00:00 | 00:00:14.07 | ex02 | default | longform
audio_48khz/read/ex02/narration/longform/ex02_narration_longform_00001.wav | 00:00:00 | 00:00:18.59 | ex02 | narration | longform
audio_48khz/read/ex03/default/longform/ex03_default_longform_00001.wav | 00:00:00 | 00:00:13.39 | ex03 | default | longform

688 rows x 3 columns
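Each row of a segmented table is identified by a file plus a start and end time. The sketch below parses those time stamps and computes segment lengths. It accepts both the pandas-style form (`0 days 00:00:16.490000`) and plain `HH:MM:SS.ss`, and it treats an empty or `NaT` end as "no end time" (presumably: the segment runs to the end of the file). Such open-ended segments have no length computable from the table alone, which would also explain why the header reports no segment duration distribution. `parse_timestamp` and `segment_seconds` are hypothetical helpers.

```python
import re
from datetime import timedelta
from typing import Optional

# Accepts "0 days", "0 days 00:00:16.490000", "1 day 18:23:42.55" and "00:00:16.49".
_TIMESTAMP = re.compile(
    r"^\s*(?:(?P<days>\d+)\s+days?)?\s*"
    r"(?:(?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2}(?:\.\d+)?))?\s*$"
)


def parse_timestamp(text: str) -> Optional[timedelta]:
    """Parse a start/end cell; an empty cell or "NaT" (no end time) gives None."""
    if text.strip() in ("", "NaT"):
        return None
    match = _TIMESTAMP.match(text)
    if match is None:
        raise ValueError(f"cannot parse time stamp {text!r}")
    return timedelta(
        days=int(match["days"] or 0),
        hours=int(match["h"] or 0),
        minutes=int(match["m"] or 0),
        seconds=float(match["s"] or 0),
    )


def segment_seconds(start: str, end: str) -> Optional[float]:
    """Segment length in seconds, or None when the segment has no end time."""
    begin, stop = parse_timestamp(start), parse_timestamp(end)
    if begin is None:
        raise ValueError("a segment always needs a start time")
    return None if stop is None else (stop - begin).total_seconds()


print(segment_seconds("0 days", "0 days 00:00:16.490000"))  # 16.49
print(segment_seconds("00:00:32.98", "NaT"))  # None
```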

dev.channel1 (segmented; columns: speaker, style, corpus)

file | start | end | speaker | style | corpus
audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_007.wav | 00:00:00 | 00:01:00 | ex02 | default | NaN
audio_48khz/conversational/ex01-ex02/enunciated/ex01-ex02_enunciated_001.wav | 00:00:00 | 00:00:50.27 | ex02 | enunciated | NaN
audio_48khz/conversational/ex01-ex02/fast/ex01-ex02_fast_004.wav | 00:00:00 | 00:01:00 | ex02 | fast | NaN
audio_48khz/conversational/ex01-ex02/projected/ex01-ex02_projected_006.wav | 00:00:00 | 00:01:00 | ex02 | projected | NaN
audio_48khz/conversational/ex01-ex02/whisper/ex01-ex02_whisper_001.wav | 00:00:00 | 00:01:00 | ex02 | whisper | NaN

52 rows x 3 columns
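The header lists files with 1 and 2 channels, and the tables come in `.channel0` / `.channel1` pairs; presumably each table annotates one channel of the stereo conversational recordings (in the previews, the `.channel1` rows of `ex01-ex02` files are labelled `ex02`). Reading one channel from a 24-bit PCM WAV file needs only the standard `wave` module: frames are interleaved, so channel c of each frame starts at byte offset c × sample width. `extract_channel` and `make_stereo_24bit` are hypothetical helpers; the latter only builds a tiny in-memory file to try the former.

```python
import io
import wave


def extract_channel(wav_bytes: bytes, channel: int) -> list[int]:
    """Return the integer samples of one channel of a 16/24/32-bit PCM WAV file."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        n_channels = wf.getnchannels()
        width = wf.getsampwidth()  # bytes per sample, 3 for 24-bit audio
        frames = wf.readframes(wf.getnframes())
    if not 0 <= channel < n_channels:
        raise ValueError(f"channel {channel} not in a {n_channels}-channel file")
    if width == 1:
        raise ValueError("8-bit WAV is unsigned; not handled in this sketch")
    frame_size = n_channels * width
    # Frames are interleaved (ch0, ch1, ch0, ch1, ...): step over whole frames,
    # starting at this channel's byte offset. Samples are signed little-endian.
    return [
        int.from_bytes(frames[i:i + width], "little", signed=True)
        for i in range(channel * width, len(frames), frame_size)
    ]


def make_stereo_24bit(left: list[int], right: list[int], rate: int = 48_000) -> bytes:
    """Build a tiny in-memory stereo 24-bit WAV file for trying extract_channel."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(2)
        wf.setsampwidth(3)
        wf.setframerate(rate)
        wf.writeframes(b"".join(
            a.to_bytes(3, "little", signed=True) + b.to_bytes(3, "little", signed=True)
            for a, b in zip(left, right)
        ))
    return buf.getvalue()


wav = make_stereo_24bit([1, -2, 8_388_607], [-8_388_608, 5, 0])
print(extract_channel(wav, 1))  # [-8388608, 5, 0]
```

For full-length recordings, a vectorised reader would be much faster than this per-sample loop; the loop is kept here to make the interleaving explicit.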

files (filewise; columns: id, speaking_style)

file | id | speaking_style
audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_001.wav | 001 | conversational
audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_002.wav | 002 | conversational
audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_003.wav | 003 | conversational
audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_004.wav | 004 | conversational
audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_005.wav | 005 | conversational

11954 rows x 2 columns
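The files table is indexed by file alone, so its columns can be attached to any segmented table by matching the file column. The sketch below first turns file entries into whole-file segments (start at 0 with no end time, as in the train.channel1 preview rows) and then merges in the per-file `id` and `speaking_style`. `FILES`, `to_segments` and `join_files` are illustrative names, and plain dictionaries stand in for real table objects.

```python
# A few rows of the filewise "files" table: file -> per-file columns.
FILES = {
    "audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_001.wav": {
        "id": "001", "speaking_style": "conversational"},
    "audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_002.wav": {
        "id": "002", "speaking_style": "conversational"},
}


def to_segments(files) -> list[dict]:
    """One whole-file segment per file: start 0 s, end None (no end time)."""
    return [{"file": f, "start": 0.0, "end": None} for f in files]


def join_files(segments: list[dict], files: dict) -> list[dict]:
    """Copy the per-file columns onto every segment that refers to the same file.

    Segments whose file is missing from `files` are passed through unchanged.
    """
    return [{**seg, **files.get(seg["file"], {})} for seg in segments]


for row in join_files(to_segments(FILES), FILES):
    print(row["file"].rsplit("/", 1)[1], row["id"], row["speaking_style"], row["end"])
```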

test.channel0 (segmented; columns: speaker, style, corpus)

file | start | end | speaker | style | corpus
audio_48khz/read/ex01/default/longform/ex01_default_longform_00001.wav | 00:00:16.49 | 00:00:32.98 | ex01 | default | longform
audio_48khz/read/ex01/narration/longform/ex01_narration_longform_00001.wav | 00:00:16.99 | 00:00:33.98 | ex01 | narration | longform
audio_48khz/read/ex02/default/longform/ex02_default_longform_00001.wav | 00:00:14.07 | 00:00:28.14 | ex02 | default | longform
audio_48khz/read/ex02/narration/longform/ex02_narration_longform_00001.wav | 00:00:18.59 | 00:00:37.18 | ex02 | narration | longform
audio_48khz/read/ex03/default/longform/ex03_default_longform_00001.wav | 00:00:13.39 | 00:00:26.78 | ex03 | default | longform

648 rows x 3 columns

test.channel1 (segmented; columns: speaker, style, corpus)

file | start | end | speaker | style | corpus
audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_008.wav | 00:00:00 | 00:01:00 | ex02 | default | NaN
audio_48khz/conversational/ex01-ex02/enunciated/ex01-ex02_enunciated_001.wav | 00:00:50.27 | 00:01:40.54 | ex02 | enunciated | NaN
audio_48khz/conversational/ex01-ex02/fast/ex01-ex02_fast_003.wav | 00:00:00 | 00:01:00 | ex02 | fast | NaN
audio_48khz/conversational/ex01-ex02/projected/ex01-ex02_projected_005.wav | 00:00:00 | 00:01:00 | ex02 | projected | NaN
audio_48khz/conversational/ex01-ex02/whisper/ex01-ex02_whisper_004.wav | 00:00:00 | 00:01:00 | ex02 | whisper | NaN

52 rows x 3 columns

train.channel0 (segmented; columns: speaker, style, corpus)

file | start | end | speaker | style | corpus
audio_48khz/read/ex01/default/longform/ex01_default_longform_00001.wav | 00:00:32.98 | NaT | ex01 | default | longform
audio_48khz/read/ex01/narration/longform/ex01_narration_longform_00001.wav | 00:00:33.98 | NaT | ex01 | narration | longform
audio_48khz/read/ex02/default/longform/ex02_default_longform_00001.wav | 00:00:28.14 | NaT | ex02 | default | longform
audio_48khz/read/ex02/narration/longform/ex02_narration_longform_00001.wav | 00:00:37.18 | NaT | ex02 | narration | longform
audio_48khz/read/ex03/default/longform/ex03_default_longform_00001.wav | 00:00:26.78 | NaT | ex03 | default | longform

10727 rows x 3 columns
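In the preview rows, each longform file is cut in time rather than assigned to a single split: dev takes the first stretch, test a following stretch of the same length, and train everything after that. The sketch below checks, for the five previewed files, that the three splits start at 0 s and meet without gaps or overlaps. `PREVIEW` and `tiles_file` are illustrative names; checking the full tables would additionally require grouping all rows by file.

```python
from typing import Optional

Segment = tuple[str, float, Optional[float]]  # (split, start_s, end_s); None = no end time

# Start/end times per split, copied from the dev/test/train .channel0 preview rows.
PREVIEW: dict[str, list[Segment]] = {
    "ex01_default_longform_00001": [("dev", 0.0, 16.49), ("test", 16.49, 32.98), ("train", 32.98, None)],
    "ex01_narration_longform_00001": [("dev", 0.0, 16.99), ("test", 16.99, 33.98), ("train", 33.98, None)],
    "ex02_default_longform_00001": [("dev", 0.0, 14.07), ("test", 14.07, 28.14), ("train", 28.14, None)],
    "ex02_narration_longform_00001": [("dev", 0.0, 18.59), ("test", 18.59, 37.18), ("train", 37.18, None)],
    "ex03_default_longform_00001": [("dev", 0.0, 13.39), ("test", 13.39, 26.78), ("train", 26.78, None)],
}


def tiles_file(segments: list[Segment], tol: float = 1e-6) -> bool:
    """True if the segments start at 0 s and each one ends where the next begins.

    Only the last segment may be open-ended (end is None).
    """
    ordered = sorted(segments, key=lambda seg: seg[1])
    if abs(ordered[0][1]) > tol:
        return False
    for (_, _, end), (_, next_start, _) in zip(ordered, ordered[1:]):
        if end is None or abs(end - next_start) > tol:
            return False
    return True


for name, segments in PREVIEW.items():
    print(name, "contiguous" if tiles_file(segments) else "gap or overlap")
```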

train.channel1 (segmented; columns: speaker, style, corpus)

file | start | end | speaker | style | corpus
audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_001.wav | 00:00:00 | NaT | ex02 | default | NaN
audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_002.wav | 00:00:00 | NaT | ex02 | default | NaN
audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_003.wav | 00:00:00 | NaT | ex02 | default | NaN
audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_004.wav | 00:00:00 | NaT | ex02 | default | NaN
audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_005.wav | 00:00:00 | NaT | ex02 | default | NaN

339 rows x 3 columns

vad.channel0 (segmented; columns: channel)

file | start | end | channel
audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_001.wav | 00:00:23.88 | 00:00:28.14 | 0
audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_001.wav | 00:00:55.12 | 00:01:01.83 | 0
audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_001.wav | 00:01:21.76 | 00:01:34.58 | 0
audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_001.wav | 00:01:36.14 | 00:01:39.49 | 0
audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_001.wav | 00:02:10.75 | 00:02:21.90 | 0

7790 rows x 1 column

vad.channel1 (segmented; columns: channel)

file | start | end | channel
audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_001.wav | 00:00:00 | 00:00:23.76 | 1
audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_001.wav | 00:00:28.56 | 00:00:55.04 | 1
audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_001.wav | 00:01:01.93 | 00:01:21.33 | 1
audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_001.wav | 00:01:38.89 | 00:01:50.97 | 1
audio_48khz/conversational/ex01-ex02/default/ex01-ex02_default_001.wav | 00:01:51.88 | 00:02:06.95 | 1

7694 rows x 1 column
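The vad tables appear to list, per channel, the stretches in which voice activity was detected; the previewed rows of both tables all belong to `ex01-ex02_default_001.wav`. The sketch below sums the voiced time per channel and measures how long both channels are voiced at once, using only those preview rows (so it covers roughly the first 2.4 minutes of the file). `VAD_CH0`, `VAD_CH1`, `voiced_seconds` and `overlap_seconds` are illustrative names.

```python
# Voiced stretches (start_s, end_s) of ex01-ex02_default_001.wav, copied from
# the five preview rows of vad.channel0 and vad.channel1 above.
VAD_CH0 = [(23.88, 28.14), (55.12, 61.83), (81.76, 94.58), (96.14, 99.49), (130.75, 141.90)]
VAD_CH1 = [(0.0, 23.76), (28.56, 55.04), (61.93, 81.33), (98.89, 110.97), (111.88, 126.95)]

Interval = tuple[float, float]


def voiced_seconds(segments: list[Interval]) -> float:
    """Sum of segment lengths (assumes the segments of one channel do not overlap)."""
    return sum(end - start for start, end in segments)


def overlap_seconds(a: list[Interval], b: list[Interval]) -> float:
    """Time during which both channels are voiced.

    Sums the pairwise intersections, which is exact as long as the segments
    within each list are disjoint. O(len(a) * len(b)); fine for a preview.
    """
    return sum(
        max(0.0, min(a_end, b_end) - max(a_start, b_start))
        for a_start, a_end in a
        for b_start, b_end in b
    )


print(f"channel 0 voiced: {voiced_seconds(VAD_CH0):.2f} s")
print(f"channel 1 voiced: {voiced_seconds(VAD_CH1):.2f} s")
print(f"both voiced:      {overlap_seconds(VAD_CH0, VAD_CH1):.2f} s")
```

In these rows the two channels mostly alternate, as in turn-taking; the only simultaneous activity is about 0.6 s around 00:01:38.89 to 00:01:39.49. For whole files, a sort-and-sweep over both lists would scale better than the pairwise loop.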

Schemes

ID | Dtype | Labels
channel | int | 0, 1
corpus | str | base, longform
id | str | 00001, 00002, 00003, 00004, 00005, 00006, 00007, […], 014, 015, 016, 017, 018, 019, 020, 021
speaker | str | ex01, ex02, ex03, ex04
speaking_style | str | conversational, read
style | str | angry, animal, animal_directed, awe, bored, calm, child, […], narration, non_verbal, projected, sad, sarcastic, sleepy, sympathetic, whisper
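Each scheme restricts a column to a fixed set of labels. The sketch below checks table rows against the schemes whose label lists are given in full; `id` and `style` are left out because their lists are elided. `SCHEMES` and `invalid_fields` are illustrative names, and missing values (such as the empty corpus of conversational rows) are skipped.

```python
# Label sets of the schemes whose labels are listed in full above.
# "id" and "style" are left out because their label lists are elided.
SCHEMES = {
    "channel": {0, 1},
    "corpus": {"base", "longform"},
    "speaker": {"ex01", "ex02", "ex03", "ex04"},
    "speaking_style": {"conversational", "read"},
}


def invalid_fields(row: dict) -> list[str]:
    """Columns whose value is not an allowed label.

    Missing values (None) and columns without a known scheme are not checked.
    """
    return [
        column
        for column, value in row.items()
        if column in SCHEMES and value is not None and value not in SCHEMES[column]
    ]


print(invalid_fields({"speaker": "ex02", "style": "whisper", "corpus": None}))  # []
print(invalid_fields({"speaker": "ex05", "channel": 2}))  # ['speaker', 'channel']
```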