clac

version: 1.1.0
license: CC-BY-SA-4.0
usage: commercial
languages: eng
format: wav
channel: 1
sampling rate: 8000, 16000, 44100, 48000 Hz
bit depth: 16
duration: 4 days 08:45:53.120362812
files: 18609, duration distribution: 3.2 s to 183.7 s
segments: 25214, duration distribution: 0.8 s to 7.0 s
repository: audb-public
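The header fields above allow a quick sanity check on the corpus size. The following is a minimal sketch using only the Python standard library; the numbers are copied from the header, and the variable names are illustrative.

```python
from datetime import timedelta

# Values copied from the "duration" and "files" header fields.
total = timedelta(days=4, hours=8, minutes=45, seconds=53.120362812)
n_files = 18609

# timedelta stores microseconds, so the nanosecond digits are rounded.
total_seconds = total.total_seconds()
mean_file_seconds = total_seconds / n_files

print(f"total duration: {total_seconds / 3600:.1f} h")
print(f"mean file duration: {mean_file_seconds:.1f} s")
```

This gives roughly 104.8 hours in total and a mean of about 20.3 s per file, which lies inside the reported file duration range of 3.2 s to 183.7 s.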

Description

The Crowdsourced Language Assessment Corpus (CLAC) consists of audio recordings and automatically generated transcripts from 1,832 speakers for several speech and language tasks, as well as metadata for each speaker. The speaker metadata covers the age, gender, years of education, and residence of each speaker, and whether they had health-related symptoms during the recordings.

Example

audio/cookie_theft/spk1383.wav


Tables


ID | Type | Columns

age.dev | segmented | speaker

file | start | end | speaker
audio/grandfather/spk1537.wav | 0 days 00:00:01.280000 | 0 days 00:00:03.740000 | 1537
audio/grandfather/spk1537.wav | 0 days 00:00:04.140000 | 0 days 00:00:09.080000 | 1537
audio/grandfather/spk1537.wav | 0 days 00:00:09.720000 | 0 days 00:00:12.460000 | 1537
audio/grandfather/spk1537.wav | 0 days 00:00:12.980000 | 0 days 00:00:15.400000 | 1537
audio/grandfather/spk1537.wav | 0 days 00:00:16.100000 | 0 days 00:00:17.180000 | 1537

2605 rows x 1 column
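Each row of a segmented table identifies a segment by file, start offset, and end offset, so a segment's duration is simply end minus start. The sketch below parses the offset strings as they are printed in the age.dev preview and computes the durations; `parse_offset` is a hypothetical helper written for this illustration, not part of any library.

```python
import re
from datetime import timedelta


def parse_offset(text: str) -> timedelta:
    """Parse an offset printed as '<d> days HH:MM:SS.ffffff'."""
    match = re.fullmatch(r"(\d+) days (\d+):(\d+):(\d+(?:\.\d+)?)", text)
    if match is None:
        raise ValueError(f"unexpected offset format: {text!r}")
    days, hours, minutes, seconds = match.groups()
    return timedelta(
        days=int(days),
        hours=int(hours),
        minutes=int(minutes),
        seconds=float(seconds),
    )


# (start, end) pairs copied from the age.dev preview rows.
rows = [
    ("0 days 00:00:01.280000", "0 days 00:00:03.740000"),
    ("0 days 00:00:04.140000", "0 days 00:00:09.080000"),
    ("0 days 00:00:09.720000", "0 days 00:00:12.460000"),
    ("0 days 00:00:12.980000", "0 days 00:00:15.400000"),
    ("0 days 00:00:16.100000", "0 days 00:00:17.180000"),
]

durations = [
    (parse_offset(end) - parse_offset(start)).total_seconds()
    for start, end in rows
]
print(durations)
```

All five preview segments fall within the 0.8 s to 7.0 s segment duration range given in the header.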

age.test | segmented | speaker

file | start | end | speaker
audio/grandfather/spk809.wav | 0 days 00:00:01.760000 | 0 days 00:00:07.720000 | 809
audio/grandfather/spk809.wav | 0 days 00:00:08.160000 | 0 days 00:00:11.540000 | 809
audio/grandfather/spk809.wav | 0 days 00:00:12.200000 | 0 days 00:00:15.020000 | 809
audio/grandfather/spk809.wav | 0 days 00:00:16.460000 | 0 days 00:00:18.280000 | 809
audio/grandfather/spk809.wav | 0 days 00:00:19.100000 | 0 days 00:00:21.340000 | 809

2510 rows x 1 column

age.train | segmented | speaker

file | start | end | speaker
audio/grandfather/spk386.wav | 0 days 00:00:00.960000 | 0 days 00:00:03.140000 | 386
audio/grandfather/spk386.wav | 0 days 00:00:04.060000 | 0 days 00:00:08.460000 | 386
audio/grandfather/spk386.wav | 0 days 00:00:09.200000 | 0 days 00:00:11.680000 | 386
audio/grandfather/spk386.wav | 0 days 00:00:12.200000 | 0 days 00:00:14.140000 | 386
audio/grandfather/spk386.wav | 0 days 00:00:14.840000 | 0 days 00:00:16.620000 | 386

20099 rows x 1 column
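The row counts of the three age tables can be checked against the segment count in the header and turned into split proportions. A small sketch, with the counts copied from the tables above:

```python
# Row counts copied from the age.dev, age.test and age.train tables.
split_rows = {"age.dev": 2605, "age.test": 2510, "age.train": 20099}
n_segments = 25214  # "segments" field in the header

total_rows = sum(split_rows.values())
shares = {name: rows / total_rows for name, rows in split_rows.items()}

for name, share in shares.items():
    print(f"{name}: {split_rows[name]} rows ({share:.1%})")
print("matches header segment count:", total_rows == n_segments)
```

The three tables add up to exactly 25214 rows, the header's segment count, and correspond to a split of roughly 80% train, 10% dev, and 10% test at the segment level.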

files | filewise | task-name, transcript, speaker

file | task-name | transcript | speaker
audio/counting_1_to_20/spk386.wav | counting_1_to_20 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | 386
audio/counting_1_to_20/spk107.wav | counting_1_to_20 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | 107
audio/counting_1_to_20/spk1537.wav | counting_1_to_20 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | 1537
audio/counting_1_to_20/spk727.wav | counting_1_to_20 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | 727
audio/counting_1_to_20/spk809.wav | counting_1_to_20 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | 809

18609 rows x 3 columns
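In the preview rows, the file path appears to encode both the task name and the speaker ID as `audio/<task-name>/spk<speaker>.wav`. Assuming this pattern holds for all files (the preview shows only a handful), the two values can be recovered from a path as sketched below; `split_path` is a hypothetical helper for illustration.

```python
from pathlib import PurePosixPath


def split_path(path: str) -> tuple[str, int]:
    """Return (task-name, speaker) for 'audio/<task-name>/spk<id>.wav'.

    The layout is inferred from the preview rows and is an assumption.
    """
    p = PurePosixPath(path)
    if len(p.parts) != 3 or p.parts[0] != "audio" or not p.stem.startswith("spk"):
        raise ValueError(f"unexpected path layout: {path!r}")
    # Drop the 'spk' prefix and read the rest as the integer speaker ID.
    return p.parent.name, int(p.stem[3:])


print(split_path("audio/counting_1_to_20/spk386.wav"))
print(split_path("audio/cookie_theft/spk1383.wav"))
```

The task-name and speaker columns of the files table remain the authoritative source; parsing the path is only a convenience when the table is not at hand.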

speaker-metadata | misc | age, gender, country, region, city, education(years), symptoms

speaker | age | gender | country | region | city | education(years) | symptoms
1 | 26 | male | United States (US) | Maryland (MD) | Waldorf | | False
2 | 32 | male | United States (US) | North Carolina (NC) | Durham | 16 | False
3 | 44 | male | United States (US) | California (CA) | North Hills | 16 | False
4 | 59 | female | United States (US) | None (None) | | 16 | False
5 | 22 | male | United States (US) | Oregon (OR) | Eugene | 15 | False

1832 rows x 7 columns

Schemes

ID | Dtype | Min | Max | Labels | Mappings
age | int | 0 | 100 | |
city | str | | | |
country | str | | | Bangladesh (BD), Hong Kong (HK), India (IN), Italy (IT), Mexico (MX), None (None), Pakistan (PK), Philippines (PH), Romania (RO), Switzerland (CH), United Kingdom (GB), United States (US), Venezuela (VE) |
education(years) | int | | | |
gender | str | | | female, male, other |
region | str | | | |
speaker | int | | | 1, 2, 3, 4, 5, 6, 7, […], 1825, 1826, 1827, 1828, 1829, 1830, 1831, 1832 | age, city, country, education(years), gender, region, symptoms
symptoms | bool | | | |
task-name | str | | | cookie_theft, counting_1_to_20, days_of_the_week, grandfather, max_phonation, picnic, rainbow, repeat_5_times, repeat_5_times_artillery, repeat_5_times_catastrophe, repeat_5_times_impossibility, smr |
transcript | str | | | |
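The schemes constrain which values a column may hold: a numeric range for age and fixed label sets for gender and task-name. The sketch below checks a record against those constraints. It assumes the Min and Max bounds are inclusive and that missing values are allowed (the table previews contain empty cells); `check_record` and the dictionary layout are hypothetical and only serve the illustration.

```python
# Bounds and label sets copied from the Schemes listing above.
AGE_MIN, AGE_MAX = 0, 100
GENDERS = {"female", "male", "other"}
TASK_NAMES = {
    "cookie_theft", "counting_1_to_20", "days_of_the_week", "grandfather",
    "max_phonation", "picnic", "rainbow", "repeat_5_times",
    "repeat_5_times_artillery", "repeat_5_times_catastrophe",
    "repeat_5_times_impossibility", "smr",
}


def check_record(record: dict) -> list[str]:
    """Return scheme violations for one record; missing values are skipped."""
    problems = []
    age = record.get("age")
    if age is not None and not AGE_MIN <= age <= AGE_MAX:
        problems.append(f"age out of range: {age}")
    gender = record.get("gender")
    if gender is not None and gender not in GENDERS:
        problems.append(f"unknown gender label: {gender}")
    task = record.get("task-name")
    if task is not None and task not in TASK_NAMES:
        problems.append(f"unknown task-name label: {task}")
    return problems


print(check_record({"age": 32, "gender": "male", "task-name": "rainbow"}))
print(check_record({"age": 130, "gender": "unknown"}))
```

The first call returns an empty list, and the second reports two violations, one for the age and one for the gender label.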