clac

version	1.1.0
license	CC-BY-SA-4.0
usage	commercial
languages	eng
format	wav
channel	1
sampling rate	8000, 16000, 44100, 48000
bit depth	16
duration	4 days 08:45:53.120362812
files	18609, duration range: 3.2 s to 183.7 s
segments	25214, duration range: 0.8 s to 7.0 s
repository	audb-public
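
As a quick sanity check on the header values, the total duration can be converted to seconds and divided by the number of files. This is only a sketch: it assumes the `duration` field is the summed length of all files, and `parse_duration` is a hypothetical helper, not part of any library.

```python
def parse_duration(text: str) -> float:
    """Convert a 'D days HH:MM:SS.fff' string into seconds."""
    days_part, clock_part = text.split(" days ")
    hours, minutes, seconds = clock_part.split(":")
    return (
        int(days_part) * 86400
        + int(hours) * 3600
        + int(minutes) * 60
        + float(seconds)
    )


# Values copied from the header above.
total_seconds = parse_duration("4 days 08:45:53.120362812")
n_files = 18609

# Assumption: `duration` is the sum over all files.
mean_file_duration = total_seconds / n_files
print(f"total: {total_seconds / 3600:.1f} h")
print(f"mean per file: {mean_file_duration:.1f} s")
```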

Description

The Crowdsourced Language Assessment Corpus (CLAC) consists of audio recordings and automatically generated transcripts from 1,832 speakers for several speech and language tasks, together with metadata for each speaker. The speaker metadata covers age, gender, years of education, and residence, and records whether the speaker had health-related symptoms during the recordings.

Example

audio/cookie_theft/spk1383.wav


Tables


age.dev

segmented

speaker

file	start	end	speaker
audio/grandfather/spk1537.wav	0 days 00:00:01.280000	0 days 00:00:03.740000	1537
audio/grandfather/spk1537.wav	0 days 00:00:04.140000	0 days 00:00:09.080000	1537
audio/grandfather/spk1537.wav	0 days 00:00:09.720000	0 days 00:00:12.460000	1537
audio/grandfather/spk1537.wav	0 days 00:00:12.980000	0 days 00:00:15.400000	1537
audio/grandfather/spk1537.wav	0 days 00:00:16.100000	0 days 00:00:17.180000	1537
2605 rows x 1 column
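
The `start` and `end` columns of a segmented table are offsets into the file, so a segment's length is their difference. The sketch below parses the five preview rows of `age.dev` with the standard library only. `parse_offset` is a hypothetical helper and assumes the `D days HH:MM:SS.ffffff` layout shown in the preview.

```python
import re
from datetime import timedelta

# Layout assumed from the preview rows: "0 days 00:00:01.280000".
_OFFSET = re.compile(r"(\d+) days (\d+):(\d+):(\d+)\.(\d+)")


def parse_offset(text: str) -> timedelta:
    """Parse an offset string into a timedelta (microsecond precision)."""
    match = _OFFSET.fullmatch(text)
    if match is None:
        raise ValueError(f"unexpected offset format: {text!r}")
    days, hours, minutes, seconds, fraction = match.groups()
    # Pad or truncate the fractional part to exactly six digits.
    microseconds = int(fraction.ljust(6, "0")[:6])
    return timedelta(
        days=int(days),
        hours=int(hours),
        minutes=int(minutes),
        seconds=int(seconds),
        microseconds=microseconds,
    )


# (start, end) pairs copied from the age.dev preview.
SEGMENTS = [
    ("0 days 00:00:01.280000", "0 days 00:00:03.740000"),
    ("0 days 00:00:04.140000", "0 days 00:00:09.080000"),
    ("0 days 00:00:09.720000", "0 days 00:00:12.460000"),
    ("0 days 00:00:12.980000", "0 days 00:00:15.400000"),
    ("0 days 00:00:16.100000", "0 days 00:00:17.180000"),
]

durations = [
    (parse_offset(end) - parse_offset(start)).total_seconds()
    for start, end in SEGMENTS
]
for (start, end), seconds in zip(SEGMENTS, durations):
    print(f"{start} -> {end}: {seconds:.2f} s")
```

All five durations fall between the 0.8 s and 7.0 s bounds listed for segments in the header.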

age.test

segmented

speaker

file	start	end	speaker
audio/grandfather/spk809.wav	0 days 00:00:01.760000	0 days 00:00:07.720000	809
audio/grandfather/spk809.wav	0 days 00:00:08.160000	0 days 00:00:11.540000	809
audio/grandfather/spk809.wav	0 days 00:00:12.200000	0 days 00:00:15.020000	809
audio/grandfather/spk809.wav	0 days 00:00:16.460000	0 days 00:00:18.280000	809
audio/grandfather/spk809.wav	0 days 00:00:19.100000	0 days 00:00:21.340000	809
2510 rows x 1 column

age.train

segmented

speaker

file	start	end	speaker
audio/grandfather/spk386.wav	0 days 00:00:00.960000	0 days 00:00:03.140000	386
audio/grandfather/spk386.wav	0 days 00:00:04.060000	0 days 00:00:08.460000	386
audio/grandfather/spk386.wav	0 days 00:00:09.200000	0 days 00:00:11.680000	386
audio/grandfather/spk386.wav	0 days 00:00:12.200000	0 days 00:00:14.140000	386
audio/grandfather/spk386.wav	0 days 00:00:14.840000	0 days 00:00:16.620000	386
20099 rows x 1 column
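
The row counts of the three `age.*` tables add up to the segment count in the header, which suggests (but does not prove) that the splits partition the segments. A small arithmetic sketch, with the counts copied from the table footers:

```python
# Row counts taken from the table previews above.
split_rows = {"age.train": 20099, "age.dev": 2605, "age.test": 2510}
segments_in_header = 25214

total_rows = sum(split_rows.values())
print(f"sum of split rows: {total_rows}")
print(f"matches header segment count: {total_rows == segments_in_header}")

# Relative size of each split.
for name, rows in split_rows.items():
    print(f"{name}: {rows} rows ({rows / total_rows:.1%})")
```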

files

filewise

task-name, transcript, speaker

file	task-name	transcript	speaker
audio/counting_1_to_20/spk386.wav	counting_1_to_20	1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20	386
audio/counting_1_to_20/spk107.wav	counting_1_to_20	1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20	107
audio/counting_1_to_20/spk1537.wav	counting_1_to_20	1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20	1537
audio/counting_1_to_20/spk727.wav	counting_1_to_20	1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20	727
audio/counting_1_to_20/spk809.wav	counting_1_to_20	1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20	809
18609 rows x 3 columns
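
In the preview rows, the file path itself encodes the task name and speaker ID as `audio/<task-name>/spk<speaker>.wav`. The sketch below recovers both from a path. It assumes this layout holds for every file, which the preview does not guarantee, and `parse_clac_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def parse_clac_path(path: str) -> tuple:
    """Return (task_name, speaker_id) for a path like audio/<task>/spk<N>.wav."""
    parsed = PurePosixPath(path)
    task_name = parsed.parent.name
    stem = parsed.stem
    # Assumption: the file stem is always "spk" followed by the speaker number.
    if not stem.startswith("spk") or not stem[3:].isdigit():
        raise ValueError(f"unexpected file name: {path!r}")
    return task_name, int(stem[3:])


# Paths copied from the previews above.
for example in (
    "audio/counting_1_to_20/spk386.wav",
    "audio/grandfather/spk1537.wav",
    "audio/cookie_theft/spk1383.wav",
):
    print(example, "->", parse_clac_path(example))
```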

speaker-metadata

misc

age, gender, country, region, city, education(years), symptoms

speaker	age	gender	country	region	city	education(years)	symptoms
1	26	male	United States (US)	Maryland (MD)	Waldorf		False
2	32	male	United States (US)	North Carolina (NC)	Durham	16	False
3	44	male	United States (US)	California (CA)	North Hills	16	False
4	59	female	United States (US)	None (None)		16	False
5	22	male	United States (US)	Oregon (OR)	Eugene	15	False
1832 rows x 7 columns
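
The preview shows that some fields can be empty: speaker 1 has no `education(years)` value, and speaker 4 has no city and a region of `None (None)`. The sketch below parses the five preview rows as tab-separated text and maps such entries to `None`. Treating `None (None)` as a missing value is an assumption, and the `Speaker` class and `parse_speakers` function are hypothetical.

```python
import csv
from dataclasses import dataclass
from typing import Optional

# Preview rows copied from the table above, tab-separated.
ROWS = (
    "speaker\tage\tgender\tcountry\tregion\tcity\teducation(years)\tsymptoms\n"
    "1\t26\tmale\tUnited States (US)\tMaryland (MD)\tWaldorf\t\tFalse\n"
    "2\t32\tmale\tUnited States (US)\tNorth Carolina (NC)\tDurham\t16\tFalse\n"
    "3\t44\tmale\tUnited States (US)\tCalifornia (CA)\tNorth Hills\t16\tFalse\n"
    "4\t59\tfemale\tUnited States (US)\tNone (None)\t\t16\tFalse\n"
    "5\t22\tmale\tUnited States (US)\tOregon (OR)\tEugene\t15\tFalse\n"
)


@dataclass
class Speaker:
    speaker: int
    age: int
    gender: str
    country: str
    region: Optional[str]
    city: Optional[str]
    education_years: Optional[int]
    symptoms: bool


def _optional(text: str) -> Optional[str]:
    # Assumption: empty strings and "None (None)" both mean "missing".
    return None if text in ("", "None (None)") else text


def parse_speakers(text: str) -> list:
    """Parse tab-separated speaker rows into Speaker records."""
    reader = csv.DictReader(text.splitlines(), delimiter="\t")
    speakers = []
    for row in reader:
        education = _optional(row["education(years)"])
        speakers.append(
            Speaker(
                speaker=int(row["speaker"]),
                age=int(row["age"]),
                gender=row["gender"],
                country=row["country"],
                region=_optional(row["region"]),
                city=_optional(row["city"]),
                education_years=None if education is None else int(education),
                symptoms=row["symptoms"] == "True",
            )
        )
    return speakers


speakers = parse_speakers(ROWS)
for entry in speakers:
    print(entry)
```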

Schemes

ID	Dtype	Min	Max	Labels	Mappings
age	int	0	100
city	str
country	str			Bangladesh (BD), Hong Kong (HK), India (IN), Italy (IT), Mexico (MX), None (None), Pakistan (PK), Philippines (PH), Romania (RO), Switzerland (CH), United Kingdom (GB), United States (US), Venezuela (VE)
education(years)	int
gender	str			female, male, other
region	str
speaker	int			1, 2, 3, 4, 5, 6, 7, […], 1825, 1826, 1827, 1828, 1829, 1830, 1831, 1832	age, city, country, education(years), gender, region, symptoms
symptoms	bool
task-name	str			cookie_theft, counting_1_to_20, days_of_the_week, grandfather, max_phonation, picnic, rainbow, repeat_5_times, repeat_5_times_artillery, repeat_5_times_catastrophe, repeat_5_times_impossibility, smr
transcript	str
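
The constraints in the schemes table can be used to check a record before further processing. The following is a minimal sketch that covers only the `age` range and the `gender` and `task-name` labels. The `scheme_violations` function is hypothetical and not part of any library.

```python
# Constraints copied from the schemes table above.
AGE_MIN, AGE_MAX = 0, 100
GENDERS = {"female", "male", "other"}
TASK_NAMES = {
    "cookie_theft",
    "counting_1_to_20",
    "days_of_the_week",
    "grandfather",
    "max_phonation",
    "picnic",
    "rainbow",
    "repeat_5_times",
    "repeat_5_times_artillery",
    "repeat_5_times_catastrophe",
    "repeat_5_times_impossibility",
    "smr",
}


def scheme_violations(record: dict) -> list:
    """Return a list of human-readable problems; empty if the record is valid."""
    problems = []
    age = record.get("age")
    if age is not None and not AGE_MIN <= age <= AGE_MAX:
        problems.append(f"age {age} outside [{AGE_MIN}, {AGE_MAX}]")
    gender = record.get("gender")
    if gender is not None and gender not in GENDERS:
        problems.append(f"unknown gender label: {gender!r}")
    task = record.get("task-name")
    if task is not None and task not in TASK_NAMES:
        problems.append(f"unknown task-name label: {task!r}")
    return problems


valid = {"age": 26, "gender": "male", "task-name": "grandfather"}
invalid = {"age": 130, "gender": "unknown", "task-name": "reading"}
print(scheme_violations(valid))
print(scheme_violations(invalid))
```

Fields that are absent from a record are skipped rather than reported, since the previews show that missing values occur in the metadata.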