clac
version | 1.1.0 |
license | |
usage | commercial |
languages | eng |
format | wav |
channel | 1 |
sampling rate | 8000, 16000, 44100, 48000 |
bit depth | 16 |
duration | 4 days 08:45:53.120362812 |
files | 18609, duration distribution: 3.2 s |
segments | 25214, duration distribution: 0.8 s |
repository | audb-public |
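As a quick sanity check on the summary figures, the total duration and the file count can be combined into an average file length. The following is a minimal sketch using only the Python standard library. It assumes that the `duration` entry is the sum of all file durations, and `parse_duration` is a hypothetical helper written for this example, not part of any library.

```python
from datetime import timedelta

# Values copied from the summary table of this datacard.
TOTAL_DURATION = "4 days 08:45:53.120362812"
NUM_FILES = 18609


def parse_duration(text: str) -> timedelta:
    """Parse a 'D days HH:MM:SS.fffffffff' string into a timedelta."""
    days_part, clock_part = text.split(" days ")
    hours, minutes, seconds = clock_part.split(":")
    return timedelta(
        days=int(days_part),
        hours=int(hours),
        minutes=int(minutes),
        seconds=float(seconds),  # digits beyond microseconds are rounded
    )


total_seconds = parse_duration(TOTAL_DURATION).total_seconds()
mean_file_seconds = total_seconds / NUM_FILES  # assumes duration sums all files

print(f"total duration: {total_seconds:.1f} s")
print(f"mean duration per file: {mean_file_seconds:.1f} s")
```

Under that assumption the average comes out at roughly 20 seconds per file.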
Description
The Crowdsourced Language Assessment Corpus (CLAC) consists of audio recordings and automatically generated transcripts from 1,832 speakers for several speech and language tasks, together with metadata for each speaker. The speaker metadata covers age, gender, years of education, and residence (country, region, city), and records whether the speaker had health-related symptoms during the recordings.
Tables
ID | Type | Columns |
---|---|---|
age.dev | segmented | speaker |
age.test | segmented | speaker |
age.train | segmented | speaker |
files | filewise | task-name, transcript, speaker |
speaker-metadata | misc | age, gender, country, region, city, education(years), symptoms |
Schemes
ID | Dtype | Min | Max | Labels | Mappings |
---|---|---|---|---|---|
age | int | 0 | 100 | | |
city | str | | | | |
country | str | | | Bangladesh (BD), Hong Kong (HK), India (IN), Italy (IT), Mexico (MX), None (None), Pakistan (PK), Philippines (PH), Romania (RO), Switzerland (CH), United Kingdom (GB), United States (US), Venezuela (VE) | |
education(years) | int | | | | |
gender | str | | | female, male, other | |
region | str | | | | |
speaker | int | | | 1, 2, 3, 4, 5, 6, 7, […], 1825, 1826, 1827, 1828, 1829, 1830, 1831, 1832 | age, city, country, education(years), gender, region, symptoms |
symptoms | bool | | | | |
task-name | str | | | cookie_theft, counting_1_to_20, days_of_the_week, grandfather, max_phonation, picnic, rainbow, repeat_5_times, repeat_5_times_artillery, repeat_5_times_catastrophe, repeat_5_times_impossibility, smr | |
transcript | str | | | | |
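The Mappings entry of the `speaker` scheme suggests that each speaker ID can be resolved to the columns of the `speaker-metadata` table. The sketch below illustrates that lookup with plain Python dictionaries. The file paths and all metadata values are invented placeholders, not rows from the corpus, and `map_speaker` is a hypothetical helper written for this example.

```python
# Toy stand-in for the `speaker-metadata` misc table, keyed by speaker ID.
# Every value here is a made-up placeholder, not data from the corpus.
speaker_metadata = {
    1: {"age": 30, "gender": "female", "country": "United States (US)", "symptoms": False},
    2: {"age": 45, "gender": "male", "country": "India (IN)", "symptoms": True},
}

# Toy stand-in for the `files` table: file path -> column values.
# The paths are placeholders; the task names are taken from the `task-name` scheme.
files = {
    "audio/a.wav": {"task-name": "rainbow", "speaker": 1},
    "audio/b.wav": {"task-name": "picnic", "speaker": 2},
}


def map_speaker(files: dict, speaker_metadata: dict, field: str) -> dict:
    """Return {file path: metadata value} by following each file's speaker ID."""
    return {
        path: speaker_metadata[row["speaker"]][field]
        for path, row in files.items()
    }


ages = map_speaker(files, speaker_metadata, "age")
print(ages)
```

The same lookup works for any of the mapped fields (`gender`, `country`, `symptoms`, and so on) by changing the `field` argument.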