Correctness consistency

Overall scores
	w2v2-L	hubert-L	wavlm	data2vec
Overall Score	67.6% passed tests (25 passed / 12 failed).	73.0% passed tests (27 passed / 10 failed).	59.5% passed tests (22 passed / 15 failed).	64.9% passed tests (24 passed / 13 failed).
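The overall score is the share of passed tests among all tests for a model. The sketch below makes some assumptions. Each non-empty cell in the range tables that follow is treated as one test, and a test is taken to pass when its proportion is at least the 0.75 threshold. The inclusive boundary is an assumption, because no reported value equals exactly 0.75. Under these assumptions, the w2v2-L line can be reproduced from its 37 cell values. The function names are hypothetical.

```python
# Sketch: reproduce an "Overall Score" line from per-test proportions.
# Assumptions: every non-empty table cell is one test, and a test passes
# when its proportion is >= the threshold (inclusive boundary is assumed).

THRESHOLD = 0.75

# w2v2-L proportions copied from the range tables in this report, per emotion.
W2V2_L = {
    "anger": [0.81, 0.42, 1.00, 0.94, 0.87, 0.96, 0.92, 0.94],
    "fear": [0.42, 0.91, 0.56, 0.76, 0.92, 0.42, 0.62],
    "surprise": [0.52, 0.69, 0.86, 1.00],
    "boredom": [0.89, 0.95],
    "sadness": [0.85, 0.98, 1.00, 0.87, 0.83, 0.20, 1.00, 0.78],
    "neutral": [0.51, 0.31, 0.96, 0.86, 0.83, 0.51, 0.90, 0.56],
}


def count_tests(proportions: dict[str, list[float]], threshold: float) -> tuple[int, int]:
    """Return (passed, failed) over all cells of all emotions."""
    values = [p for cells in proportions.values() for p in cells]
    passed = sum(p >= threshold for p in values)  # bools sum to an int
    return passed, len(values) - passed


def format_overall_score(passed: int, failed: int) -> str:
    """Format the result in the same style as the 'Overall Score' row."""
    share = 100 * passed / (passed + failed)
    return f"{share:.1f}% passed tests ({passed} passed / {failed} failed)."


passed, failed = count_tests(W2V2_L, THRESHOLD)
print(format_overall_score(passed, failed))
# prints: 67.6% passed tests (25 passed / 12 failed).
```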

Samples in Expected High Range

Proportion of samples, per dataset and emotion, whose predictions fall into the expected high range (>= 0.55).

Threshold: 0.75
Emotion	anger				fear				surprise
Dataset	w2v2-L	hubert-L	wavlm	data2vec	w2v2-L	hubert-L	wavlm	data2vec	w2v2-L	hubert-L	wavlm	data2vec
crema-d-1.2.0-emotion.categories.test.gold_standard	0.81	0.76	0.80	0.79	0.42	0.37	0.39	0.39	n/a	n/a	n/a	n/a
danish-emotional-speech-1.1.1-emotion.test	0.42	0.79	0.42	0.71	n/a	n/a	n/a	n/a	0.52	0.67	0.13	0.62
emodb-1.2.0-emotion.categories.test.gold_standard	1.00	1.00	1.00	1.00	0.91	0.97	1.00	0.94	n/a	n/a	n/a	n/a
emovo-1.2.1-emotion.test	0.94	0.98	0.88	0.96	0.56	0.64	0.38	0.56	0.69	0.67	0.49	0.62
iemocap-2.3.0-emotion.categories.test.gold_standard	0.87	0.87	0.74	0.87	0.76	0.53	0.35	0.59	n/a	n/a	n/a	n/a
meld-1.3.1-emotion.categories.test.gold_standard	0.96	0.97	0.90	0.95	0.92	0.86	0.80	0.84	0.86	0.88	0.71	0.86
polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard	0.92	0.98	0.92	0.95	0.42	0.38	0.40	0.35	n/a	n/a	n/a	n/a
ravdess-1.1.2-emotion.speech.test	0.94	0.91	0.81	0.84	0.62	0.66	0.47	0.53	1.00	0.97	0.94	1.00
mean	0.86	0.91	0.81	0.88	0.66	0.63	0.54	0.60	0.77	0.80	0.57	0.77
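Each cell in the table above is the fraction of one dataset's samples of a given emotion whose prediction is at least 0.55. That fraction is then compared against the 0.75 threshold. The minimal sketch below computes one such cell, assuming the predictions are plain floats, one per sample. The helper names `proportion_in_high_range` and `passes` are hypothetical. The prediction values are made up for illustration, and the inclusive threshold comparison is assumed.

```python
# Sketch of one "expected high range" test. Helper names and sample values are
# hypothetical; only the 0.55 bound and the 0.75 threshold come from the report.

HIGH_LOWER_BOUND = 0.55  # predictions >= 0.55 count as "in range"
THRESHOLD = 0.75         # minimum share of in-range samples (inclusive: assumed)


def proportion_in_high_range(predictions: list[float], lower: float = HIGH_LOWER_BOUND) -> float:
    """Fraction of predictions that are >= lower."""
    if not predictions:
        raise ValueError("need at least one prediction")
    return sum(p >= lower for p in predictions) / len(predictions)


def passes(proportion: float, threshold: float = THRESHOLD) -> bool:
    """A test passes when the in-range proportion reaches the threshold (assumed >=)."""
    return proportion >= threshold


# Made-up predictions for eight "anger" samples of a single dataset.
anger_predictions = [0.91, 0.72, 0.58, 0.66, 0.49, 0.83, 0.77, 0.60]
share = proportion_in_high_range(anger_predictions)
print(f"{share:.3f}", passes(share))
```

Here seven of the eight made-up predictions reach 0.55, so the proportion is 0.875 and the test would pass.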

Samples in Expected Low Range

Proportion of samples, per dataset and emotion, whose predictions fall into the expected low range (<= 0.45).

Threshold: 0.75
Emotion	boredom				sadness
Dataset	w2v2-L	hubert-L	wavlm	data2vec	w2v2-L	hubert-L	wavlm	data2vec
crema-d-1.2.0-emotion.categories.test.gold_standard	n/a	n/a	n/a	n/a	0.85	0.95	0.95	0.85
danish-emotional-speech-1.1.1-emotion.test	n/a	n/a	n/a	n/a	0.98	0.88	1.00	0.92
emodb-1.2.0-emotion.categories.test.gold_standard	0.89	0.97	0.81	0.89	1.00	1.00	1.00	1.00
emovo-1.2.1-emotion.test	n/a	n/a	n/a	n/a	0.87	0.90	1.00	0.88
iemocap-2.3.0-emotion.categories.test.gold_standard	n/a	n/a	n/a	n/a	0.83	0.81	0.91	0.78
meld-1.3.1-emotion.categories.test.gold_standard	n/a	n/a	n/a	n/a	0.20	0.25	0.35	0.20
polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard	0.95	1.00	0.98	0.88	1.00	1.00	1.00	0.95
ravdess-1.1.2-emotion.speech.test	n/a	n/a	n/a	n/a	0.78	0.81	0.88	0.91
mean	0.92	0.98	0.90	0.89	0.81	0.82	0.89	0.81
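The reported `mean` rows are consistent, up to rounding, with averaging each column over only the datasets that contain that emotion. Datasets without the emotion are skipped rather than counted as zero. For example, the w2v2-L boredom mean of 0.92 is the average of its two available values, 0.89 and 0.95. The small sketch below shows that aggregation. Using `None` to mark a missing cell is an assumption for illustration, and `column_mean` is a hypothetical name.

```python
# Sketch: compute a "mean" row while skipping datasets that lack an emotion.
# None marks a missing cell; this encoding is an assumption for illustration.

from statistics import mean
from typing import Optional

# w2v2-L columns of the "expected low range" table (datasets in table order).
boredom = [None, None, 0.89, None, None, None, 0.95, None]
sadness = [0.85, 0.98, 1.00, 0.87, 0.83, 0.20, 1.00, 0.78]


def column_mean(cells: list[Optional[float]]) -> float:
    """Average over the non-missing cells only."""
    present = [c for c in cells if c is not None]
    if not present:
        raise ValueError("column has no values")
    return mean(present)


print(round(column_mean(boredom), 2), round(column_mean(sadness), 2))
```

Counting the six missing boredom cells as zero would instead give 0.23, which does not match the reported 0.92.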

Samples in Expected Neutral Range

Proportion of samples, per dataset, whose predictions fall into the expected neutral range [0.3, 0.6].

Threshold: 0.75
Emotion	neutral
Dataset	w2v2-L	hubert-L	wavlm	data2vec
crema-d-1.2.0-emotion.categories.test.gold_standard	0.51	0.62	0.45	0.43
danish-emotional-speech-1.1.1-emotion.test	0.31	0.90	0.54	0.69
emodb-1.2.0-emotion.categories.test.gold_standard	0.96	0.93	1.00	0.85
emovo-1.2.1-emotion.test	0.86	0.87	0.77	0.94
iemocap-2.3.0-emotion.categories.test.gold_standard	0.83	0.88	0.64	0.89
meld-1.3.1-emotion.categories.test.gold_standard	0.51	0.67	0.79	0.54
polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard	0.90	1.00	0.82	1.00
ravdess-1.1.2-emotion.speech.test	0.56	0.88	0.44	0.31
mean	0.68	0.84	0.68	0.71
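Taken together, the three tables assign one expected range to each categorical emotion:

- at least 0.55 for anger, fear, and surprise
- at most 0.45 for boredom and sadness
- the closed interval [0.3, 0.6] for neutral

The neutral interval overlaps both of the other ranges. A prediction of 0.5 is in range for neutral but for no other category. The sketch below encodes that mapping and computes per-emotion proportions from labeled predictions. The `EXPECTED_RANGES` dictionary, the function name, and the sample data are illustrative assumptions. Unbounded sides use infinity, so no particular prediction scale is assumed.

```python
# Sketch: expected prediction range per categorical emotion, taken from the
# tables in this report. Bounds are inclusive; the sample data is made up.

import math
from collections import defaultdict

# (lower, upper) bounds of the expected range for each emotion.
EXPECTED_RANGES: dict[str, tuple[float, float]] = {
    "anger": (0.55, math.inf),
    "fear": (0.55, math.inf),
    "surprise": (0.55, math.inf),
    "boredom": (-math.inf, 0.45),
    "sadness": (-math.inf, 0.45),
    "neutral": (0.3, 0.6),
}


def proportions_in_range(samples: list[tuple[str, float]]) -> dict[str, float]:
    """Map each emotion to the share of its samples inside the expected range."""
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for emotion, prediction in samples:
        lower, upper = EXPECTED_RANGES[emotion]
        totals[emotion] += 1
        hits[emotion] += lower <= prediction <= upper  # True counts as 1
    return {emotion: hits[emotion] / totals[emotion] for emotion in totals}


samples = [("neutral", 0.50), ("neutral", 0.62), ("sadness", 0.30),
           ("sadness", 0.50), ("anger", 0.80), ("anger", 0.55)]
print(proportions_in_range(samples))
```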

Visualization

Distribution of dimensional model predictions for samples of each categorical emotion. The expected range of predictions is highlighted with a green background.

[Figure panels: w2v2-L, hubert-L, wavlm, data2vec]