Correctness consistency

Overall scores
	w2v2-b	w2v2-L	w2v2-L-robust	w2v2-L-xls-r	w2v2-L-vox
Overall Score	56.8% passed tests (21 passed / 16 failed).	67.6% passed tests (25 passed / 12 failed).	64.9% passed tests (24 passed / 13 failed).	67.6% passed tests (25 passed / 12 failed).	62.2% passed tests (23 passed / 14 failed).
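
The percentages in the Overall Score row follow directly from the passed and failed counts. A minimal sketch that recomputes them, assuming the percentage is passed / (passed + failed) rounded to one decimal place (the helper name `pass_rate` is illustrative and not part of the report):

```python
# (passed, failed) counts copied from the Overall Score row.
counts = {
    "w2v2-b": (21, 16),
    "w2v2-L": (25, 12),
    "w2v2-L-robust": (24, 13),
    "w2v2-L-xls-r": (25, 12),
    "w2v2-L-vox": (23, 14),
}


def pass_rate(passed: int, failed: int) -> float:
    """Percentage of passed tests, rounded to one decimal place."""
    return round(100 * passed / (passed + failed), 1)


rates = {model: pass_rate(*c) for model, c in counts.items()}
for model, rate in rates.items():
    print(f"{model}: {rate}% passed tests")
```

Every model is evaluated on the same 37 tests, so the percentages are directly comparable across columns.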

Samples In Expected High Range

Proportion of samples whose predictions fall into the expected value range of >= 0.55

Threshold: 0.75
Data	anger					fear					surprise
Data	w2v2-b	w2v2-L	w2v2-L-robust	w2v2-L-xls-r	w2v2-L-vox	w2v2-b	w2v2-L	w2v2-L-robust	w2v2-L-xls-r	w2v2-L-vox	w2v2-b	w2v2-L	w2v2-L-robust	w2v2-L-xls-r	w2v2-L-vox
crema-d-1.2.0-emotion.categories.test.gold_standard	0.75	0.81	0.82	0.76	0.79	0.40	0.42	0.41	0.41	0.39
danish-emotional-speech-1.1.1-emotion.test	0.48	0.42	0.46	0.42	0.40						0.54	0.52	0.37	0.21	0.37
emodb-1.2.0-emotion.categories.test.gold_standard	1.00	1.00	1.00	1.00	1.00	0.88	0.91	0.94	0.91	0.82
emovo-1.2.1-emotion.test	0.90	0.94	0.96	0.93	0.92	0.54	0.56	0.55	0.54	0.44	0.67	0.69	0.60	0.60	0.57
iemocap-2.3.0-emotion.categories.test.gold_standard	0.83	0.87	0.87	0.87	0.84	0.65	0.76	0.59	0.82	0.71
meld-1.3.1-emotion.categories.test.gold_standard	0.96	0.96	0.96	0.95	0.96	0.90	0.92	0.86	0.90	0.90	0.85	0.86	0.81	0.84	0.86
polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard	0.92	0.92	0.95	0.92	0.92	0.45	0.42	0.35	0.42	0.40
ravdess-1.1.2-emotion.speech.test	0.88	0.94	0.91	0.91	0.91	0.50	0.62	0.50	0.62	0.56	1.00	1.00	0.97	0.97	1.00
mean	0.84	0.86	0.87	0.84	0.84	0.62	0.66	0.60	0.66	0.60	0.77	0.77	0.69	0.66	0.70
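
Each cell in the table above is a proportion of in-range predictions, which is then compared with the threshold. A rough sketch of that computation, under two assumptions not stated in the report: the range bounds are inclusive, and a test passes only when the proportion is strictly greater than the threshold. The function name `proportion_in_range` and the example predictions are hypothetical.

```python
import math


def proportion_in_range(predictions, low=-math.inf, high=math.inf):
    """Fraction of predictions p with low <= p <= high (bounds inclusive)."""
    if not predictions:
        raise ValueError("need at least one prediction")
    hits = sum(1 for p in predictions if low <= p <= high)
    return hits / len(predictions)


# Hypothetical model predictions for samples of one emotion category.
example_predictions = [0.81, 0.62, 0.47, 0.90, 0.58]

# Expected high range: predictions of at least 0.55.
proportion = proportion_in_range(example_predictions, low=0.55)

THRESHOLD = 0.75
passed = proportion > THRESHOLD  # assumption: strict comparison
print(f"{proportion:.2f} in range, passed: {passed}")
```

The same helper covers an upper-bounded range (`high=0.45`) or a closed interval (`low=0.3, high=0.6`) by changing the bounds.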

Samples In Expected Low Range

Proportion of samples whose predictions fall into the expected value range of <= 0.45

Threshold: 0.75
Data	boredom					sadness
Data	w2v2-b	w2v2-L	w2v2-L-robust	w2v2-L-xls-r	w2v2-L-vox	w2v2-b	w2v2-L	w2v2-L-robust	w2v2-L-xls-r	w2v2-L-vox
crema-d-1.2.0-emotion.categories.test.gold_standard						0.89	0.85	0.85	0.90	0.90
danish-emotional-speech-1.1.1-emotion.test						1.00	0.98	1.00	1.00	1.00
emodb-1.2.0-emotion.categories.test.gold_standard	0.97	0.89	0.97	0.83	0.94	1.00	1.00	1.00	1.00	1.00
emovo-1.2.1-emotion.test						0.90	0.87	0.88	0.92	0.94
iemocap-2.3.0-emotion.categories.test.gold_standard						0.86	0.83	0.84	0.84	0.84
meld-1.3.1-emotion.categories.test.gold_standard						0.25	0.20	0.24	0.25	0.29
polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard	0.98	0.95	1.00	0.90	1.00	1.00	1.00	1.00	1.00	1.00
ravdess-1.1.2-emotion.speech.test						0.88	0.78	0.84	0.81	0.81
mean	0.97	0.92	0.98	0.86	0.97	0.85	0.81	0.83	0.84	0.85

Samples In Expected Neutral Range

Proportion of samples whose predictions fall into the expected value range of [0.3, 0.6]

Threshold: 0.75
Data	neutral
Data	w2v2-b	w2v2-L	w2v2-L-robust	w2v2-L-xls-r	w2v2-L-vox
crema-d-1.2.0-emotion.categories.test.gold_standard	0.39	0.51	0.51	0.44	0.49
danish-emotional-speech-1.1.1-emotion.test	0.27	0.31	0.54	0.23	0.13
emodb-1.2.0-emotion.categories.test.gold_standard	0.52	0.96	0.96	0.89	0.74
emovo-1.2.1-emotion.test	0.81	0.86	0.90	0.86	0.82
iemocap-2.3.0-emotion.categories.test.gold_standard	0.69	0.83	0.81	0.79	0.83
meld-1.3.1-emotion.categories.test.gold_standard	0.48	0.51	0.72	0.55	0.54
polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard	0.80	0.90	0.98	0.90	0.88
ravdess-1.1.2-emotion.speech.test	0.00	0.56	0.50	0.50	0.62
mean	0.49	0.68	0.74	0.65	0.63
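
To see how the neutral table feeds into the overall pass counts, the sketch below tallies, per model, how many datasets exceed the threshold. It works on the two-decimal values shown in the table, so a proportion close to 0.75 could in principle be classified differently than in the original run; the shortened dataset keys are for readability only.

```python
MODELS = ["w2v2-b", "w2v2-L", "w2v2-L-robust", "w2v2-L-xls-r", "w2v2-L-vox"]
THRESHOLD = 0.75

# Per-dataset proportions copied from the neutral table (mean row excluded).
neutral = {
    "crema-d": [0.39, 0.51, 0.51, 0.44, 0.49],
    "danish-emotional-speech": [0.27, 0.31, 0.54, 0.23, 0.13],
    "emodb": [0.52, 0.96, 0.96, 0.89, 0.74],
    "emovo": [0.81, 0.86, 0.90, 0.86, 0.82],
    "iemocap": [0.69, 0.83, 0.81, 0.79, 0.83],
    "meld": [0.48, 0.51, 0.72, 0.55, 0.54],
    "polish-emotional-speech": [0.80, 0.90, 0.98, 0.90, 0.88],
    "ravdess": [0.00, 0.56, 0.50, 0.50, 0.62],
}

# Count, for each model column, the datasets whose proportion exceeds the threshold.
passes = {
    model: sum(row[i] > THRESHOLD for row in neutral.values())
    for i, model in enumerate(MODELS)
}
print(passes)
```

On these values w2v2-b passes 2 of the 8 neutral tests, the three models w2v2-L, w2v2-L-robust and w2v2-L-xls-r pass 4 each, and w2v2-L-vox passes 3.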

Visualization

Distribution of dimensional model predictions for samples with different categorical emotions. The expected range of model predictions is highlighted by the green background.

[Figure panels, one per model: w2v2-b, w2v2-L, w2v2-L-robust, w2v2-L-xls-r, w2v2-L-vox]