Correctness consistency

Overall scores
	w2v2-b	w2v2-L	w2v2-L-robust	w2v2-L-xls-r	w2v2-L-vox
Overall Score	51.2% passed tests (22 passed / 21 failed).	55.8% passed tests (24 passed / 19 failed).	51.2% passed tests (22 passed / 21 failed).	53.5% passed tests (23 passed / 20 failed).	51.2% passed tests (22 passed / 21 failed).
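The percentages in this table follow directly from the pass/fail counts. As a minimal sketch (the helper name `pass_rate` is hypothetical and not part of any tooling named on this page), the arithmetic can be reproduced like this:

```python
def pass_rate(passed: int, failed: int) -> float:
    """Return the percentage of passed tests, rounded to one decimal."""
    total = passed + failed
    if total == 0:
        raise ValueError("no tests were run")
    return round(100.0 * passed / total, 1)


# Pass/fail counts copied from the table above.
counts = {
    "w2v2-b": (22, 21),
    "w2v2-L": (24, 19),
    "w2v2-L-robust": (22, 21),
    "w2v2-L-xls-r": (23, 20),
    "w2v2-L-vox": (22, 21),
}

scores = {model: pass_rate(p, f) for model, (p, f) in counts.items()}
for model, score in scores.items():
    print(f"{model}: {score}% passed tests")
```

Every model is evaluated on the same 43 tests, so the scores differ only by one or two passed tests.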

Samples In Expected High Range

Proportion of samples whose predictions fall into the expected value range of >= 0.55

Threshold: 0.75
Data	anger
Data	w2v2-b	w2v2-L	w2v2-L-robust	w2v2-L-xls-r	w2v2-L-vox
crema-d-1.2.0-emotion.categories.test.gold_standard	0.84	0.81	0.94	0.83	0.87
danish-emotional-speech-1.1.1-emotion.test	0.48	0.42	0.63	0.52	0.48
emodb-1.2.0-emotion.categories.test.gold_standard	1.00	1.00	1.00	1.00	1.00
emovo-1.2.1-emotion.test	0.98	0.98	1.00	0.96	0.99
iemocap-2.3.0-emotion.categories.test.gold_standard	0.88	0.88	0.94	0.92	0.90
meld-1.3.1-emotion.categories.test.gold_standard	0.94	0.97	0.98	0.96	0.96
polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard	0.98	0.95	1.00	0.98	0.95
ravdess-1.1.2-emotion.speech.test	1.00	0.94	1.00	0.97	1.00
mean	0.89	0.87	0.94	0.89	0.89
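Each cell in the table above is a proportion of samples, which is then compared against the threshold. The sketch below illustrates that two-step check under stated assumptions: the function names are hypothetical, the prediction values are made up for illustration, and the strict `>` comparison is an assumption (the page does not say whether a proportion exactly equal to the threshold passes).

```python
def proportion_in_high_range(predictions, lower=0.55):
    """Fraction of predictions that are >= the lower bound of the high range."""
    if not predictions:
        raise ValueError("no predictions given")
    hits = sum(1 for p in predictions if p >= lower)
    return hits / len(predictions)


def passes(proportion, threshold=0.75):
    """Assumed pass rule: the proportion must be strictly above the threshold."""
    return proportion > threshold


# Made-up predictions for four hypothetical anger samples.
toy_predictions = [0.81, 0.62, 0.40, 0.77]

proportion = proportion_in_high_range(toy_predictions)
print(proportion, passes(proportion))
```

With these toy values three of four predictions land in the high range, so the proportion sits exactly at the threshold and the assumed strict rule reports a failure.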

Samples In Expected Low Range

Proportion of samples whose predictions fall into the expected value range of <= 0.45

Threshold: 0.75
Data	fear					sadness
Data	w2v2-b	w2v2-L	w2v2-L-robust	w2v2-L-xls-r	w2v2-L-vox	w2v2-b	w2v2-L	w2v2-L-robust	w2v2-L-xls-r	w2v2-L-vox
crema-d-1.2.0-emotion.categories.test.gold_standard	0.49	0.49	0.40	0.48	0.46	0.84	0.87	0.80	0.89	0.82
danish-emotional-speech-1.1.1-emotion.test						0.98	0.98	0.98	1.00	0.98
emodb-1.2.0-emotion.categories.test.gold_standard	0.03	0.00	0.03	0.00	0.03	1.00	1.00	1.00	1.00	1.00
emovo-1.2.1-emotion.test	0.18	0.21	0.07	0.23	0.15	0.82	0.80	0.56	0.75	0.77
iemocap-2.3.0-emotion.categories.test.gold_standard	0.18	0.18	0.12	0.12	0.00	0.81	0.81	0.70	0.76	0.70
meld-1.3.1-emotion.categories.test.gold_standard	0.02	0.02	0.06	0.02	0.02	0.22	0.17	0.17	0.19	0.22
polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard	0.32	0.38	0.38	0.30	0.35	1.00	1.00	0.92	0.95	0.98
ravdess-1.1.2-emotion.speech.test	0.22	0.16	0.06	0.16	0.09	0.88	0.78	0.75	0.78	0.81
mean	0.21	0.21	0.16	0.19	0.16	0.82	0.80	0.73	0.79	0.78
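The `mean` row appears to average only the datasets that have an entry for the given emotion: the fear column has no value for danish-emotional-speech, and the displayed mean is consistent with dividing by seven rather than eight. A minimal sketch of that calculation, assuming missing cells are skipped (the helper name `column_mean` is hypothetical, and the page's own means may be computed from unrounded proportions):

```python
def column_mean(values):
    """Mean over the available entries, rounded to two decimals.

    None marks a dataset with no entry for this emotion; it is skipped.
    """
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column has no values")
    return round(sum(present) / len(present), 2)


# Fear column for w2v2-b in table order, copied from the table above.
# The second entry (danish-emotional-speech) is empty in the table.
fear_w2v2_b = [0.49, None, 0.03, 0.18, 0.18, 0.02, 0.32, 0.22]

mean_fear = column_mean(fear_w2v2_b)
print(mean_fear)
```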

Samples In Expected Neutral Range

Proportion of samples whose predictions fall into the expected value range of [0.3, 0.6]

Threshold: 0.75
Data	happiness					neutral					surprise
Data	w2v2-b	w2v2-L	w2v2-L-robust	w2v2-L-xls-r	w2v2-L-vox	w2v2-b	w2v2-L	w2v2-L-robust	w2v2-L-xls-r	w2v2-L-vox	w2v2-b	w2v2-L	w2v2-L-robust	w2v2-L-xls-r	w2v2-L-vox
crema-d-1.2.0-emotion.categories.test.gold_standard	0.74	0.85	0.77	0.81	0.74	0.66	0.72	0.90	0.69	0.83
danish-emotional-speech-1.1.1-emotion.test	0.87	0.96	0.98	1.00	0.90	0.56	0.90	1.00	0.87	0.65	0.96	0.98	0.98	1.00	0.94
emodb-1.2.0-emotion.categories.test.gold_standard	0.26	0.26	0.22	0.15	0.22	1.00	1.00	1.00	1.00	1.00
emovo-1.2.1-emotion.test	0.58	0.50	0.52	0.49	0.46	0.96	0.99	0.98	0.98	0.98	0.73	0.64	0.64	0.67	0.62
iemocap-2.3.0-emotion.categories.test.gold_standard	0.78	0.85	0.85	0.86	0.94	0.84	0.92	0.94	0.93	0.96
meld-1.3.1-emotion.categories.test.gold_standard	0.55	0.50	0.55	0.49	0.44	0.66	0.63	0.72	0.66	0.60	0.54	0.47	0.51	0.46	0.46
polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard	0.18	0.20	0.18	0.12	0.25	1.00	1.00	1.00	1.00	1.00
ravdess-1.1.2-emotion.speech.test	0.62	0.62	0.72	0.66	0.53	1.00	1.00	1.00	1.00	1.00	0.06	0.09	0.38	0.19	0.03
mean	0.57	0.59	0.60	0.57	0.56	0.83	0.90	0.94	0.89	0.88	0.57	0.55	0.63	0.58	0.51
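The per-dataset proportions in the high, low, and neutral range tables can be tallied against the 0.75 threshold to recover a model's overall score. The sketch below does this for w2v2-b using the rounded values shown on this page. The strict `>` comparison is an assumption that happens to be consistent with the displayed numbers; the underlying unrounded values are not available here.

```python
# Proportions for w2v2-b, copied from the three tables on this page.
# Datasets without an entry for an emotion are simply left out.
W2V2_B = {
    "anger": [0.84, 0.48, 1.00, 0.98, 0.88, 0.94, 0.98, 1.00],
    "fear": [0.49, 0.03, 0.18, 0.18, 0.02, 0.32, 0.22],
    "sadness": [0.84, 0.98, 1.00, 0.82, 0.81, 0.22, 1.00, 0.88],
    "happiness": [0.74, 0.87, 0.26, 0.58, 0.78, 0.55, 0.18, 0.62],
    "neutral": [0.66, 0.56, 1.00, 0.96, 0.84, 0.66, 1.00, 1.00],
    "surprise": [0.96, 0.73, 0.54, 0.06],
}

THRESHOLD = 0.75


def count_passed(results, threshold=THRESHOLD):
    """Return (passed, failed), counting a test as passed if its
    proportion is strictly above the threshold (assumed rule)."""
    passed = sum(
        1 for values in results.values() for v in values if v > threshold
    )
    total = sum(len(values) for values in results.values())
    return passed, total - passed


passed, failed = count_passed(W2V2_B)
print(f"{passed} passed / {failed} failed")  # 22 passed / 21 failed
```

The tally matches the overall score reported for w2v2-b. It also shows where the failures concentrate: none of the seven fear tests pass, while anger and sadness each fail on a single dataset.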

Visualization

Distribution of dimensional model predictions for samples with different categorical emotions. The expected range of model predictions is highlighted by the green background.

[Figure: one panel per model: w2v2-b, w2v2-L, w2v2-L-robust, w2v2-L-xls-r, w2v2-L-vox]