Correctness consistency

Overall scores
	w2v2-b	w2v2-L	w2v2-L-robust	w2v2-L-xls-r	w2v2-L-vox
Overall Score	46.8% passed tests (22 passed / 25 failed).	59.6% passed tests (28 passed / 19 failed).	42.6% passed tests (20 passed / 27 failed).	59.6% passed tests (28 passed / 19 failed).	68.1% passed tests (32 passed / 15 failed).
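The percentages in the table follow directly from the passed and failed counts. A minimal sketch of that arithmetic is shown below; the helper name `pass_rate` is hypothetical and not taken from the test suite, and rounding to one decimal is assumed from the way the table is printed.

```python
def pass_rate(passed: int, failed: int) -> float:
    """Return the percentage of passed tests, rounded to one decimal place."""
    total = passed + failed
    if total == 0:
        raise ValueError("no tests were run")
    return round(100.0 * passed / total, 1)


# (passed, failed) counts copied from the table above.
counts = {
    "w2v2-b": (22, 25),
    "w2v2-L": (28, 19),
    "w2v2-L-robust": (20, 27),
    "w2v2-L-xls-r": (28, 19),
    "w2v2-L-vox": (32, 15),
}

rates = {model: pass_rate(p, f) for model, (p, f) in counts.items()}

for model, rate in rates.items():
    print(f"{model}: {rate}% passed")
```

Every model is evaluated on the same 47 tests, so the scores are directly comparable.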

Samples In Expected High Range

Proportion of samples whose predictions fall into the expected value range of >= 0.55

Threshold: 0.75
Data	happiness
Data	w2v2-b	w2v2-L	w2v2-L-robust	w2v2-L-xls-r	w2v2-L-vox
crema-d-1.2.0-emotion.categories.test.gold_standard	0.15	0.10	0.48	0.05	0.18
danish-emotional-speech-1.1.1-emotion.test	0.54	0.04	0.29	0.12	0.12
emodb-1.2.0-emotion.categories.test.gold_standard	0.15	0.19	0.48	0.00	0.04
emovo-1.2.1-emotion.test	0.11	0.05	0.49	0.02	0.01
iemocap-2.3.0-emotion.categories.test.gold_standard	0.52	0.40	0.51	0.38	0.44
meld-1.3.1-emotion.categories.test.gold_standard	0.65	0.68	0.70	0.62	0.58
polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard	0.52	0.42	0.80	0.18	0.18
ravdess-1.1.2-emotion.speech.test	0.00	0.00	0.06	0.00	0.00
mean	0.33	0.23	0.48	0.17	0.19
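Each cell in a table like the one above is the share of samples whose prediction lands in the expected range, and that share is then compared against the threshold. The sketch below illustrates the idea under stated assumptions: the function names and the sample predictions are hypothetical, the range bounds are treated as inclusive, and a test is assumed to pass when the proportion is at least the threshold. The actual test suite may handle these details differently.

```python
def proportion_in_range(predictions, low=float("-inf"), high=float("inf")):
    """Share of predictions with low <= value <= high (bounds assumed inclusive)."""
    if not predictions:
        raise ValueError("need at least one prediction")
    hits = sum(1 for p in predictions if low <= p <= high)
    return hits / len(predictions)


def passes(predictions, low=float("-inf"), high=float("inf"), threshold=0.75):
    """Assumed pass rule: the in-range proportion must reach the threshold."""
    return proportion_in_range(predictions, low, high) >= threshold


# Hypothetical model outputs for five samples labelled "happiness".
happy_predictions = [0.62, 0.71, 0.48, 0.58, 0.80]

# Expected high range: >= 0.55. Four of five samples qualify.
share = proportion_in_range(happy_predictions, low=0.55)
print(share, passes(happy_predictions, low=0.55))
```

The same helper covers the other two test types by changing the bounds, for example `high=0.45` for the expected low range or `low=0.3, high=0.6` for the expected neutral range.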

Samples In Expected Low Range

Proportion of samples whose predictions fall into the expected value range of <= 0.45

Threshold: 0.75
Data	anger					disgust					fear					frustration					sadness
Data	w2v2-b	w2v2-L	w2v2-L-robust	w2v2-L-xls-r	w2v2-L-vox	w2v2-b	w2v2-L	w2v2-L-robust	w2v2-L-xls-r	w2v2-L-vox	w2v2-b	w2v2-L	w2v2-L-robust	w2v2-L-xls-r	w2v2-L-vox	w2v2-b	w2v2-L	w2v2-L-robust	w2v2-L-xls-r	w2v2-L-vox	w2v2-b	w2v2-L	w2v2-L-robust	w2v2-L-xls-r	w2v2-L-vox
crema-d-1.2.0-emotion.categories.test.gold_standard	0.93	0.99	0.77	0.98	0.95	0.88	1.00	0.71	0.99	0.92	0.89	0.99	0.67	0.98	0.96						0.93	1.00	0.72	1.00	0.93
danish-emotional-speech-1.1.1-emotion.test	0.50	0.98	0.58	0.92	0.94																0.75	0.98	0.79	0.92	1.00
emodb-1.2.0-emotion.categories.test.gold_standard	0.84	1.00	0.49	1.00	1.00	0.69	0.96	0.38	1.00	0.92	0.64	0.88	0.27	0.94	0.91						1.00	1.00	0.85	1.00	1.00
emovo-1.2.1-emotion.test	0.85	1.00	0.54	0.99	1.00	0.55	0.94	0.29	0.87	0.88	0.62	0.95	0.39	0.87	0.98						0.74	1.00	0.49	0.86	0.95
iemocap-2.3.0-emotion.categories.test.gold_standard	0.70	0.72	0.74	0.76	0.79						0.65	0.65	0.76	0.41	0.59	0.58	0.58	0.68	0.61	0.68	0.79	0.86	0.86	0.84	0.85
meld-1.3.1-emotion.categories.test.gold_standard	0.42	0.37	0.48	0.45	0.49	0.30	0.22	0.40	0.28	0.42	0.26	0.18	0.24	0.16	0.30						0.41	0.35	0.56	0.34	0.50
polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard	0.68	0.85	0.35	0.88	0.92						0.45	0.55	0.28	0.60	0.92						0.82	0.55	0.32	0.52	0.92
ravdess-1.1.2-emotion.speech.test	1.00	1.00	1.00	1.00	1.00	1.00	1.00	0.97	1.00	1.00	0.94	0.97	0.94	1.00	1.00						0.97	1.00	0.91	0.97	1.00
mean	0.74	0.86	0.62	0.87	0.89	0.68	0.82	0.55	0.83	0.83	0.64	0.74	0.51	0.71	0.81	0.58	0.58	0.68	0.61	0.68	0.80	0.84	0.69	0.81	0.89
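Several datasets do not contain every emotion category, so some cells in the table are empty. The mean row appears to average only the datasets that provide the category. The sketch below checks this reading for the fear column of w2v2-b; the dictionary layout and the `column_mean` helper are hypothetical, and because the table values are already rounded, a recomputed mean can differ from the printed one in the last digit.

```python
from statistics import mean

# Fear scores for w2v2-b copied from the table above (dataset names shortened).
# None marks a dataset without that category.
fear_w2v2_b = {
    "crema-d": 0.89,
    "danish-emotional-speech": None,
    "emodb": 0.64,
    "emovo": 0.62,
    "iemocap": 0.65,
    "meld": 0.26,
    "polish-emotional-speech": 0.45,
    "ravdess": 0.94,
}


def column_mean(cells):
    """Average the available cells, skipping datasets without the category."""
    values = [v for v in cells.values() if v is not None]
    if not values:
        raise ValueError("column has no values")
    return mean(values)


print(round(column_mean(fear_w2v2_b), 2))
```

Averaging over the seven available datasets reproduces the 0.64 shown in the mean row.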

Samples In Expected Neutral Range

Proportion of samples whose predictions fall into the expected value range of [0.3, 0.6]

Threshold: 0.75
Data	boredom					neutral
Data	w2v2-b	w2v2-L	w2v2-L-robust	w2v2-L-xls-r	w2v2-L-vox	w2v2-b	w2v2-L	w2v2-L-robust	w2v2-L-xls-r	w2v2-L-vox
crema-d-1.2.0-emotion.categories.test.gold_standard						0.88	0.78	0.97	0.72	0.87
danish-emotional-speech-1.1.1-emotion.test						0.96	0.81	0.96	0.90	0.85
emodb-1.2.0-emotion.categories.test.gold_standard	0.97	1.00	1.00	1.00	1.00	0.96	1.00	1.00	1.00	1.00
emovo-1.2.1-emotion.test						0.96	0.92	0.99	0.76	0.86
iemocap-2.3.0-emotion.categories.test.gold_standard						0.86	0.87	0.89	0.90	0.93
meld-1.3.1-emotion.categories.test.gold_standard						0.62	0.72	0.79	0.72	0.78
polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard	1.00	1.00	1.00	1.00	1.00	0.95	1.00	1.00	1.00	1.00
ravdess-1.1.2-emotion.speech.test						1.00	0.69	1.00	0.62	0.44
mean	0.98	1.00	1.00	1.00	1.00	0.90	0.85	0.95	0.83	0.84

Visualization

Distribution of dimensional model predictions for samples with different categorical emotions. The expected range of model predictions is highlighted by the green background.

[Figures not reproduced; one panel per model: w2v2-b, w2v2-L, w2v2-L-robust, w2v2-L-xls-r, w2v2-L-vox]