Correctness consistency

Overall scores
	CNN14	w2v2-b	hubert-b	axlstm
Overall Score	46.5% passed tests (20 passed / 23 failed).	51.2% passed tests (22 passed / 21 failed).	46.5% passed tests (20 passed / 23 failed).	51.2% passed tests (22 passed / 21 failed).
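The overall score is the share of passed tests among all tests: 20 of 43 tests gives 46.5%, and 22 of 43 gives 51.2%. Below is a minimal sketch of how such a summary line could be produced from the pass/fail counts. `overall_score` is a hypothetical helper whose output format only mirrors the row above. It is not the benchmark's own code.

```python
def overall_score(passed: int, failed: int) -> str:
    """Format a pass rate like the 'Overall Score' row (hypothetical helper)."""
    total = passed + failed
    # Percentage of passed tests, rounded to one decimal place
    percent = 100 * passed / total
    return f"{percent:.1f}% passed tests ({passed} passed / {failed} failed)."


print(overall_score(20, 23))  # CNN14 and hubert-b
print(overall_score(22, 21))  # w2v2-b and axlstm
```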

Samples In Expected High Range

Proportion of samples whose predictions fall into the expected value range of >= 0.55

Threshold: 0.75
Data	anger
Data	CNN14	w2v2-b	hubert-b	axlstm
crema-d-1.2.0-emotion.categories.test.gold_standard	0.82	0.84	0.85	0.75
danish-emotional-speech-1.1.1-emotion.test	0.40	0.48	0.29	0.54
emodb-1.2.0-emotion.categories.test.gold_standard	1.00	1.00	1.00	1.00
emovo-1.2.1-emotion.test	0.86	0.98	0.89	0.95
iemocap-2.3.0-emotion.categories.test.gold_standard	0.72	0.88	0.83	0.81
meld-1.3.1-emotion.categories.test.gold_standard	0.87	0.94	0.98	0.86
polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard	0.90	0.98	0.95	0.90
ravdess-1.1.2-emotion.speech.test	0.88	1.00	1.00	0.97
mean	0.81	0.89	0.85	0.85
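Each cell in the table is a proportion. It is the fraction of a dataset's samples labeled with the given emotion whose prediction is at least 0.55. The sketch below shows that computation in plain Python. It assumes a test passes when the proportion is at least the 0.75 threshold. Because the table shows rounded values, a cell printed as 0.75 cannot be judged from the table alone. The predictions are made up, and `proportion_high` and `passes` are hypothetical names.

```python
HIGH_MIN = 0.55   # lower bound of the expected high range (inclusive)
THRESHOLD = 0.75  # assumed minimum proportion for a test to pass


def proportion_high(predictions: list[float], low: float = HIGH_MIN) -> float:
    """Fraction of predictions inside the expected high range (>= low)."""
    hits = sum(1 for p in predictions if p >= low)
    return hits / len(predictions)


def passes(proportion: float, threshold: float = THRESHOLD) -> bool:
    """Assumed pass criterion: the proportion must reach the threshold."""
    return proportion >= threshold


# Hypothetical predictions for eight samples labeled 'anger'
preds = [0.91, 0.62, 0.55, 0.48, 0.77, 0.80, 0.69, 0.58]
share = proportion_high(preds)
print(share, passes(share))  # 0.875 True
```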

Samples In Expected Low Range

Proportion of samples whose predictions fall into the expected value range of <= 0.45

Threshold: 0.75
Data	fear				sadness
Data	CNN14	w2v2-b	hubert-b	axlstm	CNN14	w2v2-b	hubert-b	axlstm
crema-d-1.2.0-emotion.categories.test.gold_standard	0.46	0.49	0.54	0.19	0.87	0.84	0.82	0.39
danish-emotional-speech-1.1.1-emotion.test					1.00	0.98	1.00	0.87
emodb-1.2.0-emotion.categories.test.gold_standard	0.00	0.03	0.03	0.00	0.67	1.00	1.00	0.81
emovo-1.2.1-emotion.test	0.42	0.18	0.38	0.21	0.88	0.82	0.94	0.57
iemocap-2.3.0-emotion.categories.test.gold_standard	0.35	0.18	0.24	0.12	0.41	0.81	0.86	0.59
meld-1.3.1-emotion.categories.test.gold_standard	0.10	0.02	0.06	0.08	0.20	0.22	0.21	0.17
polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard	0.35	0.32	0.38	0.42	0.95	1.00	0.98	0.90
ravdess-1.1.2-emotion.speech.test	0.34	0.22	0.25	0.03	0.88	0.88	0.81	0.41
mean	0.29	0.21	0.27	0.15	0.73	0.82	0.83	0.59
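The mean row is consistent with averaging only over datasets that have samples of the emotion. danish-emotional-speech has no fear entries, so each fear mean is taken over seven datasets. For CNN14, (0.46 + 0.00 + 0.42 + 0.35 + 0.10 + 0.35 + 0.34) / 7 = 2.02 / 7 ≈ 0.29, whereas dividing by eight would give about 0.25. The sketch below implements that averaging. Marking the missing cell with `None` is an illustrative convention, and `column_mean` is a hypothetical helper.

```python
from statistics import mean
from typing import Optional

# CNN14 'fear' column from the table above, in row order;
# None marks danish-emotional-speech, which has no fear entry.
cnn14_fear = [0.46, None, 0.00, 0.42, 0.35, 0.10, 0.35, 0.34]


def column_mean(values: list[Optional[float]]) -> float:
    """Average over datasets that have a value, skipping missing cells."""
    present = [v for v in values if v is not None]
    return mean(present)


print(round(column_mean(cnn14_fear), 2))  # 0.29
```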

Samples In Expected Neutral Range

Proportion of samples whose predictions fall into the expected value range of [0.3, 0.6]

Threshold: 0.75
Data	happiness				neutral				surprise
Data	CNN14	w2v2-b	hubert-b	axlstm	CNN14	w2v2-b	hubert-b	axlstm	CNN14	w2v2-b	hubert-b	axlstm
crema-d-1.2.0-emotion.categories.test.gold_standard	0.73	0.74	0.75	0.89	0.54	0.66	0.65	0.97
danish-emotional-speech-1.1.1-emotion.test	0.98	0.87	0.81	0.94	0.65	0.56	0.44	0.81	1.00	0.96	0.85	0.98
emodb-1.2.0-emotion.categories.test.gold_standard	0.37	0.26	0.19	0.56	1.00	1.00	0.96	0.96
emovo-1.2.1-emotion.test	0.77	0.58	0.63	0.64	0.70	0.96	0.89	0.90	0.76	0.73	0.71	0.83
iemocap-2.3.0-emotion.categories.test.gold_standard	0.85	0.78	0.75	0.95	0.84	0.84	0.81	0.98
meld-1.3.1-emotion.categories.test.gold_standard	0.70	0.55	0.38	0.75	0.81	0.66	0.54	0.83	0.67	0.54	0.39	0.72
polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard	0.52	0.18	0.35	0.57	1.00	1.00	1.00	0.92
ravdess-1.1.2-emotion.speech.test	0.72	0.62	0.56	0.56	0.56	1.00	0.50	1.00	0.41	0.06	0.03	0.75
mean	0.71	0.57	0.55	0.73	0.76	0.83	0.72	0.92	0.71	0.57	0.49	0.82
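The three sections use different expected ranges, all with inclusive bounds:

- high: at least 0.55
- low: at most 0.45
- neutral: the closed interval [0.3, 0.6]

The sketch below encodes these ranges. As a consistency check, it then tallies the CNN14 cells of all three tables against the 0.75 threshold, treating each non-empty cell as one test. If a test passes when its proportion is at least 0.75, the 43 cells give 20 passed and 23 failed, which matches the CNN14 overall score. The values are copied from the tables. `EXPECTED_RANGES`, `in_expected_range` and `tally` are hypothetical names.

```python
import math

# Expected prediction ranges per section, as (lower, upper) with inclusive bounds
EXPECTED_RANGES = {
    "high": (0.55, math.inf),    # >= 0.55
    "low": (-math.inf, 0.45),    # <= 0.45
    "neutral": (0.3, 0.6),       # [0.3, 0.6]
}


def in_expected_range(prediction: float, kind: str) -> bool:
    """True if a single prediction lies inside the expected range for `kind`."""
    lower, upper = EXPECTED_RANGES[kind]
    return lower <= prediction <= upper


THRESHOLD = 0.75  # assumed: a test passes when its proportion is >= 0.75

# CNN14 proportions per emotion, copied from the tables (missing cells omitted)
CNN14 = {
    "anger": [0.82, 0.40, 1.00, 0.86, 0.72, 0.87, 0.90, 0.88],
    "fear": [0.46, 0.00, 0.42, 0.35, 0.10, 0.35, 0.34],
    "sadness": [0.87, 1.00, 0.67, 0.88, 0.41, 0.20, 0.95, 0.88],
    "happiness": [0.73, 0.98, 0.37, 0.77, 0.85, 0.70, 0.52, 0.72],
    "neutral": [0.54, 0.65, 1.00, 0.70, 0.84, 0.81, 1.00, 0.56],
    "surprise": [1.00, 0.76, 0.67, 0.41],
}


def tally(proportions: dict[str, list[float]], threshold: float = THRESHOLD) -> tuple[int, int]:
    """Count (passed, failed) tests, one test per non-empty table cell."""
    cells = [p for column in proportions.values() for p in column]
    passed = sum(1 for p in cells if p >= threshold)
    return passed, len(cells) - passed


print(in_expected_range(0.6, "neutral"), in_expected_range(0.61, "neutral"))  # True False
print(tally(CNN14))  # (20, 23)
```

Repeating the tally for w2v2-b gives 22 passed / 21 failed, as reported. For hubert-b and axlstm, the reported totals are reproduced only if the cells printed as 0.75 count as failures. Those cells presumably hold unrounded values just below the threshold, or the comparison is strict.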

Visualization

Distribution of dimensional model predictions for samples labeled with different categorical emotions. The expected range of model predictions is highlighted by the green background.

[Figure: one panel per model (CNN14, w2v2-b, hubert-b, axlstm) showing the distribution of predictions per categorical emotion]