Correctness consistency

Overall scores
	CNN14	w2v2-b	hubert-b	axlstm
Overall Score	40.4% passed tests (19 passed / 28 failed).	46.8% passed tests (22 passed / 25 failed).	53.2% passed tests (25 passed / 22 failed).	21.3% passed tests (10 passed / 37 failed).
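
The percentages in the overall score row follow directly from the pass and fail counts (47 tests per model). A minimal sketch of that arithmetic; the function name `pass_rate` is illustrative and not part of the benchmark tooling:

```python
def pass_rate(passed: int, failed: int) -> float:
    """Return the share of passed tests as a percentage."""
    total = passed + failed
    if total == 0:
        raise ValueError("no tests were run")
    return 100.0 * passed / total


# Counts copied from the overall score row.
counts = {
    "CNN14": (19, 28),
    "w2v2-b": (22, 25),
    "hubert-b": (25, 22),
    "axlstm": (10, 37),
}

for model, (passed, failed) in counts.items():
    total = passed + failed
    print(f"{model}: {pass_rate(passed, failed):.1f}% of {total} tests passed")
```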

Samples In Expected High Range

Proportion of samples whose predictions fall within the expected value range (>= 0.55)

Threshold: 0.75
Data	happiness
Data	CNN14	w2v2-b	hubert-b	axlstm
crema-d-1.2.0-emotion.categories.test.gold_standard	0.10	0.15	0.23	0.19
danish-emotional-speech-1.1.1-emotion.test	0.29	0.54	0.23	0.17
emodb-1.2.0-emotion.categories.test.gold_standard	0.00	0.15	0.30	0.37
emovo-1.2.1-emotion.test	0.12	0.11	0.14	0.24
iemocap-2.3.0-emotion.categories.test.gold_standard	0.37	0.52	0.49	0.33
meld-1.3.1-emotion.categories.test.gold_standard	0.52	0.65	0.81	0.46
polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard	0.02	0.52	0.72	0.30
ravdess-1.1.2-emotion.speech.test	0.00	0.00	0.00	0.22
mean	0.18	0.33	0.36	0.29
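
Each cell in the table above is the share of happiness samples whose prediction is at least 0.55, and that share is compared against the 0.75 threshold. The sketch below shows one way to compute such a value. It assumes that reaching the threshold counts as a pass (the source does not say whether the comparison is strict), and the predictions are made-up numbers, not benchmark data.

```python
def proportion_in_high_range(predictions, lower=0.55):
    """Share of predictions at or above the lower bound of the expected range."""
    if not predictions:
        raise ValueError("need at least one prediction")
    hits = sum(1 for p in predictions if p >= lower)
    return hits / len(predictions)


def passes(proportion, threshold=0.75):
    # Assumption: reaching the threshold counts as a pass; the source
    # does not state whether the comparison is strict.
    return proportion >= threshold


# Hypothetical predictions for samples labelled "happiness".
predictions = [0.62, 0.48, 0.71, 0.55, 0.30]
share = proportion_in_high_range(predictions)
print(f"{share:.2f}", passes(share))
```

With these illustrative numbers, three of five predictions fall in the range, so the share is 0.60 and the test would not pass.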

Samples In Expected Low Range

Proportion of samples whose predictions fall within the expected value range (<= 0.45)

Threshold: 0.75
Data	anger				disgust				fear				frustration				sadness
Data	CNN14	w2v2-b	hubert-b	axlstm	CNN14	w2v2-b	hubert-b	axlstm	CNN14	w2v2-b	hubert-b	axlstm	CNN14	w2v2-b	hubert-b	axlstm	CNN14	w2v2-b	hubert-b	axlstm
crema-d-1.2.0-emotion.categories.test.gold_standard	0.84	0.93	0.90	0.55	0.66	0.88	0.80	0.42	0.63	0.89	0.84	0.59					0.56	0.93	0.82	0.72
danish-emotional-speech-1.1.1-emotion.test	0.83	0.50	0.79	0.69													0.81	0.75	0.77	0.71
emodb-1.2.0-emotion.categories.test.gold_standard	0.98	0.84	0.62	0.44	0.54	0.69	0.65	0.46	0.85	0.64	0.76	0.42					0.70	1.00	0.67	0.74
emovo-1.2.1-emotion.test	0.93	0.85	0.79	0.56	0.65	0.55	0.77	0.52	0.61	0.62	0.83	0.58					0.60	0.74	0.83	0.62
iemocap-2.3.0-emotion.categories.test.gold_standard	0.75	0.70	0.67	0.42					0.65	0.65	0.59	0.47	0.56	0.58	0.64	0.39	0.54	0.79	0.86	0.38
meld-1.3.1-emotion.categories.test.gold_standard	0.52	0.42	0.31	0.43	0.45	0.30	0.30	0.31	0.30	0.26	0.14	0.30					0.44	0.41	0.39	0.43
polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard	0.92	0.68	0.40	0.48					0.62	0.45	0.35	0.48					0.10	0.82	0.40	0.75
ravdess-1.1.2-emotion.speech.test	0.97	1.00	1.00	0.72	0.91	1.00	1.00	0.44	0.91	0.94	1.00	0.53					0.75	0.97	1.00	0.47
mean	0.84	0.74	0.69	0.54	0.64	0.68	0.70	0.43	0.65	0.64	0.64	0.48	0.56	0.58	0.64	0.39	0.56	0.80	0.72	0.60
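
Not every dataset contains every category, so some cells are empty. The mean row appears to average only the datasets that have a value. The sketch below reproduces the CNN14 disgust mean under that assumption; `column_mean` is an illustrative helper, and because the table values are already rounded, recomputed means can differ from the reported ones in the last digit.

```python
from statistics import mean


def column_mean(values):
    """Average the available entries, skipping datasets without the category."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return mean(present)


# CNN14 "disgust" column in table order; None marks an empty cell.
disgust_cnn14 = [0.66, None, 0.54, 0.65, None, 0.45, None, 0.91]
print(round(column_mean(disgust_cnn14), 2))  # 0.64, as in the mean row
```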

Samples In Expected Neutral Range

Proportion of samples whose predictions fall within the expected value range [0.3, 0.6]

Threshold: 0.75
Data	boredom				neutral
Data	CNN14	w2v2-b	hubert-b	axlstm	CNN14	w2v2-b	hubert-b	axlstm
crema-d-1.2.0-emotion.categories.test.gold_standard					0.97	0.88	0.81	0.94
danish-emotional-speech-1.1.1-emotion.test					0.96	0.96	0.92	0.90
emodb-1.2.0-emotion.categories.test.gold_standard	0.97	0.97	0.97	0.86	0.96	0.96	0.93	0.96
emovo-1.2.1-emotion.test					0.90	0.96	0.85	0.89
iemocap-2.3.0-emotion.categories.test.gold_standard					0.93	0.86	0.80	0.93
meld-1.3.1-emotion.categories.test.gold_standard					0.70	0.62	0.52	0.85
polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard	0.90	1.00	0.98	0.95	0.92	0.95	0.85	0.88
ravdess-1.1.2-emotion.speech.test					1.00	1.00	0.06	1.00
mean	0.94	0.98	0.97	0.91	0.92	0.90	0.72	0.92
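
The three tables pair each emotion category with an expected prediction range. The sketch below collects those pairings in one lookup. The name `EXPECTED_RANGE` and the helper are illustrative, and treating all bounds as inclusive is an assumption for the neutral range, whose bracket notation suggests it but does not confirm it.

```python
import math

# Expected ranges as stated in the three sections, stored as (low, high).
# Infinite bounds stand in for the open side of a range.
EXPECTED_RANGE = {
    "happiness": (0.55, math.inf),
    "anger": (-math.inf, 0.45),
    "disgust": (-math.inf, 0.45),
    "fear": (-math.inf, 0.45),
    "frustration": (-math.inf, 0.45),
    "sadness": (-math.inf, 0.45),
    "boredom": (0.3, 0.6),
    "neutral": (0.3, 0.6),
}


def in_expected_range(category: str, prediction: float) -> bool:
    """Check whether one prediction lies in the range expected for its category."""
    low, high = EXPECTED_RANGE[category]
    return low <= prediction <= high


print(in_expected_range("neutral", 0.45))  # True
print(in_expected_range("anger", 0.50))    # False
```

Note that the ranges overlap: a prediction of 0.40 is acceptable for both the low and the neutral categories, while 0.50 is acceptable only for the neutral ones.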

Visualization

Distribution of dimensional model predictions for samples with different categorical emotions. The expected range of model predictions is highlighted by the green background.

[Figure: one panel per model: CNN14, w2v2-b, hubert-b, axlstm]