Correctness consistency

Overall scores
	CNN14	w2v2-b	hubert-b	axlstm
Overall Score	48.6% passed tests (18 passed / 19 failed).	56.8% passed tests (21 passed / 16 failed).	64.9% passed tests (24 passed / 13 failed).	54.1% passed tests (20 passed / 17 failed).
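
The overall score appears to be the share of individual tests (one per model, emotion, and dataset) whose proportion reaches the threshold. As a rough check, the sketch below recounts the CNN14 column from the rounded values in the tables that follow. It assumes a test passes when its proportion is at least 0.75; that pass rule, the variable names, and the helper `overall_score` are assumptions for illustration, not the report's actual implementation.

```python
# Rounded CNN14 proportions copied from the tables in this report.
# Dataset/emotion combinations without a value are left out.
CNN14_PROPORTIONS = {
    "anger": [0.74, 0.33, 1.00, 0.86, 0.73, 0.89, 0.92, 0.88],
    "fear": [0.42, 1.00, 0.46, 0.41, 0.68, 0.32, 0.59],
    "surprise": [0.25, 0.46, 0.81, 1.00],
    "boredom": [0.44, 0.92],
    "sadness": [0.87, 1.00, 0.96, 0.92, 0.44, 0.23, 1.00, 0.88],
    "neutral": [0.40, 0.27, 1.00, 0.68, 0.86, 0.73, 0.78, 0.50],
}

THRESHOLD = 0.75  # assumed pass rule: proportion >= THRESHOLD


def overall_score(proportions_by_emotion, threshold=THRESHOLD):
    """Return (passed, failed, percentage of passed tests)."""
    values = [p for ps in proportions_by_emotion.values() for p in ps]
    passed = sum(p >= threshold for p in values)
    failed = len(values) - passed
    return passed, failed, 100.0 * passed / len(values)


passed, failed, percent = overall_score(CNN14_PROPORTIONS)
print(f"{percent:.1f}% passed tests ({passed} passed / {failed} failed).")
```

With these inputs the recount gives 18 passed and 19 failed out of 37 tests, which matches the CNN14 entry. Because the tables show rounded values, a cell displayed as exactly 0.75 cannot be classified reliably this way.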

Samples In Expected High Range

Proportion of samples whose predictions fall into the expected value range of >= 0.55

Threshold: 0.75
Data	anger				fear				surprise
Data	CNN14	w2v2-b	hubert-b	axlstm	CNN14	w2v2-b	hubert-b	axlstm	CNN14	w2v2-b	hubert-b	axlstm
crema-d-1.2.0-emotion.categories.test.gold_standard	0.74	0.75	0.79	0.70	0.42	0.40	0.40	0.43
danish-emotional-speech-1.1.1-emotion.test	0.33	0.48	0.33	0.38					0.25	0.54	0.27	0.19
emodb-1.2.0-emotion.categories.test.gold_standard	1.00	1.00	1.00	1.00	1.00	0.88	0.79	0.91
emovo-1.2.1-emotion.test	0.86	0.90	0.89	0.94	0.46	0.54	0.42	0.40	0.46	0.67	0.58	0.60
iemocap-2.3.0-emotion.categories.test.gold_standard	0.73	0.83	0.82	0.75	0.41	0.65	0.65	0.41
meld-1.3.1-emotion.categories.test.gold_standard	0.89	0.96	0.96	0.89	0.68	0.90	0.94	0.74	0.81	0.85	0.85	0.78
polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard	0.92	0.92	0.95	0.88	0.32	0.45	0.42	0.30
ravdess-1.1.2-emotion.speech.test	0.88	0.88	0.97	0.91	0.59	0.50	0.44	0.72	1.00	1.00	1.00	1.00
mean	0.79	0.84	0.84	0.81	0.55	0.62	0.58	0.56	0.63	0.77	0.68	0.64
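
Each cell above can be read as a simple ratio: the number of samples of one emotion in one dataset whose prediction lies in the expected range, divided by the number of such samples. A minimal sketch of that ratio follows. The function name `proportion_in_range`, its parameters, and the prediction values are hypothetical and are not taken from the code behind this report.

```python
def proportion_in_range(predictions, low=None, high=None):
    """Share of predictions p with low <= p <= high (both bounds optional)."""
    if not predictions:
        raise ValueError("need at least one prediction")
    in_range = [
        p for p in predictions
        if (low is None or p >= low) and (high is None or p <= high)
    ]
    return len(in_range) / len(predictions)


# Made-up predictions for samples labelled "anger" in one dataset.
anger_predictions = [0.62, 0.71, 0.48, 0.90, 0.58]

# Expected high range: prediction >= 0.55.
proportion = proportion_in_range(anger_predictions, low=0.55)
print(proportion)  # 0.8
```

The same helper would cover the other checks in this report with `high=0.45` for the low range or `low=0.3, high=0.6` for the neutral range.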

Samples In Expected Low Range

Proportion of samples whose predictions fall into the expected value range of <= 0.45

Threshold: 0.75
Data	boredom				sadness
Data	CNN14	w2v2-b	hubert-b	axlstm	CNN14	w2v2-b	hubert-b	axlstm
crema-d-1.2.0-emotion.categories.test.gold_standard					0.87	0.89	0.87	0.51
danish-emotional-speech-1.1.1-emotion.test					1.00	1.00	1.00	0.88
emodb-1.2.0-emotion.categories.test.gold_standard	0.44	0.97	0.92	0.69	0.96	1.00	1.00	1.00
emovo-1.2.1-emotion.test					0.92	0.90	0.93	0.76
iemocap-2.3.0-emotion.categories.test.gold_standard					0.44	0.86	0.84	0.66
meld-1.3.1-emotion.categories.test.gold_standard					0.23	0.25	0.19	0.22
polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard	0.92	0.98	0.95	0.92	1.00	1.00	0.98	0.98
ravdess-1.1.2-emotion.speech.test					0.88	0.88	0.84	0.62
mean	0.68	0.97	0.94	0.80	0.79	0.85	0.83	0.70
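
The mean row appears to average only the datasets that contain the emotion, since empty cells are skipped: for boredom, CNN14 has 0.44 and 0.92, which average to the reported 0.68. The sketch below shows that averaging, with `None` standing in for an empty cell. The helper name `mean_ignoring_missing` is hypothetical.

```python
def mean_ignoring_missing(cells):
    """Average the cells that hold a value; None marks an empty cell."""
    present = [c for c in cells if c is not None]
    if not present:
        return None
    return sum(present) / len(present)


# CNN14 boredom column, in the dataset order of the table above.
boredom_cnn14 = [None, None, 0.44, None, None, None, 0.92, None]

print(round(mean_ignoring_missing(boredom_cnn14), 2))  # 0.68
```

Recomputing a mean from the rounded cells can differ from the reported value in the last digit, since the report presumably averages unrounded proportions.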

Samples In Expected Neutral Range

Proportion of samples whose predictions fall into the expected value range of [0.3, 0.6]

Threshold: 0.75
Data	neutral
Data	CNN14	w2v2-b	hubert-b	axlstm
crema-d-1.2.0-emotion.categories.test.gold_standard	0.40	0.39	0.61	0.91
danish-emotional-speech-1.1.1-emotion.test	0.27	0.27	0.37	0.56
emodb-1.2.0-emotion.categories.test.gold_standard	1.00	0.52	0.78	0.93
emovo-1.2.1-emotion.test	0.68	0.81	0.82	0.85
iemocap-2.3.0-emotion.categories.test.gold_standard	0.86	0.69	0.78	0.93
meld-1.3.1-emotion.categories.test.gold_standard	0.73	0.48	0.45	0.76
polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard	0.78	0.80	0.92	0.82
ravdess-1.1.2-emotion.speech.test	0.50	0.00	0.06	0.88
mean	0.65	0.49	0.60	0.83

Visualization

Distribution of dimensional model predictions for samples with different categorical emotions. The expected range of model predictions is highlighted by the green background.

[Figures: one prediction distribution plot per model: CNN14, w2v2-b, hubert-b, axlstm]