Correctness speaker ranking

Overall scores
	CNN14-cat	w2v2-b-cat	hubert-b-cat	axlstm-cat
Overall Score	37.5% passed tests (3 passed / 5 failed).	75.0% passed tests (6 passed / 2 failed).	50.0% passed tests (4 passed / 4 failed).	25.0% passed tests (2 passed / 6 failed).

Spearman's Rho

Threshold: 0.7
Data	anger				happiness				neutral				sadness
Data	CNN14-cat	w2v2-b-cat	hubert-b-cat	axlstm-cat	CNN14-cat	w2v2-b-cat	hubert-b-cat	axlstm-cat	CNN14-cat	w2v2-b-cat	hubert-b-cat	axlstm-cat	CNN14-cat	w2v2-b-cat	hubert-b-cat	axlstm-cat
meld-1.3.1-emotion.categories.test.gold_standard	0.26	0.89	0.83	0.83	-0.77	0.09	0.20	0.26	0.94	0.83	0.77	0.83	-0.14	0.71	0.54	0.26
msppodcast-2.6.0-emotion.categories.test-1.gold_standard	0.66	0.94	0.94	0.60	0.49	0.89	0.77	0.09	0.94	0.94	0.60	0.60	0.60	-0.26	-0.03	-0.43
mean	0.46	0.92	0.89	0.71	-0.14	0.49	0.48	0.17	0.94	0.89	0.69	0.71	0.23	0.22	0.26	-0.08
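As a rough illustration of how such a test could be evaluated, the sketch below computes Spearman's rho as the Pearson correlation of rank-transformed values and compares it with the 0.7 threshold. The per-speaker proportions `truth` and `prediction` are hypothetical, the helper names are made up for this example, and treating a value equal to the threshold as a pass is an assumption. The last part only recounts the w2v2-b-cat values from the table above.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


THRESHOLD = 0.7  # threshold stated in this section

# Hypothetical per-speaker proportions of one class (not taken from the report).
truth = [0.10, 0.25, 0.40, 0.05, 0.30, 0.60]
prediction = [0.12, 0.20, 0.35, 0.15, 0.45, 0.50]

rho = spearman_rho(truth, prediction)
passed = rho >= THRESHOLD  # assumption: reaching the threshold counts as a pass

# w2v2-b-cat values from the table: four classes on meld, then four on msppodcast.
w2v2_rhos = [0.89, 0.09, 0.83, 0.71, 0.94, 0.89, 0.94, -0.26]
n_passed = sum(r >= THRESHOLD for r in w2v2_rhos)
pass_rate = 100.0 * n_passed / len(w2v2_rhos)
print(f"{pass_rate:.1f}% passed tests ({n_passed} passed / {len(w2v2_rhos) - n_passed} failed).")
```

Under these assumptions, the recount for w2v2-b-cat agrees with the 75.0% reported in the overall scores.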

Visualization¶

The plots visualize the precision of predicting speakers to be in the Top 25% or Bottom 25% of all speakers for each class, with respect to the proportion of samples of that class. Green dots mark correctly classified speakers and red markers mark false positives; red squares mark the false positives that confuse Top 25% with Bottom 25% speakers. The remaining grey data points lie outside the range of interest. They include false negatives, which should have been predicted in the Top 25% or Bottom 25% of speakers but were not, and true negatives, which belong to neither group and were predicted as such.
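The categories described above can be sketched as follows. This is a minimal illustration, not the code behind the plots: the function names, the eight hypothetical speakers, and the handling of ties and group size (the lowest and highest `int(n * 0.25)` speakers) are all assumptions.

```python
def quartile_groups(values, fraction=0.25):
    """Label each speaker 'top', 'bottom', or 'middle' by the rank of its value."""
    n = len(values)
    k = max(1, int(n * fraction))  # assumed group size per tail
    order = sorted(range(n), key=lambda i: values[i])
    labels = ["middle"] * n
    for i in order[:k]:
        labels[i] = "bottom"
    for i in order[-k:]:
        labels[i] = "top"
    return labels


def categorize(true_labels, pred_labels):
    """Assign each speaker one of the categories used in the plot description."""
    categories = []
    for t, p in zip(true_labels, pred_labels):
        if p == "middle":
            categories.append("true negative" if t == "middle" else "false negative")
        elif t == p:
            categories.append("correct")          # green dot
        elif t == "middle":
            categories.append("false positive")   # red marker
        else:
            categories.append("confusion")        # red square: top vs. bottom swapped
    return categories


# Hypothetical per-speaker proportions of one class (not taken from the report).
truth = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70]
prediction = [0.65, 0.08, 0.25, 0.02, 0.35, 0.45, 0.55, 0.80]

categories = categorize(quartile_groups(truth), quartile_groups(prediction))

# Precision over all speakers predicted as Top 25% or Bottom 25%;
# confusions count as false positives here.
n_correct = categories.count("correct")
n_predicted = sum(c in ("correct", "false positive", "confusion") for c in categories)
precision = n_correct / n_predicted
print(categories, precision)
```

In this toy example, two of the four speakers predicted in either tail are correct, so the precision is 0.5.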

Plots (one panel per model): CNN14-cat, w2v2-b-cat, hubert-b-cat, axlstm-cat