Robustness small changes

Overall scores
	w2v2-b-cat	w2v2-L-cat	w2v2-L-robust-cat	w2v2-L-vox-cat	w2v2-L-xls-r-cat
Overall Score	60.0% passed tests (30 passed / 20 failed).	66.0% passed tests (33 passed / 17 failed).	72.0% passed tests (36 passed / 14 failed).	80.0% passed tests (40 passed / 10 failed).	70.0% passed tests (35 passed / 15 failed).
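
The overall score is the share of passed tests among all tests run for a model. Each model here is evaluated on 50 tests (ten perturbations on five datasets). The following is a minimal sketch of that arithmetic; the function name `overall_score` is hypothetical and not part of the benchmark code.

```python
def overall_score(passed: int, failed: int) -> str:
    """Format the share of passed tests in the style of the table above."""
    total = passed + failed
    if total == 0:
        raise ValueError("at least one test is required")
    percent = 100.0 * passed / total
    return f"{percent:.1f}% passed tests ({passed} passed / {failed} failed)."


# Counts taken from the w2v2-b-cat column of the overall score table.
print(overall_score(30, 20))
```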

Percentage Unchanged Predictions Additive Tone

Threshold: 0.95
Data	w2v2-b-cat	w2v2-L-cat	w2v2-L-robust-cat	w2v2-L-vox-cat	w2v2-L-xls-r-cat
crema-d-1.2.0-emotion.categories.test.gold_standard	0.96	0.96	0.96	0.99	0.99
emovo-1.2.1-emotion.test	0.95	0.93	0.94	0.95	0.94
iemocap-2.3.0-emotion.categories.test.gold_standard	0.93	0.92	0.96	0.95	0.93
meld-1.3.1-emotion.categories.test.gold_standard	0.98	0.97	0.98	0.99	0.98
msppodcast-2.6.0-emotion.categories.test-1.gold_standard	0.95	0.95	0.97	0.96	0.94
mean	0.95	0.95	0.96	0.97	0.96
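
The metric in these tables can be read as the fraction of samples whose predicted emotion category stays the same after the perturbation is applied. The sketch below illustrates that reading; it is an assumption about the metric, not the benchmark's actual implementation, and the names `percent_unchanged` and `passes` are hypothetical. The strict comparison against the threshold is inferred from the rounded table values: counting only entries strictly above 0.95 reproduces the pass counts in the overall score table (for example 30 of 50 for w2v2-b-cat).

```python
def percent_unchanged(pred_original, pred_perturbed):
    """Fraction of samples whose predicted class did not change."""
    if len(pred_original) != len(pred_perturbed):
        raise ValueError("prediction lists must have the same length")
    if not pred_original:
        raise ValueError("at least one prediction is required")
    unchanged = sum(a == b for a, b in zip(pred_original, pred_perturbed))
    return unchanged / len(pred_original)


def passes(value, threshold=0.95):
    # Strict comparison: an assumption inferred from the rounded table values.
    return value > threshold


# Hypothetical predictions for 25 clips before and after a perturbation.
before = ["anger"] * 10 + ["happiness"] * 15
after = ["anger"] * 10 + ["happiness"] * 14 + ["neutral"]

score = percent_unchanged(before, after)
print(f"{score:.2f}", passes(score))
```

With 24 of 25 predictions unchanged, the score lies above the threshold and the test counts as passed under this reading.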

Percentage Unchanged Predictions Append Zeros

Threshold: 0.95
Data	w2v2-b-cat	w2v2-L-cat	w2v2-L-robust-cat	w2v2-L-vox-cat	w2v2-L-xls-r-cat
crema-d-1.2.0-emotion.categories.test.gold_standard	0.98	0.99	0.98	1.00	1.00
emovo-1.2.1-emotion.test	0.98	0.98	0.99	0.99	1.00
iemocap-2.3.0-emotion.categories.test.gold_standard	0.98	0.98	0.98	0.99	1.00
meld-1.3.1-emotion.categories.test.gold_standard	0.98	0.98	0.98	0.99	0.99
msppodcast-2.6.0-emotion.categories.test-1.gold_standard	0.99	0.99	0.99	0.99	0.99
mean	0.98	0.98	0.98	0.99	1.00

Percentage Unchanged Predictions Clip

Threshold: 0.95
Data	w2v2-b-cat	w2v2-L-cat	w2v2-L-robust-cat	w2v2-L-vox-cat	w2v2-L-xls-r-cat
crema-d-1.2.0-emotion.categories.test.gold_standard	0.97	0.98	0.98	0.99	0.99
emovo-1.2.1-emotion.test	0.98	0.99	0.99	0.99	0.98
iemocap-2.3.0-emotion.categories.test.gold_standard	0.96	0.98	0.98	0.98	0.97
meld-1.3.1-emotion.categories.test.gold_standard	0.98	0.98	0.98	0.99	0.97
msppodcast-2.6.0-emotion.categories.test-1.gold_standard	0.98	0.99	0.99	0.99	0.98
mean	0.97	0.98	0.98	0.99	0.98

Percentage Unchanged Predictions Crop Beginning

Threshold: 0.95
Data	w2v2-b-cat	w2v2-L-cat	w2v2-L-robust-cat	w2v2-L-vox-cat	w2v2-L-xls-r-cat
crema-d-1.2.0-emotion.categories.test.gold_standard	0.90	0.92	0.93	0.99	0.98
emovo-1.2.1-emotion.test	0.90	0.91	0.95	0.97	0.94
iemocap-2.3.0-emotion.categories.test.gold_standard	0.92	0.93	0.95	0.96	0.93
meld-1.3.1-emotion.categories.test.gold_standard	0.90	0.90	0.92	0.94	0.90
msppodcast-2.6.0-emotion.categories.test-1.gold_standard	0.94	0.96	0.96	0.96	0.94
mean	0.91	0.92	0.94	0.96	0.94

Percentage Unchanged Predictions Crop End

Threshold: 0.95
Data	w2v2-b-cat	w2v2-L-cat	w2v2-L-robust-cat	w2v2-L-vox-cat	w2v2-L-xls-r-cat
crema-d-1.2.0-emotion.categories.test.gold_standard	0.99	0.99	0.99	1.00	1.00
emovo-1.2.1-emotion.test	0.98	0.98	0.99	1.00	0.99
iemocap-2.3.0-emotion.categories.test.gold_standard	0.98	0.98	0.99	0.99	0.99
meld-1.3.1-emotion.categories.test.gold_standard	0.97	0.97	0.97	0.98	0.98
msppodcast-2.6.0-emotion.categories.test-1.gold_standard	0.99	0.98	0.99	0.99	0.99
mean	0.98	0.98	0.99	0.99	0.99

Percentage Unchanged Predictions Gain

Threshold: 0.95
Data	w2v2-b-cat	w2v2-L-cat	w2v2-L-robust-cat	w2v2-L-vox-cat	w2v2-L-xls-r-cat
crema-d-1.2.0-emotion.categories.test.gold_standard	1.00	1.00	1.00	1.00	1.00
emovo-1.2.1-emotion.test	1.00	1.00	1.00	1.00	1.00
iemocap-2.3.0-emotion.categories.test.gold_standard	0.99	0.99	0.99	0.99	0.99
meld-1.3.1-emotion.categories.test.gold_standard	1.00	1.00	1.00	1.00	1.00
msppodcast-2.6.0-emotion.categories.test-1.gold_standard	1.00	1.00	1.00	1.00	1.00
mean	1.00	1.00	1.00	1.00	1.00

Percentage Unchanged Predictions Highpass Filter

Threshold: 0.95
Data	w2v2-b-cat	w2v2-L-cat	w2v2-L-robust-cat	w2v2-L-vox-cat	w2v2-L-xls-r-cat
crema-d-1.2.0-emotion.categories.test.gold_standard	0.95	0.97	0.96	0.99	0.99
emovo-1.2.1-emotion.test	0.96	0.94	0.97	0.93	0.97
iemocap-2.3.0-emotion.categories.test.gold_standard	0.95	0.96	0.97	0.95	0.96
meld-1.3.1-emotion.categories.test.gold_standard	0.96	0.96	0.97	0.96	0.96
msppodcast-2.6.0-emotion.categories.test-1.gold_standard	0.97	0.98	0.98	0.95	0.96
mean	0.96	0.96	0.97	0.96	0.97

Percentage Unchanged Predictions Lowpass Filter

Threshold: 0.95
Data	w2v2-b-cat	w2v2-L-cat	w2v2-L-robust-cat	w2v2-L-vox-cat	w2v2-L-xls-r-cat
crema-d-1.2.0-emotion.categories.test.gold_standard	0.95	0.97	0.96	0.98	0.96
emovo-1.2.1-emotion.test	0.98	0.97	0.99	0.99	0.98
iemocap-2.3.0-emotion.categories.test.gold_standard	0.98	0.99	0.99	0.99	0.99
meld-1.3.1-emotion.categories.test.gold_standard	0.97	0.97	0.98	0.98	0.97
msppodcast-2.6.0-emotion.categories.test-1.gold_standard	0.98	0.98	0.98	0.98	0.96
mean	0.97	0.98	0.98	0.98	0.97

Percentage Unchanged Predictions Prepend Zeros

Threshold: 0.95
Data	w2v2-b-cat	w2v2-L-cat	w2v2-L-robust-cat	w2v2-L-vox-cat	w2v2-L-xls-r-cat
crema-d-1.2.0-emotion.categories.test.gold_standard	0.90	0.93	0.90	0.98	0.97
emovo-1.2.1-emotion.test	0.90	0.89	0.92	0.97	0.94
iemocap-2.3.0-emotion.categories.test.gold_standard	0.91	0.93	0.93	0.96	0.93
meld-1.3.1-emotion.categories.test.gold_standard	0.89	0.90	0.90	0.95	0.90
msppodcast-2.6.0-emotion.categories.test-1.gold_standard	0.94	0.95	0.95	0.96	0.94
mean	0.91	0.92	0.92	0.96	0.94

Percentage Unchanged Predictions White Noise

Threshold: 0.95
Data	w2v2-b-cat	w2v2-L-cat	w2v2-L-robust-cat	w2v2-L-vox-cat	w2v2-L-xls-r-cat
crema-d-1.2.0-emotion.categories.test.gold_standard	0.94	0.94	0.93	0.97	0.98
emovo-1.2.1-emotion.test	0.90	0.87	0.89	0.85	0.88
iemocap-2.3.0-emotion.categories.test.gold_standard	0.88	0.87	0.92	0.90	0.95
meld-1.3.1-emotion.categories.test.gold_standard	0.96	0.97	0.96	0.97	0.95
msppodcast-2.6.0-emotion.categories.test-1.gold_standard	0.92	0.92	0.92	0.90	0.90
mean	0.92	0.91	0.92	0.92	0.93