w2v2-b-cat vs. w2v2-L-cat vs. w2v2-L-robust-cat vs. w2v2-L-vox-cat vs. w2v2-L-xls-r-cat

Tests overview
Passed tests per topic and model:
Topic	w2v2-b-cat	w2v2-L-cat	w2v2-L-robust-cat	w2v2-L-vox-cat	w2v2-L-xls-r-cat
Overall Score	75.3% (463 passed / 152 failed)	74.5% (458 passed / 157 failed)	78.9% (485 passed / 130 failed)	76.3% (469 passed / 146 failed)	76.7% (472 passed / 143 failed)
Correctness classification	49.0%	52.0%	66.0%	50.0%	48.0%
Correctness distribution	70.0%	70.0%	65.0%	45.0%	47.5%
Correctness speaker average	41.7%	33.3%	50.0%	33.3%	25.0%
Correctness speaker ranking	75.0%	50.0%	50.0%	37.5%	37.5%
Fairness accent	100.0%	100.0%	100.0%	95.2%	99.2%
Fairness language	83.3%	75.0%	87.5%	87.5%	91.7%
Fairness linguistic sentiment	90.6%	88.5%	88.5%	97.9%	96.9%
Fairness pitch	100.0%	100.0%	100.0%	96.3%	100.0%
Fairness sex	100.0%	94.4%	100.0%	100.0%	100.0%
Robustness background noise	38.3%	33.3%	40.0%	43.3%	55.0%
Robustness low quality phone	100.0%	90.0%	80.0%	90.0%	100.0%
Robustness recording condition	0.0%	0.0%	100.0%	100.0%	0.0%
Robustness simulated recording condition	0.0%	33.3%	66.7%	83.3%	100.0%
Robustness small changes	60.0%	66.0%	72.0%	80.0%	70.0%
Robustness spectral tilt	90.0%	90.0%	80.0%	85.0%	70.0%
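
The percentages in the Overall Score row are consistent with the share of passed tests among all tests, that is, passed / (passed + failed). The following is a minimal sketch that recomputes them from the counts in the table, assuming that definition; the names `overall_counts` and `pass_rate` are hypothetical and not part of any test tooling.

```python
# Passed/failed counts copied from the "Overall Score" row of the table.
overall_counts = {
    "w2v2-b-cat": (463, 152),
    "w2v2-L-cat": (458, 157),
    "w2v2-L-robust-cat": (485, 130),
    "w2v2-L-vox-cat": (469, 146),
    "w2v2-L-xls-r-cat": (472, 143),
}


def pass_rate(passed: int, failed: int) -> float:
    """Percentage of passed tests, rounded to one decimal place.

    Assumption: the score is passed / (passed + failed).
    """
    return round(100 * passed / (passed + failed), 1)


# Recompute the overall score for every model.
scores = {
    model: pass_rate(passed, failed)
    for model, (passed, failed) in overall_counts.items()
}

# Model with the highest overall share of passed tests.
best_model = max(scores, key=scores.get)

for model, score in scores.items():
    print(f"{model}: {score}%")
print(f"highest overall score: {best_model}")
```

Every model was run against the same 615 tests, so the overall scores are directly comparable. The per-topic rows list percentages only, so their underlying counts cannot be recovered from this table.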