w2v2-b vs. w2v2-L vs. w2v2-L-robust vs. w2v2-L-xls-r vs. w2v2-L-vox

This page compares the models w2v2-b, w2v2-L, w2v2-L-robust, w2v2-L-xls-r, and w2v2-L-vox with one another.

Tests overview
Passed tests per topic:
Topic	w2v2-b	w2v2-L	w2v2-L-robust	w2v2-L-xls-r	w2v2-L-vox
Overall Score	79.2% (350 passed / 92 failed)	81.0% (295 passed / 69 failed)	80.9% (326 passed / 77 failed)	80.4% (324 passed / 79 failed)	83.7% (318 passed / 62 failed)
Correctness consistency	46.8%	59.6%	42.6%	59.6%	68.1%
Correctness distribution	33.3%	33.3%	66.7%	33.3%	33.3%
Correctness regression	0.0%	0.0%	22.2%	0.0%	0.0%
Correctness speaker average	100.0%	100.0%	100.0%	100.0%	100.0%
Correctness speaker ranking	0.0%	0.0%	50.0%	0.0%	0.0%
Fairness accent	98.7%	96.8%	100.0%	92.7%	95.7%
Fairness language	75.0%	100.0%	75.0%	100.0%	100.0%
Fairness linguistic sentiment	89.6%	100.0%	89.8%	97.7%	100.0%
Fairness pitch	100.0%	100.0%	100.0%	86.7%	93.3%
Fairness sex	100.0%	100.0%	100.0%	100.0%	100.0%
Robustness background noise	29.2%	16.7%	41.7%	37.5%	41.7%
Robustness low quality phone	0.0%	75.0%	50.0%	75.0%	100.0%
Robustness recording condition	0.0%	0.0%	50.0%	0.0%	0.0%
Robustness simulated recording condition	0.0%	16.7%	0.0%	0.0%	0.0%
Robustness small changes	75.0%	80.0%	90.0%	60.0%	80.0%
Robustness spectral tilt	75.0%	75.0%	87.5%	75.0%	62.5%
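
The percentages in the Overall Score row are consistent with a simple pass rate, passed / (passed + failed). The following is a minimal sketch under that assumption, using the counts from the table; the helper name `pass_rate` is hypothetical and not part of any tooling behind this page.

```python
# Passed/failed counts copied from the "Overall Score" row of the table.
counts = {
    "w2v2-b": (350, 92),
    "w2v2-L": (295, 69),
    "w2v2-L-robust": (326, 77),
    "w2v2-L-xls-r": (324, 79),
    "w2v2-L-vox": (318, 62),
}


def pass_rate(passed: int, failed: int) -> float:
    """Return the share of passed tests in percent, rounded to one decimal.

    Assumption: the overall score is passed / (passed + failed).
    """
    return round(100 * passed / (passed + failed), 1)


scores = {model: pass_rate(*pf) for model, pf in counts.items()}
for model, score in scores.items():
    print(f"{model}: {score}%")
```

Note that the total number of tests differs between models (for example 442 for w2v2-b and 364 for w2v2-L), so the overall scores are not computed over an identical set of tests.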