w2v2-b vs. w2v2-L vs. w2v2-L-robust vs. w2v2-L-xls-r vs. w2v2-L-vox

This page compares the models w2v2-b, w2v2-L, w2v2-L-robust, w2v2-L-xls-r, and w2v2-L-vox with one another.

Tests overview
Percentage of passed tests per topic and model:
Topic	w2v2-b	w2v2-L	w2v2-L-robust	w2v2-L-xls-r	w2v2-L-vox
Overall score	84.9% (332 passed / 59 failed)	81.7% (294 passed / 66 failed)	84.4% (304 passed / 56 failed)	83.6% (301 passed / 59 failed)	85.1% (338 passed / 59 failed)
Correctness consistency	51.2%	55.8%	51.2%	53.5%	51.2%
Correctness distribution	66.7%	66.7%	66.7%	66.7%	66.7%
Correctness regression	66.7%	66.7%	66.7%	66.7%	66.7%
Correctness speaker average	100.0%	100.0%	100.0%	100.0%	100.0%
Correctness speaker ranking	100.0%	100.0%	100.0%	100.0%	100.0%
Fairness accent	98.4%	93.5%	89.2%	97.8%	94.4%
Fairness language	79.2%	66.7%	79.2%	79.2%	83.3%
Fairness linguistic sentiment	100.0%	97.2%	93.1%	100.0%	100.0%
Fairness pitch	100.0%	93.3%	100.0%	93.3%	100.0%
Fairness sex	90.6%	96.9%	100.0%	96.9%	93.8%
Robustness background noise	50.0%	50.0%	75.0%	45.8%	54.2%
Robustness low quality phone	75.0%	25.0%	100.0%	25.0%	100.0%
Robustness recording condition	0.0%	0.0%	50.0%	50.0%	50.0%
Robustness simulated recording condition	0.0%	16.7%	50.0%	0.0%	33.3%
Robustness small changes	90.0%	90.0%	95.0%	90.0%	90.0%
Robustness spectral tilt	87.5%	87.5%	100.0%	87.5%	75.0%
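As a quick consistency check, each overall score is the number of passed tests divided by the total number of tests (passed + failed). The following is a minimal sketch that assumes the table values are transcribed by hand into plain Python dictionaries. The names `OVERALL`, `TOPICS`, `pass_rate`, and `best_models` are hypothetical and do not belong to any benchmarking library. The sketch recomputes the overall row and lists the top-scoring model(s) for a few example topics.

```python
from __future__ import annotations

# Hypothetical hand transcription of the table above (not a library API).
# model: (passed, failed) from the "Overall score" row
OVERALL = {
    "w2v2-b": (332, 59),
    "w2v2-L": (294, 66),
    "w2v2-L-robust": (304, 56),
    "w2v2-L-xls-r": (301, 59),
    "w2v2-L-vox": (338, 59),
}

# A subset of the per-topic rows (percent of tests passed), same model order.
MODELS = list(OVERALL)
TOPICS = {
    "Robustness background noise": [50.0, 50.0, 75.0, 45.8, 54.2],
    "Robustness low quality phone": [75.0, 25.0, 100.0, 25.0, 100.0],
    "Fairness accent": [98.4, 93.5, 89.2, 97.8, 94.4],
}


def pass_rate(passed: int, failed: int) -> float:
    """Percentage of passed tests, rounded to one decimal as in the table."""
    return round(100 * passed / (passed + failed), 1)


def best_models(scores: list[float]) -> list[str]:
    """Return every model that reaches the top score (ties are kept)."""
    top = max(scores)
    return [m for m, s in zip(MODELS, scores) if s == top]


if __name__ == "__main__":
    for model, (passed, failed) in OVERALL.items():
        print(f"{model}: {pass_rate(passed, failed)}% of {passed + failed} tests")
    for topic, scores in TOPICS.items():
        print(f"{topic}: {', '.join(best_models(scores))}")
```

Running it reproduces the percentages in the overall score row. It also shows that the test totals differ between models: 391 for w2v2-b, 360 for w2v2-L, w2v2-L-robust, and w2v2-L-xls-r, and 397 for w2v2-L-vox. The overall percentages are therefore based on different numbers of tests.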