w2v2-b vs. w2v2-L vs. w2v2-L-robust vs. w2v2-L-xls-r vs. w2v2-L-vox

This page compares the models w2v2-b, w2v2-L, w2v2-L-robust, w2v2-L-xls-r, and w2v2-L-vox with one another.

Tests overview
Passed tests per topic and model:
Topic	w2v2-b	w2v2-L	w2v2-L-robust	w2v2-L-xls-r	w2v2-L-vox
Overall Score	84.9% (349 passed / 62 failed)	87.9% (340 passed / 47 failed)	87.4% (333 passed / 48 failed)	86.0% (333 passed / 54 failed)	88.3% (363 passed / 48 failed)
Correctness consistency	56.8%	67.6%	64.9%	67.6%	62.2%
Correctness distribution	33.3%	66.7%	66.7%	66.7%	33.3%
Correctness regression	66.7%	88.9%	88.9%	66.7%	66.7%
Correctness speaker average	100.0%	100.0%	100.0%	100.0%	100.0%
Correctness speaker ranking	100.0%	100.0%	100.0%	100.0%	100.0%
Fairness accent	96.8%	100.0%	99.2%	99.2%	98.4%
Fairness language	83.3%	86.7%	83.3%	83.3%	80.0%
Fairness linguistic sentiment	94.8%	100.0%	86.1%	100.0%	100.0%
Fairness pitch	93.3%	100.0%	100.0%	100.0%	100.0%
Fairness sex	100.0%	100.0%	85.7%	92.9%	100.0%
Robustness background noise	41.7%	37.5%	70.8%	41.7%	50.0%
Robustness low quality phone	100.0%	25.0%	100.0%	0.0%	75.0%
Robustness recording condition	0.0%	0.0%	50.0%	0.0%	100.0%
Robustness simulated recording condition	0.0%	33.3%	33.3%	0.0%	33.3%
Robustness small changes	90.0%	90.0%	95.0%	90.0%	90.0%
Robustness spectral tilt	75.0%	62.5%	87.5%	75.0%	75.0%
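The percentages in the Overall Score row match the share of passed tests among all passed and failed tests. The minimal Python sketch below assumes that definition and recomputes the overall scores from the counts in the table. The names `overall_counts` and `pass_rate` are hypothetical and exist only for illustration.

```python
# Passed/failed counts copied from the "Overall Score" row of the table.
# Assumption: overall score = passed / (passed + failed).
overall_counts = {
    "w2v2-b": (349, 62),
    "w2v2-L": (340, 47),
    "w2v2-L-robust": (333, 48),
    "w2v2-L-xls-r": (333, 54),
    "w2v2-L-vox": (363, 48),
}


def pass_rate(passed: int, failed: int) -> float:
    """Return the share of passed tests in percent (hypothetical helper)."""
    total = passed + failed
    return 100.0 * passed / total


# Print each model's recomputed score together with its total number of tests.
for model, (passed, failed) in overall_counts.items():
    print(f"{model}: {pass_rate(passed, failed):.1f}% of {passed + failed} tests")

# Model with the highest overall pass rate under this definition.
best = max(overall_counts, key=lambda m: pass_rate(*overall_counts[m]))
print("highest overall score:", best)
```

The recomputed values reproduce the table (84.9%, 87.9%, 87.4%, 86.0%, 88.3%), and w2v2-L-vox comes out highest. The totals differ between models (411, 387, 381, 387, and 411 tests), so the overall scores are not all computed over the same number of tests. Compare them with that in mind.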