w2v2-L vs. hubert-L vs. wavlm vs. data2vec

This page compares the models w2v2-L, hubert-L, wavlm, and data2vec with one another.

Tests overview
Passed tests per topic:
Topic	w2v2-L	hubert-L	wavlm	data2vec
Overall Score	81.0% (295 passed / 69 failed)	78.8% (287 passed / 77 failed)	83.4% (336 passed / 67 failed)	76.9% (299 passed / 90 failed)
Correctness consistency	59.6%	36.2%	55.3%	29.8%
Correctness distribution	33.3%	66.7%	33.3%	66.7%
Correctness regression	0.0%	22.2%	44.4%	33.3%
Correctness speaker average	100.0%	100.0%	100.0%	100.0%
Correctness speaker ranking	0.0%	50.0%	50.0%	50.0%
Fairness accent	96.8%	100.0%	100.0%	100.0%
Fairness language	100.0%	83.3%	83.3%	61.1%
Fairness linguistic sentiment	100.0%	88.8%	89.8%	78.8%
Fairness pitch	100.0%	100.0%	100.0%	100.0%
Fairness sex	100.0%	100.0%	100.0%	100.0%
Robustness background noise	16.7%	37.5%	33.3%	37.5%
Robustness low quality phone	75.0%	50.0%	75.0%	50.0%
Robustness recording condition	0.0%	50.0%	50.0%	100.0%
Robustness simulated recording condition	16.7%	0.0%	33.3%	16.7%
Robustness small changes	80.0%	95.0%	85.0%	85.0%
Robustness spectral tilt	75.0%	100.0%	100.0%	100.0%
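
The percentages in the Overall Score row appear to be the share of passed tests among all executed tests, that is, passed / (passed + failed). The following is a minimal sketch under that assumption. It recomputes the four values from the counts in the table; the names `counts` and `pass_rate` are illustrative and do not come from the benchmark tooling.

```python
# Passed/failed counts copied from the "Overall Score" row of the table.
counts = {
    "w2v2-L": (295, 69),
    "hubert-L": (287, 77),
    "wavlm": (336, 67),
    "data2vec": (299, 90),
}


def pass_rate(passed, failed):
    """Return the percentage of passed tests, rounded to one decimal.

    Assumption: the overall score is passed / (passed + failed).
    """
    return round(100.0 * passed / (passed + failed), 1)


scores = {model: pass_rate(p, f) for model, (p, f) in counts.items()}
totals = {model: p + f for model, (p, f) in counts.items()}

for model in counts:
    print(f"{model}: {scores[model]}% of {totals[model]} tests")
```

Under this assumption the recomputed values match the table. The totals also show that the models were not evaluated on the same number of tests (364, 364, 403, and 389), so the overall scores are not computed over identical test sets.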