w2v2-L vs. hubert-L vs. wavlm vs. data2vec

This page compares the models w2v2-L, hubert-L, wavlm, and data2vec with one another.

Tests overview
Topic	Passed Tests
Topic	w2v2-L	hubert-L	wavlm	data2vec
Overall Score	81.7% (294 passed / 66 failed)	84.7% (305 passed / 55 failed)	85.1% (338 passed / 59 failed)	82.1% (321 passed / 70 failed)
Correctness consistency	55.8%	51.2%	65.1%	48.8%
Correctness distribution	66.7%	66.7%	33.3%	66.7%
Correctness regression	66.7%	55.6%	55.6%	66.7%
Correctness speaker average	100.0%	100.0%	100.0%	100.0%
Correctness speaker ranking	100.0%	100.0%	100.0%	100.0%
Fairness accent	93.5%	100.0%	96.0%	96.8%
Fairness language	66.7%	87.5%	86.7%	83.3%
Fairness linguistic sentiment	97.2%	91.7%	84.7%	87.5%
Fairness pitch	93.3%	100.0%	93.3%	93.3%
Fairness sex	96.9%	96.9%	96.9%	90.6%
Robustness background noise	50.0%	54.2%	62.5%	54.2%
Robustness low quality phone	25.0%	75.0%	100.0%	75.0%
Robustness recording condition	0.0%	50.0%	0.0%	50.0%
Robustness simulated recording condition	16.7%	16.7%	50.0%	16.7%
Robustness small changes	90.0%	95.0%	100.0%	90.0%
Robustness spectral tilt	87.5%	100.0%	75.0%	62.5%
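
The Overall Score row is consistent with the share of passed tests among all tests run, that is passed / (passed + failed). The following is a minimal sketch that recomputes those percentages from the counts in the table. The names `counts` and `pass_rate` are introduced here for illustration only and are not part of the original report; the assumption that the score is a plain pass rate is inferred from the numbers, not stated by the source.

```python
# Passed / failed test counts per model, copied from the Overall Score row.
counts = {
    "w2v2-L": (294, 66),
    "hubert-L": (305, 55),
    "wavlm": (338, 59),
    "data2vec": (321, 70),
}


def pass_rate(passed: int, failed: int) -> float:
    """Return the percentage of passed tests (assumed definition of the score)."""
    return 100.0 * passed / (passed + failed)


# Round to one decimal place, matching the precision used in the table.
scores = {
    model: round(pass_rate(passed, failed), 1)
    for model, (passed, failed) in counts.items()
}

# Total number of tests per model; these differ between models.
totals = {model: passed + failed for model, (passed, failed) in counts.items()}

for model in counts:
    print(f"{model}: {scores[model]}% of {totals[model]} tests passed")
```

The totals differ between models (360 tests for w2v2-L and hubert-L, 397 for wavlm, 391 for data2vec), so the overall percentages are not computed over an identical set of tests.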