w2v2-L vs. hubert-L vs. wavlm vs. data2vec

This page compares the models w2v2-L, hubert-L, wavlm, and data2vec with one another.

Tests overview
Passed tests per topic and model:
Topic	w2v2-L	hubert-L	wavlm	data2vec
Overall Score	87.9% (340 passed / 47 failed)	86.4% (329 passed / 52 failed)	88.3% (363 passed / 48 failed)	84.5% (327 passed / 60 failed)
Correctness consistency	67.6%	73.0%	59.5%	64.9%
Correctness distribution	66.7%	66.7%	33.3%	66.7%
Correctness regression	88.9%	88.9%	66.7%	66.7%
Correctness speaker average	100.0%	100.0%	66.7%	100.0%
Correctness speaker ranking	100.0%	100.0%	100.0%	100.0%
Fairness accent	100.0%	100.0%	97.6%	100.0%
Fairness language	86.7%	83.3%	80.0%	83.3%
Fairness linguistic sentiment	100.0%	88.9%	99.0%	87.5%
Fairness pitch	100.0%	100.0%	93.3%	100.0%
Fairness sex	100.0%	82.1%	100.0%	100.0%
Robustness background noise	37.5%	41.7%	62.5%	37.5%
Robustness low quality phone	25.0%	100.0%	100.0%	50.0%
Robustness recording condition	0.0%	50.0%	50.0%	50.0%
Robustness simulated recording condition	33.3%	16.7%	33.3%	16.7%
Robustness small changes	90.0%	90.0%	100.0%	85.0%
Robustness spectral tilt	62.5%	87.5%	75.0%	62.5%
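
The percentages in the Overall Score row appear to follow from the passed and failed counts given next to them. The following is a minimal sketch under that assumption, namely that the score is passed / (passed + failed); the counts are copied from the table, while the function name `pass_rate` and the dictionary layout are illustrative and not part of any tool used to produce this page.

```python
# (passed, failed) counts copied from the "Overall Score" row of the table.
counts = {
    "w2v2-L": (340, 47),
    "hubert-L": (329, 52),
    "wavlm": (363, 48),
    "data2vec": (327, 60),
}


def pass_rate(passed, failed):
    """Return the share of passed tests as a percentage (assumed formula)."""
    return 100.0 * passed / (passed + failed)


# Pass rate and total number of tests per model.
rates = {model: pass_rate(p, f) for model, (p, f) in counts.items()}
totals = {model: p + f for model, (p, f) in counts.items()}

for model in counts:
    print(f"{model}: {rates[model]:.1f}% of {totals[model]} tests")
```

Under this assumption the computed values match the table. The totals also show that the models were not scored on the same number of tests (387, 381, 411, and 387), which is worth keeping in mind when comparing the overall scores directly.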