w2v2-L-cat vs. hubert-L-cat vs. wavlm-cat vs. data2vec-cat

This page compares the models w2v2-L-cat, hubert-L-cat, wavlm-cat, and data2vec-cat with one another.

Tests overview
Passed tests per topic:
Topic	w2v2-L-cat	hubert-L-cat	wavlm-cat	data2vec-cat
Overall Score	74.5% (458 passed / 157 failed)	79.5% (489 passed / 126 failed)	78.5% (483 passed / 132 failed)	74.3% (457 passed / 158 failed)
Correctness classification	52.0%	64.0%	72.0%	51.0%
Correctness distribution	70.0%	62.5%	67.5%	70.0%
Correctness speaker average	33.3%	66.7%	58.3%	58.3%
Correctness speaker ranking	50.0%	62.5%	50.0%	37.5%
Fairness accent	100.0%	100.0%	99.2%	100.0%
Fairness language	75.0%	91.7%	79.2%	83.3%
Fairness linguistic sentiment	88.5%	85.4%	85.4%	86.5%
Fairness pitch	100.0%	96.3%	92.6%	96.3%
Fairness sex	94.4%	100.0%	94.4%	100.0%
Robustness background noise	33.3%	41.7%	46.7%	38.3%
Robustness low quality phone	90.0%	80.0%	90.0%	90.0%
Robustness recording condition	0.0%	100.0%	100.0%	50.0%
Robustness simulated recording condition	33.3%	66.7%	33.3%	33.3%
Robustness small changes	66.0%	78.0%	62.0%	56.0%
Robustness spectral tilt	90.0%	95.0%	90.0%	80.0%
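
The Overall Score percentages in the table appear consistent with passed / (passed + failed), rounded to one decimal place. The following is a minimal sketch that recomputes them from the counts in the Overall Score row. The formula and the helper name `pass_rate` are assumptions made for illustration and are not taken from the tool that generated this report.

```python
# Passed/failed counts copied from the "Overall Score" row of the table.
counts = {
    "w2v2-L-cat": (458, 157),
    "hubert-L-cat": (489, 126),
    "wavlm-cat": (483, 132),
    "data2vec-cat": (457, 158),
}


def pass_rate(passed: int, failed: int) -> float:
    """Return the percentage of passed tests, rounded to one decimal.

    Hypothetical helper: assumes score = passed / (passed + failed).
    """
    return round(100 * passed / (passed + failed), 1)


# Recompute the overall score for every model.
overall = {model: pass_rate(p, f) for model, (p, f) in counts.items()}

for model, score in overall.items():
    print(f"{model}: {score}%")
```

Under this assumption, every model is evaluated on the same 615 tests, so the overall scores are directly comparable across models.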