CNN14 vs. w2v2-b vs. hubert-b vs. axlstm

This page compares the models CNN14, w2v2-b, hubert-b, and axlstm with one another.

Tests overview
Passed tests by topic and model:
Topic	CNN14	w2v2-b	hubert-b	axlstm
Overall Score	79.2% (350 passed / 92 failed)	79.2% (350 passed / 92 failed)	81.8% (381 passed / 85 failed)	79.4% (351 passed / 91 failed)
Correctness consistency	40.4%	46.8%	53.2%	21.3%
Correctness distribution	33.3%	33.3%	33.3%	0.0%
Correctness regression	0.0%	0.0%	22.2%	0.0%
Correctness speaker average	100.0%	100.0%	100.0%	100.0%
Correctness speaker ranking	0.0%	0.0%	50.0%	0.0%
Fairness accent	98.1%	98.7%	100.0%	100.0%
Fairness language	100.0%	75.0%	100.0%	100.0%
Fairness linguistic sentiment	97.9%	89.6%	85.8%	100.0%
Fairness pitch	80.0%	100.0%	100.0%	100.0%
Fairness sex	83.3%	100.0%	100.0%	100.0%
Robustness background noise	41.7%	29.2%	33.3%	45.8%
Robustness low quality phone	0.0%	0.0%	0.0%	25.0%
Robustness recording condition	0.0%	0.0%	0.0%	0.0%
Robustness simulated recording condition	0.0%	0.0%	0.0%	0.0%
Robustness small changes	70.0%	75.0%	60.0%	35.0%
Robustness spectral tilt	12.5%	75.0%	100.0%	62.5%
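
The overall score row appears to be the share of passed tests among all tests run, rounded to one decimal place. The sketch below recomputes it from the pass/fail counts in the table. The formula passed / (passed + failed) is an assumption inferred from the reported numbers, and the names `pass_rate` and `counts` are hypothetical, not part of any tool used to produce this page.

```python
def pass_rate(passed: int, failed: int) -> float:
    """Return the percentage of passed tests, rounded to one decimal place.

    Assumes the score is passed / (passed + failed); this is inferred
    from the table, not taken from the benchmark's documentation.
    """
    total = passed + failed
    return round(100.0 * passed / total, 1)


# (passed, failed) counts copied from the "Overall Score" row.
counts = {
    "CNN14": (350, 92),
    "w2v2-b": (350, 92),
    "hubert-b": (381, 85),
    "axlstm": (351, 91),
}

scores = {model: pass_rate(p, f) for model, (p, f) in counts.items()}

for model, score in scores.items():
    print(f"{model}: {score}%")
```

Under this assumption the recomputed values match the table. Note that hubert-b was evaluated on 466 tests in total while the other three models were evaluated on 442, so the overall scores are not computed over identical test sets.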