CNN14 vs. w2v2-b vs. hubert-b vs. axlstm

This page compares the models CNN14, w2v2-b, hubert-b, and axlstm against one another.

Tests overview
Percentage of passed tests per topic and model:
Topic	CNN14	w2v2-b	hubert-b	axlstm
Overall Score	83.2% (376 passed / 76 failed)	84.9% (332 passed / 59 failed)	85.4% (381 passed / 65 failed)	82.5% (373 passed / 79 failed)
Correctness consistency	46.5%	51.2%	46.5%	51.2%
Correctness distribution	33.3%	66.7%	66.7%	66.7%
Correctness regression	44.4%	66.7%	66.7%	44.4%
Correctness speaker average	100.0%	100.0%	100.0%	100.0%
Correctness speaker ranking	50.0%	100.0%	100.0%	100.0%
Fairness accent	98.7%	98.4%	98.7%	100.0%
Fairness language	93.3%	79.2%	91.7%	93.3%
Fairness linguistic sentiment	99.0%	100.0%	94.8%	97.9%
Fairness pitch	86.7%	100.0%	100.0%	100.0%
Fairness sex	87.5%	90.6%	100.0%	90.6%
Robustness background noise	45.8%	50.0%	41.7%	29.2%
Robustness low quality phone	25.0%	75.0%	25.0%	0.0%
Robustness recording condition	0.0%	0.0%	50.0%	0.0%
Robustness simulated recording condition	0.0%	0.0%	0.0%	0.0%
Robustness small changes	80.0%	90.0%	80.0%	40.0%
Robustness spectral tilt	25.0%	87.5%	87.5%	50.0%
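
The overall scores in the table appear to be the share of passed tests among all tests run for each model. As a minimal sketch under that assumption, the following recomputes them from the passed and failed counts in the "Overall Score" row. The names `counts`, `pass_rate`, and `scores` are illustrative and not part of any tool used to produce this report.

```python
# Passed/failed counts copied from the "Overall Score" row of the table.
counts = {
    "CNN14": (376, 76),
    "w2v2-b": (332, 59),
    "hubert-b": (381, 65),
    "axlstm": (373, 79),
}


def pass_rate(passed: int, failed: int) -> float:
    """Return the percentage of passed tests, rounded to one decimal place.

    Assumes the score is passed / (passed + failed); this is inferred from
    the table, not taken from the report's documentation.
    """
    return round(100 * passed / (passed + failed), 1)


scores = {model: pass_rate(p, f) for model, (p, f) in counts.items()}
totals = {model: p + f for model, (p, f) in counts.items()}

for model in counts:
    print(f"{model}: {scores[model]}% of {totals[model]} tests passed")
```

The totals differ between models (452, 391, 446, and 452 tests), so the overall scores are not computed over identical test sets and small differences between them should be read with that in mind.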