CNN14 vs. w2v2-b vs. hubert-b vs. axlstm

This page compares the models CNN14, w2v2-b, hubert-b, and axlstm.

Tests overview
Share of passed tests per topic and model:

Topic	CNN14	w2v2-b	hubert-b	axlstm
Overall Score	80.5% (356 passed / 86 failed)	84.9% (349 passed / 62 failed)	87.1% (385 passed / 57 failed)	83.7% (370 passed / 72 failed)
Correctness consistency	48.6%	56.8%	64.9%	54.1%
Correctness distribution	66.7%	33.3%	33.3%	66.7%
Correctness regression	22.2%	66.7%	55.6%	44.4%
Correctness speaker average	100.0%	100.0%	100.0%	100.0%
Correctness speaker ranking	100.0%	100.0%	100.0%	100.0%
Fairness accent	92.3%	96.8%	99.4%	99.4%
Fairness language	86.7%	83.3%	86.7%	90.0%
Fairness linguistic sentiment	96.9%	94.8%	97.9%	100.0%
Fairness pitch	100.0%	93.3%	100.0%	100.0%
Fairness sex	100.0%	100.0%	100.0%	100.0%
Robustness background noise	25.0%	41.7%	41.7%	29.2%
Robustness low quality phone	25.0%	100.0%	25.0%	0.0%
Robustness recording condition	0.0%	0.0%	0.0%	0.0%
Robustness simulated recording condition	0.0%	0.0%	0.0%	0.0%
Robustness small changes	75.0%	90.0%	80.0%	40.0%
Robustness spectral tilt	25.0%	75.0%	75.0%	50.0%
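
The percentages in the "Overall Score" row appear to be the share of passed tests among all tests run, that is passed / (passed + failed). This formula is an assumption, not something the page states, but it reproduces all four reported values. A minimal sketch in Python, using the counts from that row (the names `counts` and `pass_rate` are illustrative only):

```python
# Passed/failed counts copied from the "Overall Score" row of the table.
counts = {
    "CNN14": (356, 86),
    "w2v2-b": (349, 62),
    "hubert-b": (385, 57),
    "axlstm": (370, 72),
}


def pass_rate(passed, failed):
    """Percentage of passed tests, rounded to one decimal place.

    Assumes the score is passed / (passed + failed); the page does not
    state the formula explicitly.
    """
    return round(100 * passed / (passed + failed), 1)


scores = {model: pass_rate(p, f) for model, (p, f) in counts.items()}
totals = {model: p + f for model, (p, f) in counts.items()}

for model, score in scores.items():
    print(f"{model}: {score}% of {totals[model]} tests")
```

The totals show that w2v2-b was evaluated on 411 tests while the other three models were evaluated on 442, so its overall score is computed over a smaller denominator and is not strictly like-for-like with the others.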