CNN14-cat vs. w2v2-b-cat vs. hubert-b-cat vs. axlstm-cat

This page compares the models CNN14-cat, w2v2-b-cat, hubert-b-cat, and axlstm-cat with one another.

Tests overview
Passed tests by topic
Topic	CNN14-cat	w2v2-b-cat	hubert-b-cat	axlstm-cat
Overall Score	74.8% (460 passed / 155 failed)	75.3% (463 passed / 152 failed)	73.0% (449 passed / 166 failed)	71.4% (439 passed / 176 failed)
Correctness classification	38.0%	49.0%	52.0%	40.0%
Correctness distribution	52.5%	70.0%	52.5%	67.5%
Correctness speaker average	41.7%	41.7%	33.3%	58.3%
Correctness speaker ranking	37.5%	75.0%	50.0%	25.0%
Fairness accent	96.0%	100.0%	98.4%	100.0%
Fairness language	100.0%	83.3%	66.7%	100.0%
Fairness linguistic sentiment	97.9%	90.6%	90.6%	100.0%
Fairness pitch	100.0%	100.0%	100.0%	96.3%
Fairness sex	100.0%	100.0%	100.0%	97.2%
Robustness background noise	56.7%	38.3%	30.0%	35.0%
Robustness low quality phone	90.0%	100.0%	60.0%	80.0%
Robustness recording condition	0.0%	0.0%	50.0%	0.0%
Robustness simulated recording condition	0.0%	0.0%	33.3%	0.0%
Robustness small changes	74.0%	60.0%	76.0%	32.0%
Robustness spectral tilt	65.0%	90.0%	75.0%	65.0%
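
The overall scores in the table are consistent with a simple pass rate: passed tests divided by the sum of passed and failed tests. The following is a minimal sketch of that arithmetic, not the code used to generate the report; the function name `pass_rate` and the `counts` dictionary are hypothetical, and rounding to one decimal place is an assumption based on how the table is printed.

```python
def pass_rate(passed: int, failed: int) -> float:
    """Return the share of passed tests as a percentage."""
    total = passed + failed
    if total == 0:
        raise ValueError("no tests were run")
    return 100.0 * passed / total


# (passed, failed) counts copied from the "Overall Score" row.
counts = {
    "CNN14-cat": (460, 155),
    "w2v2-b-cat": (463, 152),
    "hubert-b-cat": (449, 166),
    "axlstm-cat": (439, 176),
}

# Round to one decimal place, as the table does (assumed).
overall = {model: round(pass_rate(p, f), 1) for model, (p, f) in counts.items()}

for model, score in overall.items():
    print(f"{model}: {score:.1f}%")
```

Each model was evaluated on the same 615 tests, so the overall scores are directly comparable. The per-topic percentages cannot be recombined into the overall score without knowing how many tests each topic contains, which the table does not state.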