CNN14 vs. w2v2-b vs. hubert-b vs. axlstm

This compares the models CNN14, w2v2-b, hubert-b, and axlstm with one another.

Tests overview

Passed tests per topic and model:

| Topic | CNN14 | w2v2-b | hubert-b | axlstm |
|---|---|---|---|---|
| Overall Score | 79.2% (350 passed / 92 failed) | 79.2% (350 passed / 92 failed) | 81.8% (381 passed / 85 failed) | 79.4% (351 passed / 91 failed) |
| Correctness consistency | 40.4% | 46.8% | 53.2% | 21.3% |
| Correctness distribution | 33.3% | 33.3% | 33.3% | 0.0% |
| Correctness regression | 0.0% | 0.0% | 22.2% | 0.0% |
| Correctness speaker average | 100.0% | 100.0% | 100.0% | 100.0% |
| Correctness speaker ranking | 0.0% | 0.0% | 50.0% | 0.0% |
| Fairness accent | 98.1% | 98.7% | 100.0% | 100.0% |
| Fairness language | 100.0% | 75.0% | 100.0% | 100.0% |
| Fairness linguistic sentiment | 97.9% | 89.6% | 85.8% | 100.0% |
| Fairness pitch | 80.0% | 100.0% | 100.0% | 100.0% |
| Fairness sex | 83.3% | 100.0% | 100.0% | 100.0% |
| Robustness background noise | 41.7% | 29.2% | 33.3% | 45.8% |
| Robustness low quality phone | 0.0% | 0.0% | 0.0% | 25.0% |
| Robustness recording condition | 0.0% | 0.0% | 0.0% | 0.0% |
| Robustness simulated recording condition | 0.0% | 0.0% | 0.0% | 0.0% |
| Robustness small changes | 70.0% | 75.0% | 60.0% | 35.0% |
| Robustness spectral tilt | 12.5% | 75.0% | 100.0% | 62.5% |
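
The Overall Score percentages appear to follow from the passed and failed counts reported in the same row. The sketch below recomputes them under the assumption that the score is simply passed / (passed + failed), rounded to one decimal; the source does not state the formula, and the names `counts` and `pass_rate` are illustrative only.

```python
# Passed/failed counts copied from the "Overall Score" row.
counts = {
    "CNN14": (350, 92),
    "w2v2-b": (350, 92),
    "hubert-b": (381, 85),
    "axlstm": (351, 91),
}


def pass_rate(passed: int, failed: int) -> float:
    """Share of passed tests in percent, rounded to one decimal.

    Assumption: overall score = passed / (passed + failed).
    """
    return round(100 * passed / (passed + failed), 1)


# Recompute the score and the total number of tests for each model.
scores = {model: pass_rate(p, f) for model, (p, f) in counts.items()}
totals = {model: p + f for model, (p, f) in counts.items()}

for model in counts:
    print(f"{model}: {scores[model]}% of {totals[model]} tests")
```

Under this assumption the recomputed values match the reported ones. The totals also show that hubert-b was scored on 466 tests while the other three models were scored on 442, so the overall scores are not computed over the same number of tests.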