CNN14 vs. w2v2-b vs. hubert-b vs. axlstm

This page compares the models CNN14, w2v2-b, hubert-b, and axlstm against one another.

Tests overview

Topic                                     Passed Tests
                                          CNN14                           w2v2-b                          hubert-b                        axlstm
Overall Score                             83.2% (376 passed / 76 failed)  84.9% (332 passed / 59 failed)  85.4% (381 passed / 65 failed)  82.5% (373 passed / 79 failed)
Correctness consistency                   46.5%                           51.2%                           46.5%                           51.2%
Correctness distribution                  33.3%                           66.7%                           66.7%                           66.7%
Correctness regression                    44.4%                           66.7%                           66.7%                           44.4%
Correctness speaker average               100.0%                          100.0%                          100.0%                          100.0%
Correctness speaker ranking               50.0%                           100.0%                          100.0%                          100.0%
Fairness accent                           98.7%                           98.4%                           98.7%                           100.0%
Fairness language                         93.3%                           79.2%                           91.7%                           93.3%
Fairness linguistic sentiment             99.0%                           100.0%                          94.8%                           97.9%
Fairness pitch                            86.7%                           100.0%                          100.0%                          100.0%
Fairness sex                              87.5%                           90.6%                           100.0%                          90.6%
Robustness background noise               45.8%                           50.0%                           41.7%                           29.2%
Robustness low quality phone              25.0%                           75.0%                           25.0%                           0.0%
Robustness recording condition            0.0%                            0.0%                            50.0%                           0.0%
Robustness simulated recording condition  0.0%                            0.0%                            0.0%                            0.0%
Robustness small changes                  80.0%                           90.0%                           80.0%                           40.0%
Robustness spectral tilt                  25.0%                           87.5%                           87.5%                           50.0%
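
The Overall Score percentages can be reproduced from the passed and failed counts. The following is a minimal sketch, assuming the score is simply passed / (passed + failed) rounded to one decimal; the helper name pass_rate is hypothetical and not part of the test suite.

```python
# Passed/failed counts copied from the Overall Score row.
counts = {
    "CNN14": (376, 76),
    "w2v2-b": (332, 59),
    "hubert-b": (381, 65),
    "axlstm": (373, 79),
}


def pass_rate(passed: int, failed: int) -> float:
    """Return the share of passed tests in percent, rounded to one decimal.

    Assumption: the report divides passed tests by all executed tests.
    """
    return round(100 * passed / (passed + failed), 1)


rates = {model: pass_rate(p, f) for model, (p, f) in counts.items()}

for model, rate in rates.items():
    print(f"{model}: {rate}%")
```

The totals differ between models (452, 391, 446, and 452 tests), so the overall scores are not computed over an identical set of tests.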