CNN14 vs. w2v2-b vs. hubert-b vs. axlstm

This page compares the models CNN14, w2v2-b, hubert-b, and axlstm against one another.

Tests overview

Passed Tests

| Topic | CNN14 | w2v2-b | hubert-b | axlstm |
|---|---|---|---|---|
| Overall Score | 80.5% (356 passed / 86 failed) | 84.9% (349 passed / 62 failed) | 87.1% (385 passed / 57 failed) | 83.7% (370 passed / 72 failed) |
| Correctness consistency | 48.6% | 56.8% | 64.9% | 54.1% |
| Correctness distribution | 66.7% | 33.3% | 33.3% | 66.7% |
| Correctness regression | 22.2% | 66.7% | 55.6% | 44.4% |
| Correctness speaker average | 100.0% | 100.0% | 100.0% | 100.0% |
| Correctness speaker ranking | 100.0% | 100.0% | 100.0% | 100.0% |
| Fairness accent | 92.3% | 96.8% | 99.4% | 99.4% |
| Fairness language | 86.7% | 83.3% | 86.7% | 90.0% |
| Fairness linguistic sentiment | 96.9% | 94.8% | 97.9% | 100.0% |
| Fairness pitch | 100.0% | 93.3% | 100.0% | 100.0% |
| Fairness sex | 100.0% | 100.0% | 100.0% | 100.0% |
| Robustness background noise | 25.0% | 41.7% | 41.7% | 29.2% |
| Robustness low quality phone | 25.0% | 100.0% | 25.0% | 0.0% |
| Robustness recording condition | 0.0% | 0.0% | 0.0% | 0.0% |
| Robustness simulated recording condition | 0.0% | 0.0% | 0.0% | 0.0% |
| Robustness small changes | 75.0% | 90.0% | 80.0% | 40.0% |
| Robustness spectral tilt | 25.0% | 75.0% | 75.0% | 50.0% |
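
The Overall Score percentages appear to be the share of passed tests among all tests run for a model. The following minimal sketch recomputes them from the pass/fail counts listed above. The formula passed / (passed + failed), the rounding to one decimal place, and the names `overall_counts` and `pass_rate` are assumptions made for illustration, not taken from the test suite itself.

```python
# Sketch: recompute the "Overall Score" percentages from the pass/fail counts.
# Assumption: percentage = passed / (passed + failed), rounded to one decimal.

# (passed, failed) per model, copied from the Overall Score row.
overall_counts = {
    "CNN14": (356, 86),
    "w2v2-b": (349, 62),
    "hubert-b": (385, 57),
    "axlstm": (370, 72),
}


def pass_rate(passed: int, failed: int) -> float:
    """Return the share of passed tests in percent, rounded to one decimal."""
    return round(100 * passed / (passed + failed), 1)


rates = {model: pass_rate(p, f) for model, (p, f) in overall_counts.items()}
totals = {model: p + f for model, (p, f) in overall_counts.items()}

for model in overall_counts:
    print(f"{model}: {rates[model]}% of {totals[model]} tests")
```

Under this assumption the recomputed values match the reported ones. The counts also show that w2v2-b was scored on 411 tests while the other three models were scored on 442, so the overall percentages are not based on the same number of tests.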