CNN14-cat vs. w2v2-b-cat vs. hubert-b-cat vs. axlstm-cat

This compares the models CNN14-cat, w2v2-b-cat, hubert-b-cat, and axlstm-cat with one another.

Tests overview

Passed tests per topic:

Topic                                     CNN14-cat          w2v2-b-cat         hubert-b-cat       axlstm-cat
Overall score (passed / failed)           74.8% (460 / 155)  75.3% (463 / 152)  73.0% (449 / 166)  71.4% (439 / 176)
Correctness classification                38.0%              49.0%              52.0%              40.0%
Correctness distribution                  52.5%              70.0%              52.5%              67.5%
Correctness speaker average               41.7%              41.7%              33.3%              58.3%
Correctness speaker ranking               37.5%              75.0%              50.0%              25.0%
Fairness accent                           96.0%              100.0%             98.4%              100.0%
Fairness language                         100.0%             83.3%              66.7%              100.0%
Fairness linguistic sentiment             97.9%              90.6%              90.6%              100.0%
Fairness pitch                            100.0%             100.0%             100.0%             96.3%
Fairness sex                              100.0%             100.0%             100.0%             97.2%
Robustness background noise               56.7%              38.3%              30.0%              35.0%
Robustness low quality phone              90.0%              100.0%             60.0%              80.0%
Robustness recording condition            0.0%               0.0%               50.0%              0.0%
Robustness simulated recording condition  0.0%               0.0%               33.3%              0.0%
Robustness small changes                  74.0%              60.0%              76.0%              32.0%
Robustness spectral tilt                  65.0%              90.0%              75.0%              65.0%
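The overall score row is consistent with a plain pass rate over all tests: each model ran 615 tests, and passed / (passed + failed) reproduces every percentage shown. The minimal Python sketch below assumes exactly that definition and recomputes the four overall scores from the passed/failed counts. The names `COUNTS` and `overall_score` are illustrative only and not part of any benchmarking tool.

```python
# Passed/failed test counts copied from the "Overall score" row of the table.
# Assumption: the overall score is simply passed / (passed + failed).
COUNTS = {
    "CNN14-cat": (460, 155),
    "w2v2-b-cat": (463, 152),
    "hubert-b-cat": (449, 166),
    "axlstm-cat": (439, 176),
}


def overall_score(passed: int, failed: int) -> float:
    """Return the pass rate in percent, rounded to one decimal place."""
    return round(100 * passed / (passed + failed), 1)


# Compute the score for every model and list them from best to worst.
scores = {model: overall_score(*counts) for model, counts in COUNTS.items()}
for model, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {score}%")
```

Run as is, this prints w2v2-b-cat (75.3%), CNN14-cat (74.8%), hubert-b-cat (73.0%), and axlstm-cat (71.4%) in that order. Because all four models share the same 615 tests, the overall scores are directly comparable, while the per-topic rows may rest on very different numbers of tests.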