w2v2-b-cat vs. w2v2-L-cat vs. w2v2-L-robust-cat vs. w2v2-L-vox-cat vs. w2v2-L-xls-r-cat

This page compares the models w2v2-b-cat, w2v2-L-cat, w2v2-L-robust-cat, w2v2-L-vox-cat, and w2v2-L-xls-r-cat with one another.

Tests overview

Passed tests by topic and model:

| Topic | w2v2-b-cat | w2v2-L-cat | w2v2-L-robust-cat | w2v2-L-vox-cat | w2v2-L-xls-r-cat |
|---|---|---|---|---|---|
| Overall Score | 75.3% (463 passed / 152 failed) | 74.5% (458 passed / 157 failed) | 78.9% (485 passed / 130 failed) | 76.3% (469 passed / 146 failed) | 76.7% (472 passed / 143 failed) |
| Correctness classification | 49.0% | 52.0% | 66.0% | 50.0% | 48.0% |
| Correctness distribution | 70.0% | 70.0% | 65.0% | 45.0% | 47.5% |
| Correctness speaker average | 41.7% | 33.3% | 50.0% | 33.3% | 25.0% |
| Correctness speaker ranking | 75.0% | 50.0% | 50.0% | 37.5% | 37.5% |
| Fairness accent | 100.0% | 100.0% | 100.0% | 95.2% | 99.2% |
| Fairness language | 83.3% | 75.0% | 87.5% | 87.5% | 91.7% |
| Fairness linguistic sentiment | 90.6% | 88.5% | 88.5% | 97.9% | 96.9% |
| Fairness pitch | 100.0% | 100.0% | 100.0% | 96.3% | 100.0% |
| Fairness sex | 100.0% | 94.4% | 100.0% | 100.0% | 100.0% |
| Robustness background noise | 38.3% | 33.3% | 40.0% | 43.3% | 55.0% |
| Robustness low quality phone | 100.0% | 90.0% | 80.0% | 90.0% | 100.0% |
| Robustness recording condition | 0.0% | 0.0% | 100.0% | 100.0% | 0.0% |
| Robustness simulated recording condition | 0.0% | 33.3% | 66.7% | 83.3% | 100.0% |
| Robustness small changes | 60.0% | 66.0% | 72.0% | 80.0% | 70.0% |
| Robustness spectral tilt | 90.0% | 90.0% | 80.0% | 85.0% | 70.0% |
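
The Overall Score percentages appear to be the share of passed tests among all tests, rounded to one decimal place. The sketch below reproduces them from the passed/failed counts listed for each model. The formula is an assumption that happens to match the listed figures, and the helper name `pass_rate` is hypothetical, not part of any test tooling.

```python
# Passed/failed counts copied from the "Overall Score" entries above.
counts = {
    "w2v2-b-cat": (463, 152),
    "w2v2-L-cat": (458, 157),
    "w2v2-L-robust-cat": (485, 130),
    "w2v2-L-vox-cat": (469, 146),
    "w2v2-L-xls-r-cat": (472, 143),
}


def pass_rate(passed: int, failed: int) -> float:
    """Percentage of passed tests, rounded to one decimal (assumed formula)."""
    return round(100 * passed / (passed + failed), 1)


# Overall score per model, and the model with the highest overall score.
rates = {model: pass_rate(p, f) for model, (p, f) in counts.items()}
best = max(rates, key=rates.get)

for model, rate in rates.items():
    print(f"{model}: {rate}%")
print("highest overall score:", best)
```

Every model is scored on the same 615 tests in total, so the overall scores are directly comparable across models.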