w2v2-b vs. w2v2-L vs. w2v2-L-robust vs. w2v2-L-xls-r vs. w2v2-L-vox

This page compares the models w2v2-b, w2v2-L, w2v2-L-robust, w2v2-L-xls-r, and w2v2-L-vox with one another.

Tests overview

Passed tests per topic and model:

| Topic | w2v2-b | w2v2-L | w2v2-L-robust | w2v2-L-xls-r | w2v2-L-vox |
|---|---|---|---|---|---|
| Overall Score | 84.9% (349 passed / 62 failed) | 87.9% (340 passed / 47 failed) | 87.4% (333 passed / 48 failed) | 86.0% (333 passed / 54 failed) | 88.3% (363 passed / 48 failed) |
| Correctness consistency | 56.8% | 67.6% | 64.9% | 67.6% | 62.2% |
| Correctness distribution | 33.3% | 66.7% | 66.7% | 66.7% | 33.3% |
| Correctness regression | 66.7% | 88.9% | 88.9% | 66.7% | 66.7% |
| Correctness speaker average | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |
| Correctness speaker ranking | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |
| Fairness accent | 96.8% | 100.0% | 99.2% | 99.2% | 98.4% |
| Fairness language | 83.3% | 86.7% | 83.3% | 83.3% | 80.0% |
| Fairness linguistic sentiment | 94.8% | 100.0% | 86.1% | 100.0% | 100.0% |
| Fairness pitch | 93.3% | 100.0% | 100.0% | 100.0% | 100.0% |
| Fairness sex | 100.0% | 100.0% | 85.7% | 92.9% | 100.0% |
| Robustness background noise | 41.7% | 37.5% | 70.8% | 41.7% | 50.0% |
| Robustness low quality phone | 100.0% | 25.0% | 100.0% | 0.0% | 75.0% |
| Robustness recording condition | 0.0% | 0.0% | 50.0% | 0.0% | 100.0% |
| Robustness simulated recording condition | 0.0% | 33.3% | 33.3% | 0.0% | 33.3% |
| Robustness small changes | 90.0% | 90.0% | 95.0% | 90.0% | 90.0% |
| Robustness spectral tilt | 75.0% | 62.5% | 87.5% | 75.0% | 75.0% |
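
The Overall Score percentages above are consistent with the share of passed tests among all executed tests, passed / (passed + failed), rounded to one decimal place. The following minimal sketch reproduces them from the reported counts. The names `counts` and `pass_rate` are illustrative and not part of any test tooling; the formula itself is an assumption inferred from the numbers, not a documented definition.

```python
# Pass/fail counts copied from the Overall Score figures above.
counts = {
    "w2v2-b": (349, 62),
    "w2v2-L": (340, 47),
    "w2v2-L-robust": (333, 48),
    "w2v2-L-xls-r": (333, 54),
    "w2v2-L-vox": (363, 48),
}


def pass_rate(passed: int, failed: int) -> float:
    """Percentage of passed tests, rounded to one decimal place (assumed formula)."""
    return round(100 * passed / (passed + failed), 1)


# Overall score per model, keyed by model name.
scores = {model: pass_rate(p, f) for model, (p, f) in counts.items()}

for model, score in scores.items():
    print(f"{model}: {score}%")
```

Note that the models were not all run on the same number of tests (411, 387, or 381 in total), so the overall scores are not computed over identical test sets.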