w2v2-b vs. w2v2-L vs. w2v2-L-robust vs. w2v2-L-xls-r vs. w2v2-L-vox

This comparison covers the models w2v2-b, w2v2-L, w2v2-L-robust, w2v2-L-xls-r, and w2v2-L-vox.

Tests overview

Passed tests per topic and model:

| Topic | w2v2-b | w2v2-L | w2v2-L-robust | w2v2-L-xls-r | w2v2-L-vox |
|---|---|---|---|---|---|
| Overall Score | 79.2% (350 passed / 92 failed) | 81.0% (295 passed / 69 failed) | 80.9% (326 passed / 77 failed) | 80.4% (324 passed / 79 failed) | 83.7% (318 passed / 62 failed) |
| Correctness consistency | 46.8% | 59.6% | 42.6% | 59.6% | 68.1% |
| Correctness distribution | 33.3% | 33.3% | 66.7% | 33.3% | 33.3% |
| Correctness regression | 0.0% | 0.0% | 22.2% | 0.0% | 0.0% |
| Correctness speaker average | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |
| Correctness speaker ranking | 0.0% | 0.0% | 50.0% | 0.0% | 0.0% |
| Fairness accent | 98.7% | 96.8% | 100.0% | 92.7% | 95.7% |
| Fairness language | 75.0% | 100.0% | 75.0% | 100.0% | 100.0% |
| Fairness linguistic sentiment | 89.6% | 100.0% | 89.8% | 97.7% | 100.0% |
| Fairness pitch | 100.0% | 100.0% | 100.0% | 86.7% | 93.3% |
| Fairness sex | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |
| Robustness background noise | 29.2% | 16.7% | 41.7% | 37.5% | 41.7% |
| Robustness low quality phone | 0.0% | 75.0% | 50.0% | 75.0% | 100.0% |
| Robustness recording condition | 0.0% | 0.0% | 50.0% | 0.0% | 0.0% |
| Robustness simulated recording condition | 0.0% | 16.7% | 0.0% | 0.0% | 0.0% |
| Robustness small changes | 75.0% | 80.0% | 90.0% | 60.0% | 80.0% |
| Robustness spectral tilt | 75.0% | 75.0% | 87.5% | 75.0% | 62.5% |
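
The Overall Score percentages are consistent with a simple pass rate, passed / (passed + failed). The following minimal sketch recomputes them from the reported counts. The formula is an assumption that happens to match the numbers above, not a documented definition, and the names `counts`, `pass_rate`, and `scores` are illustrative only.

```python
# Passed/failed test counts copied from the Overall Score row.
counts = {
    "w2v2-b": (350, 92),
    "w2v2-L": (295, 69),
    "w2v2-L-robust": (326, 77),
    "w2v2-L-xls-r": (324, 79),
    "w2v2-L-vox": (318, 62),
}


def pass_rate(passed: int, failed: int) -> float:
    """Return the share of passed tests in percent (assumed definition)."""
    return 100.0 * passed / (passed + failed)


# Round to one decimal place, matching the precision used in the report.
scores = {
    model: round(pass_rate(passed, failed), 1)
    for model, (passed, failed) in counts.items()
}

for model, score in scores.items():
    passed, failed = counts[model]
    print(f"{model}: {score:.1f}% ({passed + failed} tests)")
```

The total number of tests differs between models (442, 364, 403, 403, and 380), so the overall scores are not computed over an identical set of tests.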