w2v2-L vs. hubert-L vs. wavlm vs. data2vec

This page compares the test results of four models: w2v2-L, hubert-L, wavlm, and data2vec.

Tests overview

                                          Passed Tests
Topic                                     w2v2-L    hubert-L  wavlm     data2vec
Overall Score                             87.9%     86.4%     88.3%     84.5%
  Tests passed                            340       329       363       327
  Tests failed                            47        52        48        60
Correctness consistency                   67.6%     73.0%     59.5%     64.9%
Correctness distribution                  66.7%     66.7%     33.3%     66.7%
Correctness regression                    88.9%     88.9%     66.7%     66.7%
Correctness speaker average               100.0%    100.0%    66.7%     100.0%
Correctness speaker ranking               100.0%    100.0%    100.0%    100.0%
Fairness accent                           100.0%    100.0%    97.6%     100.0%
Fairness language                         86.7%     83.3%     80.0%     83.3%
Fairness linguistic sentiment             100.0%    88.9%     99.0%     87.5%
Fairness pitch                            100.0%    100.0%    93.3%     100.0%
Fairness sex                              100.0%    82.1%     100.0%    100.0%
Robustness background noise               37.5%     41.7%     62.5%     37.5%
Robustness low quality phone              25.0%     100.0%    100.0%    50.0%
Robustness recording condition            0.0%      50.0%     50.0%     50.0%
Robustness simulated recording condition  33.3%     16.7%     33.3%     16.7%
Robustness small changes                  90.0%     90.0%     100.0%    85.0%
Robustness spectral tilt                  62.5%     87.5%     75.0%     62.5%
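
The Overall Score percentages above are consistent with passed / (passed + failed) applied to the reported test counts. The source does not state the formula, so the following is a minimal sketch under that assumption; the names `counts` and `pass_rate` are introduced here for illustration only.

```python
# (passed, failed) test counts per model, copied from the Overall Score figures.
counts = {
    "w2v2-L": (340, 47),
    "hubert-L": (329, 52),
    "wavlm": (363, 48),
    "data2vec": (327, 60),
}


def pass_rate(passed: int, failed: int) -> float:
    """Assumed definition: share of passed tests, in percent."""
    return 100.0 * passed / (passed + failed)


# Format each score to one decimal place, as in the report.
scores = {model: f"{pass_rate(p, f):.1f}%" for model, (p, f) in counts.items()}

for model, score in scores.items():
    print(f"{model}: {score}")
```

Under this assumption the computed values match the reported ones. The totals differ between models (387, 381, 411, and 387 tests), so the overall scores are not computed over an identical number of tests.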