w2v2-L vs. hubert-L vs. wavlm vs. data2vec

This page compares the models w2v2-L, hubert-L, wavlm, and data2vec against one another.

Tests overview

Passed tests by topic and model:

| Topic | w2v2-L | hubert-L | wavlm | data2vec |
| --- | --- | --- | --- | --- |
| Overall Score | 81.0% (295 passed / 69 failed) | 78.8% (287 passed / 77 failed) | 83.4% (336 passed / 67 failed) | 76.9% (299 passed / 90 failed) |
| Correctness consistency | 59.6% | 36.2% | 55.3% | 29.8% |
| Correctness distribution | 33.3% | 66.7% | 33.3% | 66.7% |
| Correctness regression | 0.0% | 22.2% | 44.4% | 33.3% |
| Correctness speaker average | 100.0% | 100.0% | 100.0% | 100.0% |
| Correctness speaker ranking | 0.0% | 50.0% | 50.0% | 50.0% |
| Fairness accent | 96.8% | 100.0% | 100.0% | 100.0% |
| Fairness language | 100.0% | 83.3% | 83.3% | 61.1% |
| Fairness linguistic sentiment | 100.0% | 88.8% | 89.8% | 78.8% |
| Fairness pitch | 100.0% | 100.0% | 100.0% | 100.0% |
| Fairness sex | 100.0% | 100.0% | 100.0% | 100.0% |
| Robustness background noise | 16.7% | 37.5% | 33.3% | 37.5% |
| Robustness low quality phone | 75.0% | 50.0% | 75.0% | 50.0% |
| Robustness recording condition | 0.0% | 50.0% | 50.0% | 100.0% |
| Robustness simulated recording condition | 16.7% | 0.0% | 33.3% | 16.7% |
| Robustness small changes | 80.0% | 95.0% | 85.0% | 85.0% |
| Robustness spectral tilt | 75.0% | 100.0% | 100.0% | 100.0% |
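
The Overall Score percentages appear to be the share of passed tests among all tests run for a model, rounded to one decimal. The following minimal sketch recomputes them from the passed/failed counts listed above. The function name `pass_rate` and the dictionary layout are illustrative assumptions, not part of the original test suite.

```python
def pass_rate(passed: int, failed: int) -> float:
    """Return the percentage of passed tests, rounded to one decimal place."""
    total = passed + failed
    if total == 0:
        raise ValueError("no tests were run")
    return round(100 * passed / total, 1)


# (passed, failed) counts copied from the Overall Score entries above.
overall = {
    "w2v2-L": (295, 69),
    "hubert-L": (287, 77),
    "wavlm": (336, 67),
    "data2vec": (299, 90),
}

# Recomputed score and total number of tests per model.
scores = {model: pass_rate(p, f) for model, (p, f) in overall.items()}
totals = {model: p + f for model, (p, f) in overall.items()}

for model in overall:
    print(f"{model}: {scores[model]}% of {totals[model]} tests passed")
```

The recomputed totals differ between models (364, 364, 403, and 389 tests), so the overall percentages are not taken over the same number of tests and should be compared with that in mind.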