w2v2-L vs. hubert-L vs. wavlm vs. data2vec

This page compares the models w2v2-L, hubert-L, wavlm, and data2vec.

Tests overview

Passed tests per topic and model:

| Topic | w2v2-L | hubert-L | wavlm | data2vec |
|---|---|---|---|---|
| Overall Score | 81.7% (294 passed / 66 failed) | 84.7% (305 passed / 55 failed) | 85.1% (338 passed / 59 failed) | 82.1% (321 passed / 70 failed) |
| Correctness consistency | 55.8% | 51.2% | 65.1% | 48.8% |
| Correctness distribution | 66.7% | 66.7% | 33.3% | 66.7% |
| Correctness regression | 66.7% | 55.6% | 55.6% | 66.7% |
| Correctness speaker average | 100.0% | 100.0% | 100.0% | 100.0% |
| Correctness speaker ranking | 100.0% | 100.0% | 100.0% | 100.0% |
| Fairness accent | 93.5% | 100.0% | 96.0% | 96.8% |
| Fairness language | 66.7% | 87.5% | 86.7% | 83.3% |
| Fairness linguistic sentiment | 97.2% | 91.7% | 84.7% | 87.5% |
| Fairness pitch | 93.3% | 100.0% | 93.3% | 93.3% |
| Fairness sex | 96.9% | 96.9% | 96.9% | 90.6% |
| Robustness background noise | 50.0% | 54.2% | 62.5% | 54.2% |
| Robustness low quality phone | 25.0% | 75.0% | 100.0% | 75.0% |
| Robustness recording condition | 0.0% | 50.0% | 0.0% | 50.0% |
| Robustness simulated recording condition | 16.7% | 16.7% | 50.0% | 16.7% |
| Robustness small changes | 90.0% | 95.0% | 100.0% | 90.0% |
| Robustness spectral tilt | 87.5% | 100.0% | 75.0% | 62.5% |
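
The Overall Score percentages appear to be the share of passed tests among all tests run for a model, that is passed / (passed + failed). The short sketch below recomputes them from the reported counts under that assumption. The helper name `pass_rate` and the `counts` dictionary are illustrative and not part of any benchmark tooling. Note that the totals differ between models (360 tests for w2v2-L and hubert-L, 397 for wavlm, 391 for data2vec), so the overall percentages are not based on an identical set of tests.

```python
def pass_rate(passed: int, failed: int) -> float:
    """Return the percentage of passed tests, rounded to one decimal place."""
    total = passed + failed
    if total == 0:
        raise ValueError("no tests were run")
    return round(100 * passed / total, 1)


# (passed, failed) counts as reported in the Overall Score row.
counts = {
    "w2v2-L": (294, 66),
    "hubert-L": (305, 55),
    "wavlm": (338, 59),
    "data2vec": (321, 70),
}

# Assumption: overall score = passed / (passed + failed).
scores = {model: pass_rate(p, f) for model, (p, f) in counts.items()}

for model, score in scores.items():
    passed, failed = counts[model]
    print(f"{model}: {score}% of {passed + failed} tests passed")
```

The recomputed values match the reported Overall Score for all four models, which supports the assumed formula.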