Correctness consistency

Overall scores

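=============  ===========================================
Model          Overall Score
=============  ===========================================
w2v2-b         46.8% passed tests (22 passed / 25 failed).
w2v2-L         59.6% passed tests (28 passed / 19 failed).
w2v2-L-robust  42.6% passed tests (20 passed / 27 failed).
w2v2-L-xls-r   59.6% passed tests (28 passed / 19 failed).
w2v2-L-vox     68.1% passed tests (32 passed / 15 failed).
=============  ===========================================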
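The percentage in each overall score is the number of passed tests divided by the total number of tests. A minimal sketch of that arithmetic follows; the function name `overall_score` is hypothetical and not taken from the benchmark's code.

```python
def overall_score(passed: int, failed: int) -> str:
    """Format a pass rate in the same style as the overall score lines."""
    total = passed + failed
    if total == 0:
        raise ValueError("at least one test is required")
    # Share of passed tests, expressed in percent with one decimal place.
    percentage = 100.0 * passed / total
    return f"{percentage:.1f}% passed tests ({passed} passed / {failed} failed)."


# Counts reported above for w2v2-b.
print(overall_score(22, 25))
```

With the counts reported for w2v2-b (22 passed, 25 failed), this reproduces the 46.8% figure.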

Samples In Expected High Range

Proportion of samples whose predictions fall into the expected value range of >= 0.55

Threshold: 0.75

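happiness

===================================================================  ======  ======  =============  ============  ==========
Data                                                                 w2v2-b  w2v2-L  w2v2-L-robust  w2v2-L-xls-r  w2v2-L-vox
===================================================================  ======  ======  =============  ============  ==========
crema-d-1.2.0-emotion.categories.test.gold_standard                  0.15    0.10    0.48           0.05          0.18
danish-emotional-speech-1.1.1-emotion.test                           0.54    0.04    0.29           0.12          0.12
emodb-1.2.0-emotion.categories.test.gold_standard                    0.15    0.19    0.48           0.00          0.04
emovo-1.2.1-emotion.test                                             0.11    0.05    0.49           0.02          0.01
iemocap-2.3.0-emotion.categories.test.gold_standard                  0.52    0.40    0.51           0.38          0.44
meld-1.3.1-emotion.categories.test.gold_standard                     0.65    0.68    0.70           0.62          0.58
polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard  0.52    0.42    0.80           0.18          0.18
ravdess-1.1.2-emotion.speech.test                                    0.00    0.00    0.06           0.00          0.00
mean                                                                 0.33    0.23    0.48           0.17          0.19
===================================================================  ======  ======  =============  ============  ==========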
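As a rough illustration of how a single table entry could be derived: count the predictions that lie inside the expected range, divide by the number of samples, and compare the result with the threshold. The function names, the sample predictions, and the use of `>=` for the threshold comparison are assumptions made for this sketch; the page does not state whether the comparison is strict.

```python
import math
from typing import Iterable


def proportion_in_range(
    predictions: Iterable[float],
    low: float = -math.inf,
    high: float = math.inf,
) -> float:
    """Return the share of predictions p with low <= p <= high."""
    values = list(predictions)
    if not values:
        raise ValueError("no predictions given")
    hits = sum(1 for p in values if low <= p <= high)
    return hits / len(values)


def passes(proportion: float, threshold: float = 0.75) -> bool:
    """Assumed pass rule: the proportion must reach the threshold."""
    return proportion >= threshold


# Hypothetical predictions for five samples labelled "happiness".
predictions = [0.62, 0.58, 0.40, 0.71, 0.66]

# Expected high range: values >= 0.55, so only a lower bound is set.
share = proportion_in_range(predictions, low=0.55)
print(share, passes(share))
```

The same helper covers the other expected ranges on this page by setting `high=0.45` for the low range, or `low=0.3, high=0.6` for the neutral range.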

Samples In Expected Low Range

Proportion of samples whose predictions fall into the expected value range of <= 0.45

Threshold: 0.75

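anger

===================================================================  ======  ======  =============  ============  ==========
Data                                                                 w2v2-b  w2v2-L  w2v2-L-robust  w2v2-L-xls-r  w2v2-L-vox
===================================================================  ======  ======  =============  ============  ==========
crema-d-1.2.0-emotion.categories.test.gold_standard                  0.93    0.99    0.77           0.98          0.95
danish-emotional-speech-1.1.1-emotion.test                           0.50    0.98    0.58           0.92          0.94
emodb-1.2.0-emotion.categories.test.gold_standard                    0.84    1.00    0.49           1.00          1.00
emovo-1.2.1-emotion.test                                             0.85    1.00    0.54           0.99          1.00
iemocap-2.3.0-emotion.categories.test.gold_standard                  0.70    0.72    0.74           0.76          0.79
meld-1.3.1-emotion.categories.test.gold_standard                     0.42    0.37    0.48           0.45          0.49
polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard  0.68    0.85    0.35           0.88          0.92
ravdess-1.1.2-emotion.speech.test                                    1.00    1.00    1.00           1.00          1.00
mean                                                                 0.74    0.86    0.62           0.87          0.89
===================================================================  ======  ======  =============  ============  ==========

disgust

===================================================================  ======  ======  =============  ============  ==========
Data                                                                 w2v2-b  w2v2-L  w2v2-L-robust  w2v2-L-xls-r  w2v2-L-vox
===================================================================  ======  ======  =============  ============  ==========
crema-d-1.2.0-emotion.categories.test.gold_standard                  0.88    1.00    0.71           0.99          0.92
emodb-1.2.0-emotion.categories.test.gold_standard                    0.69    0.96    0.38           1.00          0.92
emovo-1.2.1-emotion.test                                             0.55    0.94    0.29           0.87          0.88
meld-1.3.1-emotion.categories.test.gold_standard                     0.30    0.22    0.40           0.28          0.42
ravdess-1.1.2-emotion.speech.test                                    1.00    1.00    0.97           1.00          1.00
mean                                                                 0.68    0.82    0.55           0.83          0.83
===================================================================  ======  ======  =============  ============  ==========

fear

===================================================================  ======  ======  =============  ============  ==========
Data                                                                 w2v2-b  w2v2-L  w2v2-L-robust  w2v2-L-xls-r  w2v2-L-vox
===================================================================  ======  ======  =============  ============  ==========
crema-d-1.2.0-emotion.categories.test.gold_standard                  0.89    0.99    0.67           0.98          0.96
emodb-1.2.0-emotion.categories.test.gold_standard                    0.64    0.88    0.27           0.94          0.91
emovo-1.2.1-emotion.test                                             0.62    0.95    0.39           0.87          0.98
iemocap-2.3.0-emotion.categories.test.gold_standard                  0.65    0.65    0.76           0.41          0.59
meld-1.3.1-emotion.categories.test.gold_standard                     0.26    0.18    0.24           0.16          0.30
polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard  0.45    0.55    0.28           0.60          0.92
ravdess-1.1.2-emotion.speech.test                                    0.94    0.97    0.94           1.00          1.00
mean                                                                 0.64    0.74    0.51           0.71          0.81
===================================================================  ======  ======  =============  ============  ==========

frustration

===================================================================  ======  ======  =============  ============  ==========
Data                                                                 w2v2-b  w2v2-L  w2v2-L-robust  w2v2-L-xls-r  w2v2-L-vox
===================================================================  ======  ======  =============  ============  ==========
iemocap-2.3.0-emotion.categories.test.gold_standard                  0.58    0.58    0.68           0.61          0.68
mean                                                                 0.58    0.58    0.68           0.61          0.68
===================================================================  ======  ======  =============  ============  ==========

sadness

===================================================================  ======  ======  =============  ============  ==========
Data                                                                 w2v2-b  w2v2-L  w2v2-L-robust  w2v2-L-xls-r  w2v2-L-vox
===================================================================  ======  ======  =============  ============  ==========
crema-d-1.2.0-emotion.categories.test.gold_standard                  0.93    1.00    0.72           1.00          0.93
danish-emotional-speech-1.1.1-emotion.test                           0.75    0.98    0.79           0.92          1.00
emodb-1.2.0-emotion.categories.test.gold_standard                    1.00    1.00    0.85           1.00          1.00
emovo-1.2.1-emotion.test                                             0.74    1.00    0.49           0.86          0.95
iemocap-2.3.0-emotion.categories.test.gold_standard                  0.79    0.86    0.86           0.84          0.85
meld-1.3.1-emotion.categories.test.gold_standard                     0.41    0.35    0.56           0.34          0.50
polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard  0.82    0.55    0.32           0.52          0.92
ravdess-1.1.2-emotion.speech.test                                    0.97    1.00    0.91           0.97          1.00
mean                                                                 0.80    0.84    0.69           0.81          0.89
===================================================================  ======  ======  =============  ============  ==========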
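The values in the mean row are consistent with an unweighted average over only those datasets that contain the respective class. The sketch below recomputes one entry under that assumption. `column_mean` is a hypothetical helper, and because it starts from the rounded values shown on this page, a recomputed mean may differ from the reported one in the last digit.

```python
from typing import Optional, Sequence


def column_mean(values: Sequence[Optional[float]]) -> Optional[float]:
    """Average the available entries, skipping datasets without the class."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return round(sum(present) / len(present), 2)


# Low-range proportions for disgust with w2v2-L, in dataset order.
# None marks datasets that report no value for this class.
disgust_w2v2_L = [1.00, None, 0.96, 0.94, None, 0.22, None, 1.00]
print(column_mean(disgust_w2v2_L))
```

For frustration, which is reported for a single dataset only, the mean equals that dataset's value.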

Samples In Expected Neutral Range

Proportion of samples whose predictions fall into the expected value range of [0.3, 0.6]

Threshold: 0.75

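boredom

===================================================================  ======  ======  =============  ============  ==========
Data                                                                 w2v2-b  w2v2-L  w2v2-L-robust  w2v2-L-xls-r  w2v2-L-vox
===================================================================  ======  ======  =============  ============  ==========
emodb-1.2.0-emotion.categories.test.gold_standard                    0.97    1.00    1.00           1.00          1.00
polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard  1.00    1.00    1.00           1.00          1.00
mean                                                                 0.98    1.00    1.00           1.00          1.00
===================================================================  ======  ======  =============  ============  ==========

neutral

===================================================================  ======  ======  =============  ============  ==========
Data                                                                 w2v2-b  w2v2-L  w2v2-L-robust  w2v2-L-xls-r  w2v2-L-vox
===================================================================  ======  ======  =============  ============  ==========
crema-d-1.2.0-emotion.categories.test.gold_standard                  0.88    0.78    0.97           0.72          0.87
danish-emotional-speech-1.1.1-emotion.test                           0.96    0.81    0.96           0.90          0.85
emodb-1.2.0-emotion.categories.test.gold_standard                    0.96    1.00    1.00           1.00          1.00
emovo-1.2.1-emotion.test                                             0.96    0.92    0.99           0.76          0.86
iemocap-2.3.0-emotion.categories.test.gold_standard                  0.86    0.87    0.89           0.90          0.93
meld-1.3.1-emotion.categories.test.gold_standard                     0.62    0.72    0.79           0.72          0.78
polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard  0.95    1.00    1.00           1.00          1.00
ravdess-1.1.2-emotion.speech.test                                    1.00    0.69    1.00           0.62          0.44
mean                                                                 0.90    0.85    0.95           0.83          0.84
===================================================================  ======  ======  =============  ============  ==========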

Visualization

Distribution of dimensional model predictions for samples with different categorical emotions. The expected range of model predictions is highlighted by the green background.

w2v2-b

w2v2-L

w2v2-L-robust

w2v2-L-xls-r

w2v2-L-vox

../../../_images/visualization_crema-d-1.2.0-emotion.categories.test.gold_standard67.png
../../../_images/visualization_crema-d-1.2.0-emotion.categories.test.gold_standard70.png
../../../_images/visualization_crema-d-1.2.0-emotion.categories.test.gold_standard71.png
../../../_images/visualization_crema-d-1.2.0-emotion.categories.test.gold_standard72.png
../../../_images/visualization_crema-d-1.2.0-emotion.categories.test.gold_standard73.png
../../../_images/visualization_danish-emotional-speech-1.1.1-emotion.test45.png
../../../_images/visualization_danish-emotional-speech-1.1.1-emotion.test48.png
../../../_images/visualization_danish-emotional-speech-1.1.1-emotion.test49.png
../../../_images/visualization_danish-emotional-speech-1.1.1-emotion.test50.png
../../../_images/visualization_danish-emotional-speech-1.1.1-emotion.test51.png
../../../_images/visualization_emodb-1.2.0-emotion.categories.test.gold_standard45.png
../../../_images/visualization_emodb-1.2.0-emotion.categories.test.gold_standard48.png
../../../_images/visualization_emodb-1.2.0-emotion.categories.test.gold_standard49.png
../../../_images/visualization_emodb-1.2.0-emotion.categories.test.gold_standard50.png
../../../_images/visualization_emodb-1.2.0-emotion.categories.test.gold_standard51.png
../../../_images/visualization_emovo-1.2.1-emotion.test67.png
../../../_images/visualization_emovo-1.2.1-emotion.test70.png
../../../_images/visualization_emovo-1.2.1-emotion.test71.png
../../../_images/visualization_emovo-1.2.1-emotion.test72.png
../../../_images/visualization_emovo-1.2.1-emotion.test73.png
../../../_images/visualization_iemocap-2.3.0-emotion.categories.test.gold_standard67.png
../../../_images/visualization_iemocap-2.3.0-emotion.categories.test.gold_standard70.png
../../../_images/visualization_iemocap-2.3.0-emotion.categories.test.gold_standard71.png
../../../_images/visualization_iemocap-2.3.0-emotion.categories.test.gold_standard72.png
../../../_images/visualization_iemocap-2.3.0-emotion.categories.test.gold_standard73.png
../../../_images/visualization_meld-1.3.1-emotion.categories.test.gold_standard89.png
../../../_images/visualization_meld-1.3.1-emotion.categories.test.gold_standard92.png
../../../_images/visualization_meld-1.3.1-emotion.categories.test.gold_standard93.png
../../../_images/visualization_meld-1.3.1-emotion.categories.test.gold_standard94.png
../../../_images/visualization_meld-1.3.1-emotion.categories.test.gold_standard95.png
../../../_images/visualization_polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard45.png
../../../_images/visualization_polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard48.png
../../../_images/visualization_polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard49.png
../../../_images/visualization_polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard50.png
../../../_images/visualization_polish-emotional-speech-1.1.1-emotion.categories.test.gold_standard51.png
../../../_images/visualization_ravdess-1.1.2-emotion.speech.test45.png
../../../_images/visualization_ravdess-1.1.2-emotion.speech.test48.png
../../../_images/visualization_ravdess-1.1.2-emotion.speech.test49.png
../../../_images/visualization_ravdess-1.1.2-emotion.speech.test50.png
../../../_images/visualization_ravdess-1.1.2-emotion.speech.test51.png