intra_class_correlation()

audpsychometric.intra_class_correlation(ratings, *, axis=1, icc_type='ICC_1_1', anova_method='pingouin')[source]

Intraclass Correlation.

Intraclass correlation calculates rating reliability by relating (i) variability of different ratings of the same subject to (ii) the total variation across all ratings and all items.

The model is based on analysis of variance, and ratings must at least be ordinally scaled.

The concordance correlation coefficient (CCC) is conceptually and numerically related to the ICC. For an implementation see audmetric.concordance_cc().

Parameters
  • ratings (Sequence) – ratings. When given as a 1-dimensional array, it is treated as a row vector

  • axis (int) – axis along which the intraclass correlation is computed. A value of 1 assumes stimuli as rows

  • icc_type (str) – ICC method, see description below

  • anova_method (str) – method for ANOVA calculation, can be "pingouin" or "statsmodels"

Return type

Tuple[float, Dict]

Returns

the ICC value and a dictionary with additional results
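
Example

A minimal usage sketch, assuming the example ratings matrix from Shrout and Fleiss [SF79] with stimuli as rows and raters as columns (matching the default axis=1); the exact contents of the returned dictionary are not assumed here.

import numpy as np
import audpsychometric

# 6 stimuli (rows) rated by 4 raters (columns),
# the example data from Shrout and Fleiss [SF79]
ratings = np.array(
    [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
)

icc, results = audpsychometric.intra_class_correlation(ratings)
print(icc)      # single reliability value
print(results)  # dict with additional results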

Notes

Shrout and Fleiss [SF79] describe a whole family of coefficient forms, from which the appropriate one must be selected according to the following criteria, summarized by Koo and Li [KL16]:

Parametrization

Koo and Li [KL16] detail the parametrization:

Rater type: 1 versus k raters:

“This selection depends on how the measurement protocol will be conducted in actual application. For instance, if we plan to use the mean value of 3 raters as an assessment basis, the experimental design of the reliability study should involve 3 raters, and the ‘mean of k raters’ type should be selected.”

Rating definition: absolute or consistency

“For both 2-way random- and 2-way mixed-effects models, there are 2 ICC definitions: “absolute agreement” and “consistency.” Selection of the ICC definition depends on whether we consider absolute agreement or consistency between raters to be more important. Absolute agreement concerns if different raters assign the same score to the same subject. Conversely, consistency definition concerns if raters’ scores to the same group of subjects are correlated in an additive manner. Consider an interrater reliability study of 2 raters as an example. In this case, consistency definition concerns the degree to which one rater’s score (y) can be equated to another rater’s score (x) plus a systematic error (c) (ie, \(y = x + c\)), whereas absolute agreement concerns about the extent to which y equals x.”

Model Choice

  • One-Way Random-Effects Model

    In this model, each subject is rated by a different set of raters who were randomly chosen from a larger population of possible raters. This model is used when different sets of raters rate different items.

  • Two-Way Random-Effects Model

    This model applies if we randomly select our raters from a larger population of raters with similar characteristics.

  • Two-Way Mixed-Effects Model

    We should use the two-way mixed-effects model if the selected raters are the only raters of interest.

Parametrization      Rater type  Rating definition   Model choice
ICC(1,1) - icc_1_1   single      absolute agreement  One-way random effects
ICC(2,1) - icc_2_1   single      absolute agreement  Two-way random effects
ICC(3,1) - icc_3_1   single      consistency         Two-way mixed effects
ICC(1,k) - icc_1_k   k           absolute agreement  One-way random effects
ICC(2,k) - icc_2_k   k           absolute agreement  Two-way random effects
ICC(3,k) - icc_3_k   k           consistency         Two-way mixed effects
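
Choosing another form of the coefficient only requires passing the corresponding icc_type string. A minimal sketch, assuming the identifier spelling follows the default 'ICC_1_1' shown in the signature (the table above lists the lowercase variants):

import numpy as np
import audpsychometric

rng = np.random.default_rng(seed=0)
ratings = rng.integers(1, 6, size=(20, 3))  # 20 stimuli rated by the same fixed 3 raters

# Consistency between a fixed set of raters -> two-way mixed effects, ICC(3,1)
icc, results = audpsychometric.intra_class_correlation(
    ratings,
    icc_type='ICC_3_1',
    anova_method='pingouin',
)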

Interpretation

Interpretation conventions vary between sources and also depend on the use case. We list the values given by Hallgren [Hal12] and by Koo and Li [KL16], respectively:

Hallgren [Hal12]:

Value        Interpretation
< 0.40       poor
0.40 - 0.59  fair
0.60 - 0.74  good
0.75 - 1.00  excellent

Koo and Li [KL16]:

Value        Interpretation
< 0.50       poor
0.50 - 0.75  moderate
0.75 - 0.90  good
> 0.90       excellent
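
Where a verbal label is needed, the Koo and Li [KL16] convention can be applied to the returned value, for example with a small hypothetical helper (not part of audpsychometric):

def interpret_icc(icc):
    """Verbal label for an ICC value following Koo and Li [KL16].

    Hypothetical helper, not part of audpsychometric.
    """
    if icc < 0.5:
        return 'poor'
    elif icc < 0.75:
        return 'moderate'
    elif icc < 0.9:
        return 'good'
    return 'excellent'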

Formulas

Shrout and Fleiss [SF79] (p. 423ff) give the following formulas:

\[\text{ICC}(1,1) = \frac{\text{bms} - \text{wms}}{\text{bms} + (k - 1)\,\text{wms}}\]
\[\text{ICC}(2,1) = \frac{\text{bms} - \text{ems}}{\text{bms} + (k - 1)\,\text{ems} + k\,(\text{jms} - \text{ems}) / n}\]
\[\text{ICC}(3,1) = \frac{\text{bms} - \text{ems}}{\text{bms} + (k - 1)\,\text{ems}}\]
\[\text{ICC}(1,k) = \frac{\text{bms} - \text{wms}}{\text{bms}}\]
\[\text{ICC}(2,k) = \frac{\text{bms} - \text{ems}}{\text{bms} + (\text{jms} - \text{ems}) / n}\]
\[\text{ICC}(3,k) = \frac{\text{bms} - \text{ems}}{\text{bms}}\]

where

  • \(\text{bms}\) is the between items (“targets”) mean square of the underlying two-factor ANOVA

  • \(\text{wms}\) is the within items (“targets”) mean square

  • \(\text{jms}\) is the between raters (“judges”) mean square of the underlying two-factor ANOVA

  • \(\text{ems}\) is the error/residual mean square

  • \(k\) is the number of raters

  • \(n\) is the number of items
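
To illustrate the formulas, the mean squares of the two-factor ANOVA can be computed by hand and plugged into the definitions above. This is an independent sketch using plain numpy, not the implementation behind the anova_method backends:

import numpy as np

# n stimuli (rows) x k raters (columns); example data from Shrout and Fleiss [SF79]
ratings = np.array(
    [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ],
    dtype=float,
)
n, k = ratings.shape
grand_mean = ratings.mean()

# Sums of squares of the two-factor ANOVA
ss_between = k * ((ratings.mean(axis=1) - grand_mean) ** 2).sum()  # items ("targets")
ss_judges = n * ((ratings.mean(axis=0) - grand_mean) ** 2).sum()   # raters ("judges")
ss_total = ((ratings - grand_mean) ** 2).sum()
ss_within = ss_total - ss_between
ss_error = ss_within - ss_judges

# Mean squares (sums of squares divided by their degrees of freedom)
bms = ss_between / (n - 1)
wms = ss_within / (n * (k - 1))
jms = ss_judges / (k - 1)
ems = ss_error / ((n - 1) * (k - 1))

# Single-rater forms
icc_1_1 = (bms - wms) / (bms + (k - 1) * wms)
icc_2_1 = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
icc_3_1 = (bms - ems) / (bms + (k - 1) * ems)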