intra_class_correlation()

audpsychometric.intra_class_correlation(ratings, *, axis=1, icc_type='ICC_1_1', anova_method='pingouin')[source]

Intraclass Correlation.

Intraclass correlation calculates rating reliability by relating (i) variability of different ratings of the same subject to (ii) the total variation across all ratings and all items.

The model is based on analysis of variance, and ratings must at least be ordinally scaled.

The concordance correlation coefficient (CCC) is conceptually and numerically related to the ICC. For an implementation see audmetric.concordance_cc().

Parameters
  • ratings (Sequence) – ratings. When given as a 1-dimensional array, it is treated as a row vector

  • axis (int) – axis along which the intraclass correlation is computed. A value of 1 assumes stimuli as rows

  • icc_type (str) – ICC method, see description below

  • anova_method (str) – method for ANOVA calculation, can be "pingouin" or "statsmodels"

Return type

Tuple[float, Dict]

Returns

the ICC value and a dictionary with additional results
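
Example

A minimal usage sketch, assuming the example ratings matrix from Shrout and Fleiss [SF79] with stimuli as rows and raters as columns (matching the default axis=1); the exact contents of the returned dictionary are not assumed here.

import numpy as np
import audpsychometric

# 6 stimuli (rows) rated by 4 raters (columns),
# the example data from Shrout and Fleiss [SF79]
ratings = np.array(
    [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
)

icc, results = audpsychometric.intra_class_correlation(ratings)
print(icc)      # single reliability value
print(results)  # dict with additional results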

Notes

Shrout and Fleiss [SF79] describe a whole family of coefficient forms, from which the appropriate one must be selected according to the following criteria, summarized by Koo and Li [KL16]:

Parametrization

Koo and Li [KL16] detail the parametrization:

Rater type: 1 versus k raters:

“This selection depends on how the measurement protocol will be conducted in actual application. For instance, if we plan to use the mean value of 3 raters as an assessment basis, the experimental design of the reliability study should involve 3 raters, and the ‘mean of k raters’ type should be selected.”

Rating definition: absolute or consistency

“For both 2-way random- and 2-way mixed-effects models, there are 2 ICC definitions: “absolute agreement” and “consistency.” Selection of the ICC definition depends on whether we consider absolute agreement or consistency between raters to be more important. Absolute agreement concerns if different raters assign the same score to the same subject. Conversely, consistency definition concerns if raters’ scores to the same group of subjects are correlated in an additive manner. Consider an interrater reliability study of 2 raters as an example. In this case, consistency definition concerns the degree to which one rater’s score (y) can be equated to another rater’s score (x) plus a systematic error (c) (ie, \(y = x + c\)), whereas absolute agreement concerns about the extent to which y equals x.”

Model Choice

  • One-Way Random-Effects Model

    In this model, each subject is rated by a different set of raters who were randomly chosen from a larger population of possible raters. This model is used when different sets of raters rate different items.

  • Two-Way Random-Effects Model

    This model applies if we randomly select our raters from a larger population of raters with similar characteristics.

  • Two-Way Mixed-Effects Model

    We should use the two-way mixed-effects model if the selected raters are the only raters of interest.

Parametrization      Rater type  Rating definition   Model choice
ICC(1,1) - icc_1_1   single      absolute agreement  One-way random effects
ICC(2,1) - icc_2_1   single      absolute agreement  Two-way random effects
ICC(3,1) - icc_3_1   single      consistency         Two-way mixed effects
ICC(1,k) - icc_1_k   k           absolute agreement  One-way random effects
ICC(2,k) - icc_2_k   k           absolute agreement  Two-way random effects
ICC(3,k) - icc_3_k   k           consistency         Two-way mixed effects
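
Choosing another form of the coefficient only requires passing the corresponding icc_type string. A minimal sketch, assuming the identifier spelling follows the default 'ICC_1_1' shown in the signature (the table above lists the lowercase variants):

import numpy as np
import audpsychometric

rng = np.random.default_rng(seed=0)
ratings = rng.integers(1, 6, size=(20, 3))  # 20 stimuli rated by the same fixed 3 raters

# Consistency between a fixed set of raters -> two-way mixed effects, ICC(3,1)
icc, results = audpsychometric.intra_class_correlation(
    ratings,
    icc_type='ICC_3_1',
    anova_method='pingouin',
)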

Interpretation

Interpretation conventions vary between sources and also depend on the use case. We list the values given by Hallgren [Hal12] and by Koo and Li [KL16], respectively:

Hallgren [Hal12]:

Value        Interpretation
< 0.40       poor
0.40 - 0.59  fair
0.60 - 0.74  good
0.75 - 1.00  excellent

Koo and Li [KL16]:

Value        Interpretation
< 0.50       poor
0.50 - 0.75  moderate
0.75 - 0.90  good
> 0.90       excellent
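
Where a verbal label is needed, the Koo and Li [KL16] convention can be applied to the returned value, for example with a small hypothetical helper (not part of audpsychometric):

def interpret_icc(icc):
    """Verbal label for an ICC value following Koo and Li [KL16].

    Hypothetical helper, not part of audpsychometric.
    """
    if icc < 0.5:
        return 'poor'
    elif icc < 0.75:
        return 'moderate'
    elif icc < 0.9:
        return 'good'
    return 'excellent'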

Formulas

Shrout and Fleiss [SF79] (p. 423ff) give the following formulas:

\[\text{ICC}(1,1) = \frac{\text{bms} - \text{wms}}{\text{bms} + (k - 1)\,\text{wms}}\]
\[\text{ICC}(2,1) = \frac{\text{bms} - \text{ems}}{\text{bms} + (k - 1)\,\text{ems} + k\,(\text{jms} - \text{ems}) / n}\]
\[\text{ICC}(3,1) = \frac{\text{bms} - \text{ems}}{\text{bms} + (k - 1)\,\text{ems}}\]
\[\text{ICC}(1,k) = \frac{\text{bms} - \text{wms}}{\text{bms}}\]
\[\text{ICC}(2,k) = \frac{\text{bms} - \text{ems}}{\text{bms} + (\text{jms} - \text{ems}) / n}\]
\[\text{ICC}(3,k) = \frac{\text{bms} - \text{ems}}{\text{bms}}\]

where

  • \(\text{bms}\) is the between items (“targets”) mean square of the underlying two-factor ANOVA

  • \(\text{wms}\) is the within items (“targets”) mean square

  • \(\text{jms}\) is the between raters (“judges”) mean square of the underlying two-factor ANOVA

  • \(\text{ems}\) is the error/residual mean square

  • \(k\) is the number of raters

  • \(n\) is the number of items
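
To illustrate the formulas, the mean squares of the two-factor ANOVA can be computed by hand and plugged into the definitions above. This is an independent sketch using plain numpy, not the implementation behind the anova_method backends:

import numpy as np

# n stimuli (rows) x k raters (columns); example data from Shrout and Fleiss [SF79]
ratings = np.array(
    [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ],
    dtype=float,
)
n, k = ratings.shape
grand_mean = ratings.mean()

# Sums of squares of the two-factor ANOVA
ss_between = k * ((ratings.mean(axis=1) - grand_mean) ** 2).sum()  # items ("targets")
ss_judges = n * ((ratings.mean(axis=0) - grand_mean) ** 2).sum()   # raters ("judges")
ss_total = ((ratings - grand_mean) ** 2).sum()
ss_within = ss_total - ss_between
ss_error = ss_within - ss_judges

# Mean squares (sums of squares divided by their degrees of freedom)
bms = ss_between / (n - 1)
wms = ss_within / (n * (k - 1))
jms = ss_judges / (k - 1)
ems = ss_error / ((n - 1) * (k - 1))

# Single-rater forms
icc_1_1 = (bms - wms) / (bms + (k - 1) * wms)
icc_2_1 = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
icc_3_1 = (bms - ems) / (bms + (k - 1) * ems)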