intra_class_correlation()
audpsychometric.intra_class_correlation(ratings, *, axis=1, icc_type='ICC_1_1', anova_method='pingouin')
Intraclass Correlation.
Intraclass correlation quantifies rating reliability by relating (i) the variability of different ratings of the same item to (ii) the total variation across all ratings and all items.
The model is based on analysis of variance, and ratings must at least be ordinally scaled.
The concordance correlation coefficient (CCC) is conceptually and numerically related to the ICC. For an implementation see audmetric.concordance_cc().
Parameters
    ratings (Sequence) – ratings. When given as a 1-dimensional array, it is treated as a row vector
    axis (int) – axis along which the correlation is computed. A value of 1 assumes stimuli as rows
    icc_type (str) – ICC method, see description below
    anova_method (str) – method for ANOVA calculation, can be "pingouin" or "statsmodels"
Notes
Shrout and Fleiss [SF79] describe a whole family of coefficient forms, from which the appropriate one must be selected according to the following criteria (Koo and Li [KL16]):
Parametrization
Koo and Li [KL16] detail the parametrization:
Rater type: 1 versus k raters:
“This selection depends on how the measurement protocol will be conducted in actual application. For instance, if we plan to use the mean value of 3 raters as an assessment basis, the experimental design of the reliability study should involve 3 raters, and the ‘mean of k raters’ type should be selected.”
Rating definition: absolute or consistency
“For both 2-way random- and 2-way mixed-effects models, there are 2 ICC definitions: “absolute agreement” and “consistency.” Selection of the ICC definition depends on whether we consider absolute agreement or consistency between raters to be more important. Absolute agreement concerns if different raters assign the same score to the same subject. Conversely, consistency definition concerns if raters’ scores to the same group of subjects are correlated in an additive manner. Consider an interrater reliability study of 2 raters as an example. In this case, consistency definition concerns the degree to which one rater’s score (y) can be equated to another rater’s score (x) plus a systematic error (c) (ie, \(y = x + c\)), whereas absolute agreement concerns about the extent to which y equals x.”
Model Choice
- One-Way Random-Effects Model
In this model, each subject is rated by a different set of raters who were randomly chosen from a larger population of possible raters. This model is used when different sets of raters rate different items.
- Two-Way Random-Effects Model
This model applies if we randomly select our raters from a larger population of raters with similar characteristics.
- Two-Way Mixed-Effects Model
We should use the 2-way mixed-effects model if the selected raters are the only raters of interest.
Parametrization        Rater Type   Rating definition    Model Choice
ICC(1,1) - icc_1_1     single       absolute agreement   One-way random effects
ICC(2,1) - icc_2_1     single       absolute agreement   Two-way random effects
ICC(3,1) - icc_3_1     single       consistency          Two-way mixed effects
ICC(1,k) - icc_1_k     k            absolute agreement   One-way random effects
ICC(2,k) - icc_2_k     k            absolute agreement   Two-way random effects
ICC(3,k) - icc_3_k     k            consistency          Two-way mixed effects
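To illustrate how the six forms in the table relate, the following sketch computes the whole family with pingouin.intraclass_corr() (the library used when anova_method="pingouin"); the wide-to-long reshaping and the toy ratings matrix are assumptions made for illustration and are not part of audpsychometric:

    # Compute all six ICC forms for a wide ratings matrix (rows = stimuli,
    # columns = raters) by reshaping to the long format pingouin expects.
    import pandas as pd
    import pingouin as pg

    ratings = pd.DataFrame(
        [
            [9, 2, 5, 8],
            [6, 1, 3, 2],
            [8, 4, 6, 8],
            [7, 1, 2, 6],
            [10, 5, 6, 9],
            [6, 2, 4, 7],
        ],
        columns=["r1", "r2", "r3", "r4"],
    )
    long = ratings.reset_index().melt(
        id_vars="index", var_name="rater", value_name="rating"
    )
    icc_table = pg.intraclass_corr(
        data=long, targets="index", raters="rater", ratings="rating"
    )
    # Rows ICC1, ICC2, ICC3 correspond to the single-rater forms ICC(1,1),
    # ICC(2,1), ICC(3,1); ICC1k, ICC2k, ICC3k to the "mean of k raters" forms.
    print(icc_table[["Type", "Description", "ICC"]])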
Interpretation
Interpretation conventions vary between several sources and also depend on the use case. We list the values from Hallgren [Hal12] and Koo and Li [KL16], respectively:

Hallgren [Hal12]:

Value         Interpretation
< 0.40        poor
0.40 - 0.59   fair
0.60 - 0.74   good
0.75 - 1.00   excellent

Koo and Li [KL16]:

Value         Interpretation
< 0.50        poor
0.50 - 0.75   moderate
0.75 - 0.90   good
> 0.90        excellent
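As a compact illustration of the second convention, a small helper could map an ICC estimate to the Koo and Li label (the function is hypothetical and not part of audpsychometric):

    def interpret_icc(icc: float) -> str:
        # Verbal labels following Koo and Li [KL16] as tabulated above.
        if icc < 0.5:
            return "poor"
        elif icc < 0.75:
            return "moderate"
        elif icc < 0.9:
            return "good"
        return "excellent"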
Formulas:
Shrout & Fleiss formula (1979, p. 423ff):
\[\text{ICC}(1,1) = \frac{(\text{bms} - \text{wms})}{ (\text{bms} + (k - 1) * \text{wms})}\]\[\text{ICC}(2,1) = \frac{(\text{bms} - \text{ems})}{ (\text{bms} + (k - 1) * \text{ems} + k * (\text{jms} - \text{ems}) / n)}\]\[\text{ICC}(3,1) = \frac {(\text{bms} - \text{ems})}{ (\text{bms} + (k - 1) * \text{ems})}\]\[\text{ICC}(1,k) = \frac{(\text{bms} - \text{wms})}{\text{bms}}\]\[\text{ICC}(2,k) = \frac{(\text{bms} - \text{ems})}{ (\text{bms} + (\text{jms} - \text{ems}) / n)}\]\[\text{ICC}(3,k) = \frac{(\text{bms} - \text{ems})}{\text{bms}}\]where
\(\text{bms}\) is the between-items (“targets”) mean square of the underlying two-factor ANOVA
\(\text{wms}\) is the within-items (“targets”) mean square
\(\text{jms}\) is the between-raters (“judges”) mean square of the underlying two-factor ANOVA
\(\text{ems}\) is the error/residual mean square
\(k\) is the number of raters
\(n\) is the number of items
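To make the formulas concrete, the following sketch derives the mean squares of the underlying two-way ANOVA directly from a complete ratings matrix with NumPy and plugs them into the six equations; it follows the notation above and is an illustrative re-implementation, not the code path used by audpsychometric:

    import numpy as np

    def icc_from_mean_squares(ratings: np.ndarray) -> dict:
        """ratings: array of shape (n items, k raters) without missing values."""
        n, k = ratings.shape
        grand_mean = ratings.mean()
        item_means = ratings.mean(axis=1)   # per item ("target")
        rater_means = ratings.mean(axis=0)  # per rater ("judge")

        # Sums of squares of the two-way ANOVA with one observation per cell
        ss_items = k * ((item_means - grand_mean) ** 2).sum()
        ss_raters = n * ((rater_means - grand_mean) ** 2).sum()
        ss_total = ((ratings - grand_mean) ** 2).sum()
        ss_error = ss_total - ss_items - ss_raters

        bms = ss_items / (n - 1)                     # between-items mean square
        wms = (ss_total - ss_items) / (n * (k - 1))  # within-items mean square
        jms = ss_raters / (k - 1)                    # between-raters mean square
        ems = ss_error / ((n - 1) * (k - 1))         # error/residual mean square

        return {
            "ICC(1,1)": (bms - wms) / (bms + (k - 1) * wms),
            "ICC(2,1)": (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n),
            "ICC(3,1)": (bms - ems) / (bms + (k - 1) * ems),
            "ICC(1,k)": (bms - wms) / bms,
            "ICC(2,k)": (bms - ems) / (bms + (jms - ems) / n),
            "ICC(3,k)": (bms - ems) / bms,
        }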