audmetric
accuracy
- audmetric.accuracy(truth, prediction, labels=None)
Classification accuracy.
\[\text{accuracy} = \frac{\text{number of correct predictions}}{\text{number of total predictions}}\]
- Parameters
  - truth (Sequence[Any]) – ground truth values/classes
  - prediction (Sequence[Any]) – predicted values/classes
  - labels (Optional[Sequence[Union[str, int]]]) – included labels in preferred ordering. A sample is considered in the computation if either its prediction or its ground truth (logical OR) is contained in labels. If no labels are supplied, they will be inferred from \(\{\text{prediction}, \text{truth}\}\) and ordered alphabetically.
- Return type
  float
- Returns
  accuracy of prediction \(\in [0, 1]\)
- Raises
  ValueError – if truth and prediction differ in length
Example
>>> accuracy([0, 0], [0, 1])
0.5
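To make the labels semantics concrete (a sample is kept if either side of the pair is contained in labels), here is a minimal pure-Python sketch of the formula above; it is an illustration, not the library implementation:

```python
def accuracy(truth, prediction, labels=None):
    # Infer labels from the union of truth and prediction if not given.
    if labels is None:
        labels = sorted(set(truth) | set(prediction))
    labels = set(labels)
    # A sample counts if its truth OR its prediction is in labels.
    pairs = [
        (t, p) for t, p in zip(truth, prediction)
        if t in labels or p in labels
    ]
    return sum(t == p for t, p in pairs) / len(pairs)
```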
concordance_cc
- audmetric.concordance_cc(truth, prediction)
Concordance correlation coefficient.
\[\rho_c = \frac{2\rho\sigma_\text{prediction}\sigma_\text{truth}}{\sigma_\text{prediction}^2 + \sigma_\text{truth}^2 + (\mu_\text{prediction}-\mu_\text{truth})^2}\]
where \(\rho\) is the Pearson correlation coefficient, \(\mu\) the mean and \(\sigma^2\) the variance [1].
- [1] Lawrence I-Kuei Lin. A concordance correlation coefficient to evaluate reproducibility. Biometrics, 45:255–268, 1989. doi:10.2307/2532051.
- Parameters
  - truth (Sequence[float]) – ground truth values
  - prediction (Sequence[float]) – predicted values
- Return type
  float
- Returns
  concordance correlation coefficient \(\in [-1, 1]\)
- Raises
  ValueError – if truth and prediction differ in length
Example
>>> concordance_cc([0, 1, 2], [0, 1, 1])
0.6666666666666666
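The formula can be cross-checked with a sketch using population (biased) statistics, i.e. dividing by \(n\). That audmetric uses biased statistics internally is an assumption, but this choice reproduces the documented example value:

```python
def concordance_cc(truth, prediction):
    n = len(truth)
    mean_t = sum(truth) / n
    mean_p = sum(prediction) / n
    # Population (biased) variance and covariance, dividing by n.
    var_t = sum((t - mean_t) ** 2 for t in truth) / n
    var_p = sum((p - mean_p) ** 2 for p in prediction) / n
    cov = sum(
        (t - mean_t) * (p - mean_p)
        for t, p in zip(truth, prediction)
    ) / n
    return 2 * cov / (var_p + var_t + (mean_p - mean_t) ** 2)
```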
confusion_matrix
- audmetric.confusion_matrix(truth, prediction, labels=None, *, normalize=False)
Confusion matrix.
- Parameters
  - truth (Sequence[Any]) – ground truth values/classes
  - prediction (Sequence[Any]) – predicted values/classes
  - labels (Optional[Sequence[Any]]) – included labels in preferred ordering. If no labels are supplied, they will be inferred from \(\{\text{prediction}, \text{truth}\}\) and ordered alphabetically.
  - normalize (bool) – normalize confusion matrix over the rows
- Return type
  List[List[Union[int, float]]]
- Returns
  confusion matrix
- Raises
  ValueError – if truth and prediction differ in length
Example
>>> truth = [0, 1, 2]
>>> prediction = [0, 2, 0]
>>> confusion_matrix(truth, prediction)
[[1, 0, 0], [0, 0, 1], [1, 0, 0]]
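The construction is a simple tally with rows indexed by ground truth and columns by prediction; a sketch (not the library code):

```python
def confusion_matrix(truth, prediction, labels=None, *, normalize=False):
    if labels is None:
        labels = sorted(set(truth) | set(prediction))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    # Rows are ground truth, columns are predictions.
    for t, p in zip(truth, prediction):
        matrix[index[t]][index[p]] += 1
    if normalize:
        # Normalize each row by its sum (guard empty rows).
        matrix = [
            [cell / sum(row) if sum(row) else 0.0 for cell in row]
            for row in matrix
        ]
    return matrix
```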
detection_error_tradeoff
- audmetric.detection_error_tradeoff(truth, prediction)
Detection error tradeoff for verification experiments.
The detection error tradeoff (DET) is a graph showing the false non-match rate (FNMR) against the false match rate (FMR). The FNMR indicates how often an enrolled speaker was missed. The FMR indicates how often an impostor was verified as the enrolled speaker.
This function does not return a figure, but the FMR and FNMR, together with the corresponding verification thresholds at which a similarity value was regarded as belonging to the enrolled speaker.
truth may only contain entries like [1, 0, True, False, ...], whereas prediction values can also contain similarity scores, e.g. [0.8, 0.1, ...].
The implementation is identical to the one provided by the pyeer package.
- Parameters
  - truth (Union[bool, int, Sequence[Union[bool, int]]]) – ground truth classes
  - prediction (Union[bool, int, float, Sequence[Union[bool, int, float]]]) – predicted classes or similarity scores
- Return type
  Tuple[ndarray, ndarray, ndarray]
- Returns
  false match rate (FMR), false non-match rate (FNMR), verification thresholds
- Raises
  ValueError – if truth contains values different from 1, 0, True, False
Example
>>> truth = [1, 0]
>>> prediction = [0.9, 0.1]
>>> detection_error_tradeoff(truth, prediction)
(array([1., 0.]), array([0., 0.]), array([0.1, 0.9]))
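A sketch of how FMR and FNMR can be computed from similarity scores. The exact threshold convention (accept when score >= threshold) is an assumption, though it reproduces the documented example:

```python
import numpy as np

def detection_error_tradeoff(truth, prediction):
    truth = np.asarray(truth, dtype=bool)
    prediction = np.asarray(prediction, dtype=float)
    genuine = prediction[truth]         # scores of enrolled-speaker trials
    impostor = prediction[~truth]       # scores of impostor trials
    thresholds = np.unique(prediction)  # candidate decision thresholds
    # FMR: fraction of impostor scores accepted (>= threshold);
    # FNMR: fraction of genuine scores rejected (< threshold).
    fmr = np.array([np.mean(impostor >= t) for t in thresholds])
    fnmr = np.array([np.mean(genuine < t) for t in thresholds])
    return fmr, fnmr, thresholds
```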
edit_distance
- audmetric.edit_distance(truth, prediction)
Edit distance between two strings of characters or sequences of ints.
The implementation follows the Wagner-Fischer algorithm.
- Parameters
  - truth (Union[str, Sequence[int]]) – ground truth sequence
  - prediction (Union[str, Sequence[int]]) – predicted sequence
- Return type
  int
- Returns
  edit distance
Example
>>> truth = 'lorem'
>>> prediction = 'lorm'
>>> edit_distance(truth, prediction)
1
>>> truth = [0, 1, 2]
>>> prediction = [0, 1]
>>> edit_distance(truth, prediction)
1
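The Wagner-Fischer algorithm mentioned above can be written with a single rolling row of the dynamic-programming table; a sketch:

```python
def edit_distance(truth, prediction):
    # previous[j] holds the distance between the processed prefix of
    # truth and the first j elements of prediction.
    previous = list(range(len(prediction) + 1))
    for i, t in enumerate(truth, start=1):
        current = [i]
        for j, p in enumerate(prediction, start=1):
            current.append(min(
                previous[j] + 1,              # deletion
                current[j - 1] + 1,           # insertion
                previous[j - 1] + (t != p),   # substitution or match
            ))
        previous = current
    return previous[-1]
```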
equal_error_rate
- audmetric.equal_error_rate(truth, prediction)
Equal error rate for verification tasks.
The equal error rate (EER) is the point where the false non-match rate (FNMR) and the false match rate (FMR) are identical. The FNMR indicates how often an enrolled speaker was missed. The FMR indicates how often an impostor was verified as the enrolled speaker.
In practice the score distribution is not continuous and an interval is returned instead. The EER value is set to the midpoint of this interval [2]:
\[\text{EER} = \frac{\min(\text{FNMR}[t], \text{FMR}[t]) + \max(\text{FNMR}[t], \text{FMR}[t])}{2}\]
with \(t = \text{argmin}(|\text{FNMR} - \text{FMR}|)\).
truth may only contain entries like [1, 0, True, False, ...], whereas prediction values can also contain similarity scores, e.g. [0.8, 0.1, ...].
The implementation is identical to the one provided by the pyeer package.
- [2] D. Maio, D. Maltoni, R. Cappelli, J. L. Wayman, and A. K. Jain. FVC2000: fingerprint verification competition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24:402–412, 2002. doi:10.1109/34.990140.
- Parameters
  - truth (Union[bool, int, Sequence[Union[bool, int]]]) – ground truth classes
  - prediction (Union[bool, int, float, Sequence[Union[bool, int, float]]]) – predicted classes or similarity scores
- Return type
  Tuple[float, namedtuple]
- Returns
  equal error rate (EER); a namedtuple containing fmr, fnmr, thresholds, threshold, where the last entry is the threshold corresponding to the returned EER
- Raises
  ValueError – if truth contains values different from 1, 0, True, False
Example
>>> truth = [0, 1, 0, 1, 0]
>>> prediction = [0.2, 0.8, 0.4, 0.5, 0.5]
>>> eer, stats = equal_error_rate(truth, prediction)
>>> eer
0.16666666666666666
>>> stats.threshold
0.5
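Building on a DET-style computation, the midpoint rule in the formula above can be sketched as follows. The threshold convention (accept when score >= threshold) is an assumption, and for brevity the sketch returns only the EER and its threshold rather than the full stats namedtuple:

```python
import numpy as np

def equal_error_rate(truth, prediction):
    truth = np.asarray(truth, dtype=bool)
    prediction = np.asarray(prediction, dtype=float)
    genuine = prediction[truth]
    impostor = prediction[~truth]
    thresholds = np.unique(prediction)
    fmr = np.array([np.mean(impostor >= t) for t in thresholds])
    fnmr = np.array([np.mean(genuine < t) for t in thresholds])
    # Pick the threshold where FNMR and FMR are closest and
    # return the midpoint of the resulting interval.
    i = np.argmin(np.abs(fnmr - fmr))
    eer = (min(fnmr[i], fmr[i]) + max(fnmr[i], fmr[i])) / 2
    return float(eer), float(thresholds[i])
```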
event_error_rate
- audmetric.event_error_rate(truth, prediction)
Event error rate based on edit distance.
The event error rate is computed by normalizing the edit distance of each (truth, prediction) pair by the length of the longer sequence of that pair, and then averaging these normalized distances over all pairs. Normalizing by the longer sequence bounds the per-pair distance to [0, 1].
- Parameters
  - truth (Union[str, Sequence[Union[str, Sequence[int]]]]) – ground truth classes
  - prediction (Union[str, Sequence[Union[str, Sequence[int]]]]) – predicted classes
- Return type
  float
- Returns
  event error rate
- Raises
  ValueError – if truth and prediction differ in length
Example
>>> event_error_rate([[0, 1]], [[0]])
0.5
>>> event_error_rate([[0, 1], [2]], [[0], [2]])
0.25
>>> event_error_rate(['lorem'], ['lorm'])
0.2
>>> event_error_rate(['lorem', 'ipsum'], ['lorm', 'ipsum'])
0.1
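The procedure above can be sketched by combining an edit-distance helper with per-pair normalization by the longer sequence:

```python
def event_error_rate(truth, prediction):
    def edit_distance(a, b):
        # Wagner-Fischer with a rolling row.
        previous = list(range(len(b) + 1))
        for i, x in enumerate(a, start=1):
            current = [i]
            for j, y in enumerate(b, start=1):
                current.append(min(previous[j] + 1,
                                   current[j - 1] + 1,
                                   previous[j - 1] + (x != y)))
            previous = current
        return previous[-1]

    # Normalize each pair's edit distance by the longer sequence,
    # then average over all pairs.
    rates = [
        edit_distance(t, p) / max(len(t), len(p))
        for t, p in zip(truth, prediction)
    ]
    return sum(rates) / len(rates)
```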
mean_absolute_error
- audmetric.mean_absolute_error(truth, prediction)
Mean absolute error.
\[\text{MAE} = \frac{1}{n} \sum^n_{i=1} |\text{prediction}_i - \text{truth}_i|\]
- Parameters
  - truth (Sequence[float]) – ground truth values
  - prediction (Sequence[float]) – predicted values
- Return type
  float
- Returns
  mean absolute error
- Raises
  ValueError – if truth and prediction differ in length
Example
>>> mean_absolute_error([0, 0], [0, 1])
0.5
mean_squared_error
- audmetric.mean_squared_error(truth, prediction)
Mean squared error.
\[\text{MSE} = \frac{1}{n} \sum^n_{i=1} (\text{prediction}_i - \text{truth}_i)^2\]
- Parameters
  - truth (Sequence[float]) – ground truth values
  - prediction (Sequence[float]) – predicted values
- Return type
  float
- Returns
  mean squared error
- Raises
  ValueError – if truth and prediction differ in length
Example
>>> mean_squared_error([0, 0], [0, 1])
0.5
pearson_cc
- audmetric.pearson_cc(truth, prediction)
Pearson correlation coefficient.
\[\rho = \frac{\text{cov}(\text{prediction}, \text{truth})}{\sigma_\text{prediction}\sigma_\text{truth}}\]
where \(\sigma\) is the standard deviation and \(\text{cov}\) is the covariance.
- Parameters
  - truth (Sequence[float]) – ground truth values
  - prediction (Sequence[float]) – predicted values
- Return type
  float
- Returns
  Pearson correlation coefficient \(\in [-1, 1]\)
- Raises
  ValueError – if truth and prediction differ in length
Example
>>> pearson_cc([0, 1, 2], [0, 1, 1])
0.8660254037844385
fscore_per_class
- audmetric.fscore_per_class(truth, prediction, labels=None, *, zero_division=0)
F-score per class.
\[\text{fscore}_k = \frac{\text{true positive}_k}{\text{true positive}_k + \frac{1}{2}(\text{false positive}_k + \text{false negative}_k)}\]
- Parameters
  - truth (Sequence[Any]) – ground truth values/classes
  - prediction (Sequence[Any]) – predicted values/classes
  - labels (Optional[Sequence[Any]]) – included labels in preferred ordering. If no labels are supplied, they will be inferred from \(\{\text{prediction}, \text{truth}\}\) and ordered alphabetically.
  - zero_division (float) – set the value to return when there is a zero division
- Return type
  Dict[str, float]
- Returns
  dictionary with label as key and F-score as value
Example
>>> fscore_per_class([0, 0], [0, 1])
{0: 0.6666666666666666, 1: 0.0}
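The per-class F-score reduces to counting true positives, false positives and false negatives per label; a sketch of the formula above (not the library code):

```python
def fscore_per_class(truth, prediction, labels=None, *, zero_division=0):
    if labels is None:
        labels = sorted(set(truth) | set(prediction))
    result = {}
    for label in labels:
        pairs = list(zip(truth, prediction))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denominator = tp + 0.5 * (fp + fn)
        result[label] = tp / denominator if denominator else zero_division
    return result
```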
precision_per_class
- audmetric.precision_per_class(truth, prediction, labels=None, *, zero_division=0)
Precision per class.
\[\text{precision}_k = \frac{\text{true positive}_k}{\text{true positive}_k + \text{false positive}_k}\]
- Parameters
  - truth (Sequence[Any]) – ground truth values/classes
  - prediction (Sequence[Any]) – predicted values/classes
  - labels (Optional[Sequence[Any]]) – included labels in preferred ordering. If no labels are supplied, they will be inferred from \(\{\text{prediction}, \text{truth}\}\) and ordered alphabetically.
  - zero_division (float) – set the value to return when there is a zero division
- Return type
  Dict[str, float]
- Returns
  dictionary with label as key and precision as value
Example
>>> precision_per_class([0, 0], [0, 1])
{0: 1.0, 1: 0.0}
recall_per_class
- audmetric.recall_per_class(truth, prediction, labels=None, *, zero_division=0)
Recall per class.
\[\text{recall}_k = \frac{\text{true positive}_k}{\text{true positive}_k + \text{false negative}_k}\]
- Parameters
  - truth (Sequence[Any]) – ground truth values/classes
  - prediction (Sequence[Any]) – predicted values/classes
  - labels (Optional[Sequence[Any]]) – included labels in preferred ordering. If no labels are supplied, they will be inferred from \(\{\text{prediction}, \text{truth}\}\) and ordered alphabetically.
  - zero_division (float) – set the value to return when there is a zero division
- Return type
  Dict[str, float]
- Returns
  dictionary with label as key and recall as value
Example
>>> recall_per_class([0, 0], [0, 1])
{0: 0.5, 1: 0.0}
unweighted_average_bias
- audmetric.unweighted_average_bias(truth, prediction, protected_variable, labels=None, *, subgroups=None, metric=<function fscore_per_class>, reduction=<function std>)
Compute unweighted average bias with respect to a protected variable.
The bias is measured in terms of equalized odds, which requires the classifier to have identical performance for all classes independent of a protected variable such as race. The performance of the classifier for its different classes can be assessed with standard metrics such as recall or precision. The difference in performance, denoted as score divergence, can likewise be computed in different ways. For two subgroups the (absolute) difference is a standard choice; for more than two subgroups the score divergence can be estimated by the standard deviation of the scores.
Note
If fewer than two subgroups exhibit a performance score for a class, the corresponding class is ignored in the bias computation. This occurs if there is no class sample for a subgroup, e.g. no sample with the negative class label for the subgroup female of the variable sex.
- Parameters
  - truth (Sequence[Any]) – ground truth classes
  - prediction (Sequence[Any]) – predicted classes
  - protected_variable (Sequence[Any]) – manifestations of the protected variable, such as the subgroups “male” and “female” of the variable “sex”
  - labels (Optional[Sequence[Any]]) – included labels in preferred ordering. The bias is computed only on the specified labels. If no labels are supplied, they will be inferred from \(\{\text{prediction}, \text{truth}\}\) and ordered alphabetically.
  - subgroups (Optional[Sequence[Any]]) – included subgroups in preferred ordering. The direction of the bias is determined by the ordering of the subgroups. Besides, the bias is computed only on the specified subgroups. If no subgroups are supplied, they will be inferred from \(\text{protected\_variable}\) and ordered alphanumerically.
  - metric (Callable[[Sequence[Any], Sequence[Any], Optional[Sequence[str]]], Dict[str, float]]) – metric with which equalized odds are measured. Typical choices are audmetric.recall_per_class(), audmetric.precision_per_class() or audmetric.fscore_per_class()
  - reduction (Callable[[Sequence[float]], float]) – specifies the reduction operation to measure the divergence between the scores of the subgroups of the protected variable for each class. Typical choices are: difference or absolute difference between scores for two subgroups, and standard deviation of scores for more than two subgroups.
- Return type
  float
- Returns
  unweighted average bias
- Raises
  - ValueError – if truth, prediction and protected_variable have different lengths
  - ValueError – if subgroups contains values not contained in protected_variable
Example
>>> unweighted_average_bias([1, 1], [1, 0], ['male', 'female'])
0.5
>>> unweighted_average_bias(
...     [1, 1], [1, 0], ['male', 'female'],
...     subgroups=['female', 'male'],
...     reduction=lambda x: x[0] - x[1],
... )
-1.0
>>> unweighted_average_bias(
...     [0, 1], [1, 0], ['male', 'female'],
...     metric=recall_per_class,
... )
nan
>>> unweighted_average_bias(
...     [0, 0, 0, 0], [1, 1, 0, 0], ['a', 'b', 'c', 'd'],
...     metric=recall_per_class,
... )
0.5
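To illustrate the mechanics (per-subgroup scores, per-class divergence, average over classes), here is a simplified sketch. Unlike the library, it defaults to a recall metric and statistics.pstdev instead of fscore_per_class and numpy.std, and the rule for ignoring classes (a class must appear in a subgroup's ground truth to receive a score) is an assumption inferred from the documented examples:

```python
import statistics

def recall_per_class(truth, prediction, labels=None):
    if labels is None:
        labels = sorted(set(truth) | set(prediction))
    result = {}
    for label in labels:
        pairs = list(zip(truth, prediction))
        tp = sum(t == label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Assumption: only emit a score when the class occurs in truth.
        if tp + fn:
            result[label] = tp / (tp + fn)
    return result

def unweighted_average_bias(truth, prediction, protected_variable,
                            labels=None, *, subgroups=None,
                            metric=recall_per_class,
                            reduction=statistics.pstdev):
    if labels is None:
        labels = sorted(set(truth) | set(prediction))
    if subgroups is None:
        subgroups = sorted(set(protected_variable))
    # Metric scores per subgroup, computed on that subgroup's samples.
    scores = {}
    for group in subgroups:
        idx = [i for i, g in enumerate(protected_variable) if g == group]
        scores[group] = metric([truth[i] for i in idx],
                               [prediction[i] for i in idx], labels)
    divergences = []
    for label in labels:
        per_group = [scores[g][label] for g in subgroups if label in scores[g]]
        # Ignore classes scored by fewer than two subgroups.
        if len(per_group) >= 2:
            divergences.append(reduction(per_group))
    if not divergences:
        return float("nan")
    return sum(divergences) / len(divergences)
```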
unweighted_average_fscore
- audmetric.unweighted_average_fscore(truth, prediction, labels=None, *, zero_division=0)
Unweighted average F-score.
\[\text{UAF} = \frac{1}{K} \sum^K_{k=1} \frac{\text{true positive}_k}{\text{true positive}_k + \frac{1}{2}(\text{false positive}_k + \text{false negative}_k)}\]
- Parameters
  - truth (Sequence[Any]) – ground truth values/classes
  - prediction (Sequence[Any]) – predicted values/classes
  - labels (Optional[Sequence[Any]]) – included labels in preferred ordering. If no labels are supplied, they will be inferred from \(\{\text{prediction}, \text{truth}\}\) and ordered alphabetically.
  - zero_division (float) – set the value to return when there is a zero division
- Return type
  float
- Returns
  unweighted average F-score
Example
>>> unweighted_average_fscore([0, 0], [0, 1])
0.3333333333333333
unweighted_average_precision
- audmetric.unweighted_average_precision(truth, prediction, labels=None, *, zero_division=0)
Unweighted average precision.
\[\text{UAP} = \frac{1}{K} \sum^K_{k=1} \frac{\text{true positive}_k}{\text{true positive}_k + \text{false positive}_k}\]
- Parameters
  - truth (Sequence[Any]) – ground truth values/classes
  - prediction (Sequence[Any]) – predicted values/classes
  - labels (Optional[Sequence[Any]]) – included labels in preferred ordering. If no labels are supplied, they will be inferred from \(\{\text{prediction}, \text{truth}\}\) and ordered alphabetically.
  - zero_division (float) – set the value to return when there is a zero division
- Return type
  float
- Returns
  unweighted average precision
Example
>>> unweighted_average_precision([0, 0], [0, 1])
0.5
unweighted_average_recall
- audmetric.unweighted_average_recall(truth, prediction, labels=None, *, zero_division=0)
Unweighted average recall.
\[\text{UAR} = \frac{1}{K} \sum^K_{k=1} \frac{\text{true positive}_k}{\text{true positive}_k + \text{false negative}_k}\]
- Parameters
  - truth (Sequence[Any]) – ground truth values/classes
  - prediction (Sequence[Any]) – predicted values/classes
  - labels (Optional[Sequence[Any]]) – included labels in preferred ordering. If no labels are supplied, they will be inferred from \(\{\text{prediction}, \text{truth}\}\) and ordered alphabetically.
  - zero_division (float) – set the value to return when there is a zero division
- Return type
  float
- Returns
  unweighted average recall
Example
>>> unweighted_average_recall([0, 0], [0, 1])
0.25
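UAR is the unweighted mean of the per-class recalls, so every class counts equally regardless of its frequency; a sketch of the formula above:

```python
def unweighted_average_recall(truth, prediction, labels=None, *, zero_division=0):
    if labels is None:
        labels = sorted(set(truth) | set(prediction))
    recalls = []
    for label in labels:
        pairs = list(zip(truth, prediction))
        tp = sum(t == label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Classes absent from truth fall back to zero_division.
        recalls.append(tp / (tp + fn) if tp + fn else zero_division)
    return sum(recalls) / len(recalls)
```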
weighted_confusion_error
- audmetric.weighted_confusion_error(truth, prediction, weights, labels=None)
Weighted confusion error.
Computes the normalized confusion matrix, applies the given weights to each cell, and sums them up. Weights are expected as positive numbers and will be normalized by the sum of all weights. The higher the weight, the more costly the error. A weight of 0 means that the cell is not taken into account for the error; this is usually the case for the diagonal, as it holds correctly classified samples.
- Parameters
  - truth (Sequence[Any]) – ground truth values/classes
  - prediction (Sequence[Any]) – predicted values/classes
  - weights (Sequence[Sequence[Union[int, float]]]) – weights applied to the confusion matrix. Expected as a list of lists in the following form (r=row, c=column): [[w_r0_c0, ..., w_r0_cN], ..., [w_rN_c0, ..., w_rN_cN]]
  - labels (Optional[Sequence[Any]]) – included labels in preferred ordering. If no labels are supplied, they will be inferred from \(\{\text{prediction}, \text{truth}\}\) and ordered alphabetically.
- Return type
  float
- Returns
  weighted confusion error
Example
>>> truth = [0, 1, 2]
>>> prediction = [0, 2, 0]
>>> # penalize only errors > 1
>>> weights = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
>>> weighted_confusion_error(truth, prediction, weights)
0.5
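The procedure described above (row-normalized confusion matrix, weights normalized by their sum, weighted total) can be sketched as:

```python
def weighted_confusion_error(truth, prediction, weights, labels=None):
    if labels is None:
        labels = sorted(set(truth) | set(prediction))
    index = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    # Build the confusion matrix: rows = truth, columns = prediction.
    matrix = [[0.0] * n for _ in range(n)]
    for t, p in zip(truth, prediction):
        matrix[index[t]][index[p]] += 1
    # Normalize over the rows (guard empty rows).
    matrix = [
        [cell / sum(row) if sum(row) else 0.0 for cell in row]
        for row in matrix
    ]
    # Normalize the weights by their sum and take the weighted total.
    total_weight = sum(w for row in weights for w in row)
    return sum(
        matrix[r][c] * weights[r][c] / total_weight
        for r in range(n) for c in range(n)
    )
```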
word_error_rate
- audmetric.word_error_rate(truth, prediction)
Word error rate based on edit distance.
- Parameters
  - truth (Sequence[Sequence[str]]) – ground truth strings
  - prediction (Sequence[Sequence[str]]) – predicted strings
- Return type
  float
- Returns
  word error rate
- Raises
  ValueError – if truth and prediction differ in length
Example
>>> truth = [['lorem', 'ipsum'], ['north', 'wind', 'and', 'sun']]
>>> prediction = [['lorm', 'ipsum'], ['north', 'wind']]
>>> word_error_rate(truth, prediction)
0.5
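Word error rate can be sketched as edit distance over word sequences instead of characters. Normalizing each pair by the longer word sequence (rather than by the reference length alone) is an assumption here, though both conventions reproduce the documented example:

```python
def word_error_rate(truth, prediction):
    def edit_distance(a, b):
        # Wagner-Fischer over words with a rolling row.
        previous = list(range(len(b) + 1))
        for i, x in enumerate(a, start=1):
            current = [i]
            for j, y in enumerate(b, start=1):
                current.append(min(previous[j] + 1,
                                   current[j - 1] + 1,
                                   previous[j - 1] + (x != y)))
            previous = current
        return previous[-1]

    # Word-level edit distance per utterance, normalized by the longer
    # of the two word sequences, averaged over utterances.
    rates = [
        edit_distance(t, p) / max(len(t), len(p))
        for t, p in zip(truth, prediction)
    ]
    return sum(rates) / len(rates)
```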