event_confusion_matrix()

audmetric.event_confusion_matrix(truth, prediction, labels=None, *, onset_tolerance=0.0, offset_tolerance=0.0, duration_tolerance=None, normalize=False)

Event-based confusion.

This metric compares not only the labels of prediction and ground truth, but also the time windows they occur in.

Each event is considered correctly identified if the predicted label matches the ground truth label, the onset lies within the given onset_tolerance (in seconds), and the offset lies within the given offset_tolerance (in seconds). In addition to the offset_tolerance, one can specify the duration_tolerance to ensure that the offset occurs within a certain proportion of the reference event duration. If a prediction fulfills the duration_tolerance but not the offset_tolerance (or vice versa), it is still considered to be an overlapping segment [1].
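The matching rule described above can be sketched for a single pair of segments. This is a minimal illustration and not the library's implementation: the helper name is_correct_event and the (start, end, label) tuple layout are hypothetical, and the sketch assumes that duration_tolerance acts as an alternative way to satisfy the offset check.

```python
from typing import Optional


def is_correct_event(
    truth: tuple,
    prediction: tuple,
    onset_tolerance: float = 0.0,
    offset_tolerance: float = 0.0,
    duration_tolerance: Optional[float] = None,
) -> bool:
    """Hypothetical helper: compare two (start, end, label) tuples."""
    t_start, t_end, t_label = truth
    p_start, p_end, p_label = prediction

    # Onset check: absolute deviation in seconds
    onset_ok = abs(p_start - t_start) <= onset_tolerance

    # Offset check: absolute deviation in seconds
    offset_error = abs(p_end - t_end)
    offset_ok = offset_error <= offset_tolerance

    if duration_tolerance is not None:
        # Assumption: the offset may instead lie within a proportion
        # of the ground truth segment's duration
        allowed = duration_tolerance * (t_end - t_start)
        offset_ok = offset_ok or offset_error <= allowed

    return t_label == p_label and onset_ok and offset_ok


truth_segment = (0.0, 1.0, "a")
late_offset = (0.01, 1.3, "a")
# Fails with a 0.02 s offset tolerance alone ...
print(is_correct_event(truth_segment, late_offset, 0.02, 0.02))
# ... but passes once 50 % of the reference duration is allowed
print(is_correct_event(truth_segment, late_offset, 0.02, 0.02, 0.5))
```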

The resulting confusion matrix has one more row and one more column than there are labels. The last row/column corresponds to the absence of any event. This makes it possible to distinguish between segments that overlap but have differing labels, false negatives that have no overlapping predicted segment, and false positives that have no overlapping ground truth segment.
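To illustrate this layout, the following sketch reads the matrix returned in the Examples section of this page. It assumes that rows index the ground truth and columns index the prediction, which is inferred from that example (the extra prediction in f2.wav lands in the last row) rather than stated explicitly. The variable names are illustrative only.

```python
# Matrix from the Examples section, computed for labels "a" and "b"
matrix = [[1, 1, 0], [1, 1, 0], [0, 1, 0]]
labels = ["a", "b"]

# Index of the extra row/column that stands for "no event"
no_event = len(labels)
label_range = range(len(labels))

# Diagonal entries: matching segments with the same label
correct = sum(matrix[i][i] for i in label_range)

# Off-diagonal entries among real labels: matching segments, different label
confused = sum(
    matrix[i][j] for i in label_range for j in label_range if i != j
)

# Last column: ground truth events without a matching prediction
false_negatives = sum(matrix[i][no_event] for i in label_range)

# Last row: predicted events without a matching ground truth event
false_positives = sum(matrix[no_event][j] for j in label_range)

print(correct, confused, false_negatives, false_positives)
```

With the example matrix this yields two correct events, two label confusions, no false negatives, and one false positive.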

[1] Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. Metrics for polyphonic sound event detection. Applied Sciences, 2016. doi:10.3390/app6060162.

Parameters
  • truth (Series) – ground truth labels with a segmented index conforming to audformat

  • prediction (Series) – predicted labels with a segmented index conforming to audformat

  • labels (Optional[Sequence[object]]) – included labels in preferred ordering. If no labels are supplied, they will be inferred from {prediction, truth} and ordered alphabetically

  • onset_tolerance (Optional[float]) – the onset tolerance in seconds. If the predicted segment’s onset does not occur within this time window compared to the ground truth segment’s onset, it is not considered correct

  • offset_tolerance (Optional[float]) – the offset tolerance in seconds. If the predicted segment’s offset does not occur within this time window compared to the ground truth segment’s offset, it is not considered correct, unless the duration_tolerance is specified and fulfilled

  • duration_tolerance (Optional[float]) – the duration tolerance as a proportion of the ground truth segment’s total duration. If the offset_tolerance is not fulfilled and the predicted segment’s offset also does not occur within this proportion of the ground truth duration relative to the ground truth segment’s offset, it is not considered correct

  • normalize (bool) – normalize confusion matrix over the rows
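As a rough illustration of what normalizing over the rows means, the sketch below divides each row by its sum. The helper normalize_rows is hypothetical and not part of audmetric, and leaving an all-zero row as zeros is an assumption about an edge case this page does not describe.

```python
def normalize_rows(matrix):
    """Hypothetical helper: scale each row so that it sums to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        if total == 0:
            # Assumption: a row without any events stays all zeros
            normalized.append([0.0 for _ in row])
        else:
            normalized.append([value / total for value in row])
    return normalized


# Matrix from the Examples section of this page
counts = [[1, 1, 0], [1, 1, 0], [0, 1, 0]]
print(normalize_rows(counts))
```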

Return type

list[list[int | float]]

Returns

event confusion matrix

Raises

ValueError – if truth or prediction does not have a segmented index conforming to audformat

Examples

>>> import pandas as pd
>>> import audformat
>>> from audmetric import event_confusion_matrix
>>> truth = pd.Series(
...     index=audformat.segmented_index(
...         files=["f1.wav"] * 4,
...         starts=[0, 0.1, 0.2, 0.3],
...         ends=[0.1, 0.2, 0.3, 0.4],
...     ),
...     data=["a", "a", "b", "b"],
... )
>>> prediction = pd.Series(
...     index=audformat.segmented_index(
...         files=["f1.wav"] * 4 + ["f2.wav"],
...         starts=[0, 0.09, 0.2, 0.31, 0.0],
...         ends=[0.1, 0.2, 0.3, 0.41, 1.0],
...     ),
...     data=["a", "b", "a", "b", "b"],
... )
>>> event_confusion_matrix(
...     truth, prediction, onset_tolerance=0.02, offset_tolerance=0.02
... )
[[1, 1, 0], [1, 1, 0], [0, 1, 0]]