word_error_rate()
- audmetric.word_error_rate(truth, prediction, *, norm='truth')
Word error rate based on edit distance.
The word error rate is computed by aggregating the normalized edit distances of each (truth, prediction)-pair and averaging the aggregated score by the number of pairs.
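Written out, the aggregation reads as follows. The symbols are chosen here for illustration and do not come from the library: N is the number of pairs, d(t_i, p_i) is the edit distance between the i-th truth and prediction sequences, and n_i is the normalization factor for that pair, as defined in the next paragraph.

```latex
\mathrm{WER} = \frac{1}{N} \sum_{i=1}^{N} \frac{d(t_i, p_i)}{n_i}
```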
The normalized edit distance of each (truth, prediction)-pair is computed as the edit distance divided by a normalization factor n. This represents the average editing cost per sequence item. The value of n depends on the norm parameter.
If norm is "truth", n is set to the length of the reference (truth), that is, the number of words in the reference, following the Wikipedia formulation. This means WER can be greater than 1 if the prediction sequence is longer than the reference. For a pair (t, p):
n = len(t)
If norm is "longest", n is set to the larger of the truth and prediction lengths, so each pair's normalized distance stays within [0, 1]:
n = max(len(t), len(p))
- Parameters
truth (Sequence[Sequence[str]]) – ground truth strings
prediction (Sequence[Sequence[str]]) – predicted strings
norm (str) – normalization method, either "truth" or "longest". "truth" normalizes by truth length, "longest" normalizes by max length of truth and prediction
- Return type
float
- Returns
word error rate
- Raises
ValueError – if truth and prediction differ in length
ValueError – if norm is not one of "truth", "longest"
Examples
>>> truth = [["lorem", "ipsum"], ["north", "wind", "and", "sun"]]
>>> prediction = [["lorm", "ipsum"], ["north", "wind"]]
>>> word_error_rate(truth, prediction)
0.5
>>> truth = [["hello", "world"]]
>>> prediction = [["xyz", "moon", "abc"]]
>>> word_error_rate(truth, prediction)
1.5
>>> word_error_rate(truth, prediction, norm="longest")
1.0
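To see where these numbers come from, the computation described above can be sketched in plain Python. This is an illustrative re-implementation under stated assumptions, not audmetric's actual code: the names `edit_distance` and `wer_sketch` are hypothetical, the edit distance is assumed to be a word-level Levenshtein distance with unit costs, and the sketch assumes at least one pair and non-empty reference sequences.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level Levenshtein distance with unit costs (assumed here)."""
    # At the start of iteration i, previous[j] is the distance
    # between a[:i-1] and b[:j].
    previous = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty sequence
        for j, word_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (word_a != word_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def wer_sketch(
    truth: Sequence[Sequence[str]],
    prediction: Sequence[Sequence[str]],
    *,
    norm: str = "truth",
) -> float:
    """Hypothetical sketch: mean of per-pair normalized edit distances."""
    if len(truth) != len(prediction):
        raise ValueError("truth and prediction differ in length")
    if norm not in ("truth", "longest"):
        raise ValueError('norm must be one of "truth", "longest"')

    total = 0.0
    for t, p in zip(truth, prediction):
        # Normalization factor n for this pair.
        n = len(t) if norm == "truth" else max(len(t), len(p))
        total += edit_distance(t, p) / n
    # Average the aggregated score over the number of pairs.
    return total / len(truth)


print(wer_sketch([["hello", "world"]], [["xyz", "moon", "abc"]]))
print(wer_sketch([["hello", "world"]], [["xyz", "moon", "abc"]], norm="longest"))
```

In the second example, turning two reference words into three predicted words takes two substitutions and one insertion, so the edit distance is 3. Dividing by the reference length 2 gives 1.5, while dividing by the longer length 3 gives 1.0.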