Examples

Augmentation examples that show how to solve common augmentation tasks.

Let’s start by loading an example file to augment.

import audb
import audiofile
import auglib

files = audb.load_media(
    "emodb",
    "wav/03a01Fa.wav",
    version="1.4.1",
    verbose=False,
)
signal, sampling_rate = audiofile.read(files[0])
[Plot of the example signal]

Recorded Reverb

Recorded reverb can be used to make machine learning models robust against changes of the room acoustics. In the following, we use binaural impulse responses from the air database, recorded with a dummy head. Its rir table holds recordings for four different rooms at different distances.

df = audb.load_table("air", "rir", version="1.4.2", verbose=False)
set(df.room)
{'booth', 'lecture', 'meeting', 'office'}

We load the left channel of all impulse responses stored in the rir table and resample them to 16000 Hz. We then randomly pick an impulse response during augmentation with auglib.observe.List.

auglib.seed(0)

db = audb.load(
    "air",
    version="1.4.2",
    tables="rir",
    channels=[0],
    sampling_rate=16000,
    verbose=False,
)
transform = auglib.transform.Compose(
    [
        auglib.transform.FFTConvolve(
            auglib.observe.List(db.files, draw=True),
            keep_tail=False,
        ),
        auglib.transform.NormalizeByPeak(),
    ]
)
augment = auglib.Augment(transform)
signal_augmented = augment(signal, sampling_rate)
[Plot of the augmented signal]
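
If you want to restrict the augmentation to a single room, you can filter the files with the metadata stored in the rir table. The following sketch assumes you are only interested in the office recordings, and builds the transform as before from the filtered file list.

# Keep only impulse responses recorded in the office
df_rir = db["rir"].get()
office_files = df_rir[df_rir.room == "office"].index
transform = auglib.transform.Compose(
    [
        auglib.transform.FFTConvolve(
            auglib.observe.List(list(office_files), draw=True),
            keep_tail=False,
        ),
        auglib.transform.NormalizeByPeak(),
    ]
)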

Artificial Reverb

If you don’t have enough examples of recorded reverb, or want to tune one particular parameter of the reverb, you can generate it artificially. Pedalboard provides a reverb transform that lets you adjust a number of parameters in the range 0 to 1. For more information on Pedalboard see the Pedalboard section. In the following, we simply pick all parameters randomly from a normal distribution restricted to the range 0 to 1.

auglib.seed(1)

def reverb(
        signal,
        sampling_rate,
        room_size,
        damping,
        wet_level,
        dry_level,
        width,
):
    r"""Reverb augmentation using pedalboard."""
    import pedalboard
    board = pedalboard.Pedalboard(
        [
            pedalboard.Reverb(
                room_size=room_size,
                damping=damping,
                wet_level=wet_level,
                dry_level=dry_level,
                width=width,
            ),
        ],
    )
    return board(signal, sampling_rate)

random_params = auglib.observe.FloatNorm(
    mean=0.5,
    std=0.5,
    minimum=0,
    maximum=1,
)
transform = auglib.transform.Compose(
    [
        auglib.transform.Function(
            reverb,
            function_args={
                "room_size": random_params,
                "damping": random_params,
                "wet_level": random_params,
                "dry_level": random_params,
                "width": random_params,
            },
        ),
        auglib.transform.NormalizeByPeak(),
    ]
)
augment = auglib.Augment(transform)
signal_augmented = augment(signal, sampling_rate)
[Plot of the augmented signal]
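
If you want to tune one particular parameter, for example to simulate a large room, you can fix it and randomize only the remaining ones. The parameter values in the following sketch are made up.

# Fix the room size to simulate a large room,
# randomize only the wet and dry levels
transform = auglib.transform.Compose(
    [
        auglib.transform.Function(
            reverb,
            function_args={
                "room_size": 0.9,
                "damping": 0.5,
                "wet_level": auglib.observe.FloatUni(0.2, 0.6),
                "dry_level": auglib.observe.FloatUni(0.4, 0.8),
                "width": 1.0,
            },
        ),
        auglib.transform.NormalizeByPeak(),
    ]
)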

Music

Music can be added as a background signal during the training of a machine learning model. In this example, we load a single music file from musan; in a real application, we recommend using all media files from the music table. We read each music sample starting at a random position, loop it if it is shorter than the input, attenuate it by 10 dB to 15 dB, and add it to the original input signal.

auglib.seed(0)

db = audb.load(
    "musan",
    tables="music",
    media="music/fma/music-fma-0097.wav",
    version="1.0.0",
    verbose=False,
)

transform = auglib.transform.Mix(
    auglib.observe.List(db.files, draw=True),
    gain_aux_db=auglib.observe.IntUni(-15, -10),
    read_pos_aux=auglib.observe.FloatUni(0, 1),
    unit="relative",
    loop_aux=True,
)
augment = auglib.Augment(transform)
signal_augmented = augment(signal, sampling_rate)
[Plot of the augmented signal]
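
To augment a whole set of files offline, a simple option is to loop over the files and store each augmented signal with audiofile. The output folder in the following sketch is made up; auglib.Augment also provides methods that operate on whole tables or indices, see its API documentation.

import os

out_dir = "augmented"  # made up output folder
os.makedirs(out_dir, exist_ok=True)
for file in files:  # e.g. the emodb files loaded at the beginning
    signal, sampling_rate = audiofile.read(file)
    signal_augmented = augment(signal, sampling_rate)
    audiofile.write(
        os.path.join(out_dir, os.path.basename(file)),
        signal_augmented,
        sampling_rate,
    )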

Noise with fixed SNR

When adding noise to a signal during augmentation, it is often desired to let the noise level depend on the signal level in order to achieve a fixed signal-to-noise ratio (SNR) between the two.

This can be achieved in auglib with the snr_db argument. The following example adds pink noise with an SNR of 10 dB to the input signal.

auglib.seed(0)

transform = auglib.transform.PinkNoise(snr_db=10)
augment = auglib.Augment(transform)
signal_augmented = augment(signal, sampling_rate)
[Plot of the augmented signal]
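
If you prefer a fixed noise level that does not depend on the input signal, you can use the gain_db argument instead of snr_db. The gain of -20 dB in the following sketch is made up.

# Add pink noise with a fixed gain instead of a fixed SNR
transform = auglib.transform.PinkNoise(gain_db=-20)
augment = auglib.Augment(transform)
signal_augmented = augment(signal, sampling_rate)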

Band-Pass Filtered Noise

Some augmentations, like auglib.transform.WhiteNoiseGaussian, generate an augmentation signal that is added to the incoming signal. The generated signal can be modified before it is added, with the help of auglib.transform.Mix and its transform argument.

The following example adds band-pass filtered white noise to the input signal.

auglib.seed(0)

transform = auglib.transform.Mix(
    auglib.transform.WhiteNoiseGaussian(),
    snr_db=15,
    transform=auglib.transform.BandPass(
        center=4000,
        bandwidth=1000,
    ),
)
augment = auglib.Augment(transform)
signal_augmented = augment(signal, sampling_rate)
[Plot of the augmented signal]
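
The same mechanism works with other filter transforms. For example, you could high-pass filter the noise instead; the cutoff frequency of 4000 Hz in the following sketch is made up.

# Add high-pass filtered white noise
transform = auglib.transform.Mix(
    auglib.transform.WhiteNoiseGaussian(),
    snr_db=15,
    transform=auglib.transform.HighPass(4000),
)
augment = auglib.Augment(transform)
signal_augmented = augment(signal, sampling_rate)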

Babble Noise

Babble noise refers to having several speakers in the background all talking at the same time. The easiest way to augment your signal with babble noise is to use another speech database.

In the next example, we use speech from musan and augment our signal with it, similar to Section 3.3 in Snyder et al. 2018. To speed up the example, we load only 10 speech files from musan; in a real application, we recommend using all media files.

auglib.seed(1)

db = audb.load(
    "musan",
    tables="speech",
    media=".*speech-librivox-000\d",
    version="1.0.0",
    verbose=False,
)

transform = auglib.transform.BabbleNoise(
    list(db.files),
    num_speakers=auglib.observe.IntUni(3, 7),
    snr_db=auglib.observe.IntUni(13, 20),
)
augment = auglib.Augment(transform)
signal_augmented = augment(signal, sampling_rate)
[Plot of the augmented signal]
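
In a real application, you would drop the media filter and load the complete speech table, then build the BabbleNoise transform as above from list(db.files).

# Load all speech files from musan
# (this downloads considerably more data)
db = audb.load(
    "musan",
    tables="speech",
    version="1.0.0",
    verbose=False,
)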

Telephone

Telephone transmission is mainly characterised by the applied transmission codec, see Vu et al. 2019. With auglib we can use the Adaptive Multi-Rate audio codec in its narrow-band version (AMR-NB). Here, we select from three different codec bitrates, optionally clip the signal at the beginning of the processing chain, and optionally add noise at the end. The AMR-NB codec requires a sampling rate of 8000 Hz, which auglib.Augment can take care of.

auglib.seed(0)

transform = auglib.transform.Compose(
    [
        auglib.transform.ClipByRatio(
            auglib.observe.FloatUni(0, 0.01),
            normalize=True,
        ),
        auglib.transform.AMRNB(
            auglib.observe.List([4750, 5900, 7400]),
        ),
        auglib.transform.WhiteNoiseGaussian(
            gain_db=auglib.observe.FloatUni(-35, -30),
            bypass_prob=0.7,
        ),
    ]
)
augment = auglib.Augment(
    transform,
    sampling_rate=8000,
    resample=True,
)
signal_augmented = augment(signal, sampling_rate)
[Plot of the augmented signal]

Random Crop

To train machine learning models with a fixed signal input length, random cropping of the signals is often used. The following example uses auglib.transform.Trim to randomly crop the input to a length of 0.5 s. If you are training with torch and want to apply the transform in every epoch, you might consider audtorch.transforms.RandomCrop instead.

auglib.seed(0)

transform = auglib.transform.Trim(
    start_pos=auglib.Time(auglib.observe.FloatUni(0, 1), unit="relative"),
    duration=0.5,
    fill="loop",
    unit="seconds",
)
augment = auglib.Augment(transform)
signal_augmented = augment(signal, sampling_rate)
[Plot of the augmented signal]
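
If your signals can be shorter than the target length, you might prefer zero padding over looping. The following sketch assumes a target length of 1 s and that fill also accepts "zeros".

# Crop to 1 s, padding with zeros if the input is shorter
transform = auglib.transform.Trim(
    start_pos=auglib.Time(auglib.observe.FloatUni(0, 1), unit="relative"),
    duration=1.0,
    fill="zeros",
    unit="seconds",
)
augment = auglib.Augment(transform)
signal_augmented = augment(signal, sampling_rate)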

Gated Noise

You might want to add temporally varying background noise to your signal. A direct approach is to simply switch the noise on and off, which generates gated background noise. In this example, we select a single noise file from the noise table of musan, which contains 930 different files; in a real application you should augment with all of them. A combination of auglib.transform.Mask and auglib.transform.Mix reads the noise starting from a random position and switches it on and off in steps of 0.5 s.

auglib.seed(0)

db = audb.load(
    "musan",
    tables="noise",
    media="noise/free-sound/noise-free-sound-0003.wav",
    version="1.0.0",
    verbose=False,
)

transform = auglib.transform.Mask(
    auglib.transform.Mix(
        auglib.observe.List(db.files, draw=True),
        gain_aux_db=auglib.observe.IntUni(-15, 0),
        read_pos_aux=auglib.observe.FloatUni(0, 1),
        unit="relative",
        loop_aux=True,
    ),
    step=0.5,
)
augment = auglib.Augment(transform)
signal_augmented = augment(signal, sampling_rate)
[Plot of the augmented signal]

Pitch Shift

You might want to change the pitch of a speaker or singer in your signal. We use Praat here with the help of the parselmouth Python package, which has to be installed under the name praat-parselmouth. Internally, the augmentation function extracts the pitch contour, shifts it, and re-synthesises the audio signal.

import parselmouth
from parselmouth.praat import call as praat

auglib.seed(2)

def pitch_shift(signal, sampling_rate, semitones):
    r"""Pitch shift augmentation using praat."""
    sound = parselmouth.Sound(signal, sampling_rate)
    # Extract the pitch contour
    manipulation = praat(sound, "To Manipulation", 0.01, 75, 600)
    pitch_tier = praat(manipulation, "Extract pitch tier")
    # Scale all pitch values by the requested number of semitones
    factor = 2 ** (semitones / 12)
    praat(pitch_tier, "Multiply frequencies", sound.xmin, sound.xmax, factor)
    praat([pitch_tier, manipulation], "Replace pitch tier")
    # Re-synthesize the signal with the shifted pitch contour
    sound_transposed = praat(manipulation, "Get resynthesis (overlap-add)")
    return sound_transposed.values.flatten()

transform = auglib.transform.Function(
    function=pitch_shift,
    function_args={"semitones": auglib.observe.IntUni(-4, 4)},
)
augment = auglib.Augment(transform)
signal_augmented = augment(signal, sampling_rate)
[Plot of the augmented signal]

Constant Pitch

You might want to equalize the pitch of the speakers in your database. We again use Praat to achieve this, as in the Pitch Shift example.

The first approach estimates the average pitch of the input signal and re-synthesizes the signal with a pitch contour scaled such that its mean matches the desired pitch, given in Hz as desired_pitch. Since the contour is only scaled, the natural pitch fluctuations of each speaker are preserved.

import numpy as np
import parselmouth
from parselmouth.praat import call as praat

def constant_pitch(signal, sampling_rate, desired_pitch):
    r"""Constant pitch augmentation using praat."""
    sound = parselmouth.Sound(signal, sampling_rate)
    # Estimate average pitch of signal
    pitch = sound.to_pitch()
    pitch = pitch.selected_array["frequency"]
    pitch[pitch == 0] = np.nan
    pitch = np.nanmean(pitch)
    # Adjust signal to desired pitch
    manipulation = praat(sound, "To Manipulation", 0.01, 75, 600)
    pitch_tier = praat(manipulation, "Extract pitch tier")
    factor = desired_pitch / pitch
    praat(pitch_tier, "Multiply frequencies", sound.xmin, sound.xmax, factor)
    praat([pitch_tier, manipulation], "Replace pitch tier")
    sound_transposed = praat(manipulation, "Get resynthesis (overlap-add)")
    return sound_transposed.values.flatten()

transform = auglib.transform.Function(
    function=constant_pitch,
    function_args={"desired_pitch": 100},
)
augment = auglib.Augment(transform)
signal_augmented = augment(signal, sampling_rate)
[Plot of the augmented signal]

The second approach replaces the pitch contour with a constant one at the desired pitch, which removes all pitch fluctuations from the re-synthesized signal.

import parselmouth
from parselmouth.praat import call as praat

def constant_pitch(signal, sampling_rate, desired_pitch):
    r"""Constant pitch augmentation using praat."""
    sound = parselmouth.Sound(signal, sampling_rate)
    manipulation = praat(sound, "To Manipulation", 0.01, 75, 600)
    # Create a pitch tier with a single point,
    # i.e. a constant pitch contour at the desired pitch
    pitch_tier = praat(manipulation, "Create PitchTier", "Name", sound.xmin, sound.xmax)
    praat(pitch_tier, "Add point", sound.xmax / 2, desired_pitch)
    praat([pitch_tier, manipulation], "Replace pitch tier")
    # Re-synthesize the signal with the constant pitch contour
    sound_transposed = praat(manipulation, "Get resynthesis (overlap-add)")
    return sound_transposed.values.flatten()

transform = auglib.transform.Function(
    function=constant_pitch,
    function_args={"desired_pitch": 100},
)
augment = auglib.Augment(transform)
signal_augmented = augment(signal, sampling_rate)
[Plot of the augmented signal]
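
To check the result, you can re-estimate the average pitch of the augmented signal, reusing the pitch estimation from the first approach.

# Estimate the average pitch of the augmented signal;
# it should be close to the desired pitch of 100 Hz
sound = parselmouth.Sound(signal_augmented, sampling_rate)
pitch = sound.to_pitch().selected_array["frequency"]
pitch[pitch == 0] = np.nan
print(np.nanmean(pitch))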