Examples¶
These augmentation examples show how to solve common augmentation tasks.
Let’s start by loading an example file to augment.
import audb
import audiofile
import auglib
files = audb.load_media(
    "emodb",
    "wav/03a01Fa.wav",
    version="1.4.1",
    verbose=False,
)
signal, sampling_rate = audiofile.read(files[0])
Recorded Reverb¶
Recorded reverb can be used to make machine learning models robust against changes of the room. In the following, we use binaural impulse responses recorded with a dummy head from the air dataset. Its rir table holds recordings for four different rooms at different distances.
df = audb.load_table("air", "rir", version="1.4.2", verbose=False)
set(df.room)
{'booth', 'lecture', 'meeting', 'office'}
We load the left channel of all impulse responses stored in the rir table and resample them to 16000 Hz. During augmentation, we then randomly pick an impulse response with auglib.observe.List.
auglib.seed(0)
db = audb.load(
    "air",
    version="1.4.2",
    tables="rir",
    channels=[0],
    sampling_rate=16000,
    verbose=False,
)
transform = auglib.transform.Compose(
    [
        auglib.transform.FFTConvolve(
            auglib.observe.List(db.files, draw=True),
            keep_tail=False,
        ),
        auglib.transform.NormalizeByPeak(),
    ]
)
augment = auglib.Augment(transform)
signal_augmented = augment(signal, sampling_rate)
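Conceptually, FFTConvolve convolves the input with the chosen impulse response and NormalizeByPeak rescales the result. A minimal numpy sketch of this idea, using a synthetic exponentially decaying impulse response instead of a recorded one (auglib uses an FFT-based convolution for speed; plain np.convolve is shown here for clarity):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 0.5 s input and a toy exponentially decaying impulse response
signal = rng.standard_normal(8000)
ir = np.exp(-np.linspace(0, 8, 1000)) * rng.standard_normal(1000)

# Convolve and truncate the tail to the input length (keep_tail=False)
reverbed = np.convolve(signal, ir)[: len(signal)]

# Normalize by the absolute peak
reverbed = reverbed / np.max(np.abs(reverbed))
```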
Artificial Reverb¶
If you don’t have enough examples of recorded reverb, or want to tune one particular parameter of reverb, you can generate it artificially. Pedalboard provides a reverb transform that lets you adjust a number of parameters in the range 0 to 1. For more information on Pedalboard see the Pedalboard section. In the following, we simply pick all parameters randomly from a normal distribution.
auglib.seed(1)
def reverb(
    signal,
    sampling_rate,
    room_size,
    damping,
    wet_level,
    dry_level,
    width,
):
    r"""Reverb augmentation using pedalboard."""
    import pedalboard

    board = pedalboard.Pedalboard(
        [
            pedalboard.Reverb(
                room_size=room_size,
                damping=damping,
                wet_level=wet_level,
                dry_level=dry_level,
                width=width,
            ),
        ],
    )
    return board(signal, sampling_rate)
random_params = auglib.observe.FloatNorm(
    mean=0.5,
    std=0.5,
    minimum=0,
    maximum=1,
)
transform = auglib.transform.Compose(
    [
        auglib.transform.Function(
            reverb,
            function_args={
                "room_size": random_params,
                "damping": random_params,
                "wet_level": random_params,
                "dry_level": random_params,
                "width": random_params,
            },
        ),
        auglib.transform.NormalizeByPeak(),
    ]
)
augment = auglib.Augment(transform)
signal_augmented = augment(signal, sampling_rate)
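auglib.observe.FloatNorm draws each parameter from a normal distribution restricted to [minimum, maximum]. A rough numpy equivalent, assuming simple clipping at the bounds (which may differ from auglib's exact behaviour):

```python
import numpy as np

rng = np.random.default_rng(1)


def float_norm(mean=0.5, std=0.5, minimum=0.0, maximum=1.0):
    """Draw one value from a normal distribution, limited to a range."""
    # Assumption: out-of-range draws are clipped to the bounds
    return float(np.clip(rng.normal(mean, std), minimum, maximum))


# Each call yields a fresh random value in [0, 1]
params = [float_norm() for _ in range(5)]
```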
Music¶
Music can be added as a background signal during training of a machine learning model. We load a single music file from musan in this example; we recommend using all media files from the music table when applying the augmentation in a real application. We randomly crop each music sample with repetition, attenuate it by applying a gain between -15 dB and -10 dB, and add it to the original input signal.
auglib.seed(0)
db = audb.load(
    "musan",
    tables="music",
    media="music/fma/music-fma-0097.wav",
    version="1.0.0",
    verbose=False,
)
transform = auglib.transform.Mix(
    auglib.observe.List(db.files, draw=True),
    gain_aux_db=auglib.observe.IntUni(-15, -10),
    read_pos_aux=auglib.observe.FloatUni(0, 1),
    unit="relative",
    loop_aux=True,
)
augment = auglib.Augment(transform)
signal_augmented = augment(signal, sampling_rate)
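Under the hood, Mix reads the auxiliary signal from a random relative position, loops it to cover the input, applies the gain, and adds it. A hypothetical numpy sketch of these steps with placeholder signals standing in for the music file:

```python
import numpy as np

rng = np.random.default_rng(0)

signal = rng.standard_normal(16000)  # input
music = rng.standard_normal(12000)   # auxiliary "music" placeholder

# Random relative read position into the auxiliary signal
read_pos = int(rng.uniform(0, 1) * len(music))

# loop_aux=True: repeat the auxiliary signal so it covers the input
looped = np.resize(np.roll(music, -read_pos), len(signal))

# Apply a gain drawn from [-15, -10] dB and add to the input
gain_db = int(rng.integers(-15, -9))
mixed = signal + 10 ** (gain_db / 20) * looped
```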
Noise with fixed SNR¶
When adding noise to a signal during augmentation, it is often desired to let the noise level depend on the signal level to achieve a fixed signal-to-noise ratio (SNR) between the two. This can be achieved in auglib with the snr_db argument. The following example adds pink noise with an SNR of 10 dB to the input signal.
auglib.seed(0)
transform = auglib.transform.PinkNoise(snr_db=10)
augment = auglib.Augment(transform)
signal_augmented = augment(signal, sampling_rate)
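For a fixed SNR, the noise has to be scaled relative to the signal power. A self-contained numpy sketch of the underlying computation, using white noise as a stand-in for pink noise:

```python
import numpy as np

rng = np.random.default_rng(0)

signal = rng.standard_normal(16000)
noise = rng.standard_normal(16000)
snr_db = 10

# Scale the noise so that 10 * log10(P_signal / P_noise) equals snr_db
p_signal = np.mean(signal**2)
p_noise = np.mean(noise**2)
scale = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
noisy = signal + scale * noise

# Resulting SNR in dB
snr = 10 * np.log10(p_signal / np.mean((scale * noise) ** 2))
```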
Band-Pass Filtered Noise¶
Some augmentations like auglib.transform.WhiteNoiseGaussian generate augmentation signals that are added to the incoming signal. Those generated augmentation signals can be modified with the help of auglib.transform.Mix and its transform argument.
The following example adds band-pass filtered white noise to the input signal.
auglib.seed(0)
transform = auglib.transform.Mix(
    auglib.transform.WhiteNoiseGaussian(),
    snr_db=15,
    transform=auglib.transform.BandPass(
        center=4000,
        bandwidth=1000,
    ),
)
augment = auglib.Augment(transform)
signal_augmented = augment(signal, sampling_rate)
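To see what the inner transform does, here is a hypothetical numpy sketch that band-passes white noise around 4000 Hz with a brick-wall FFT mask; auglib's BandPass applies a proper filter, so the exact frequency response differs:

```python
import numpy as np

rng = np.random.default_rng(0)
sampling_rate = 16000

noise = rng.standard_normal(16000)

# Keep only 3500-4500 Hz (center 4000 Hz, bandwidth 1000 Hz)
spectrum = np.fft.rfft(noise)
freqs = np.fft.rfftfreq(len(noise), d=1 / sampling_rate)
spectrum[(freqs < 3500) | (freqs > 4500)] = 0
filtered = np.fft.irfft(spectrum, n=len(noise))
```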
Babble Noise¶
Babble noise refers to having several speakers in the background all talking at the same time. The easiest way to augment your signal with babble noise is to use another speech database.
In the next example, we use speech from musan and augment our signal with it, similar to Section 3.3 in Snyder et al. 2018. We only load 10 speech files from musan to speed up the example; we recommend using all media files when applying the augmentation in a real application.
auglib.seed(1)
db = audb.load(
    "musan",
    tables="speech",
    media=r".*speech-librivox-000\d",
    version="1.0.0",
    verbose=False,
)
transform = auglib.transform.BabbleNoise(
    list(db.files),
    num_speakers=auglib.observe.IntUni(3, 7),
    snr_db=auglib.observe.IntUni(13, 20),
)
augment = auglib.Augment(transform)
signal_augmented = augment(signal, sampling_rate)
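BabbleNoise essentially sums several randomly chosen speech segments and adds the mixture at the requested SNR. A hypothetical numpy sketch with noise placeholders standing in for the speech files:

```python
import numpy as np

rng = np.random.default_rng(1)

signal = rng.standard_normal(16000)

# Sum several "speakers" (placeholders) into one babble track
num_speakers = int(rng.integers(3, 8))  # 3 to 7 speakers
babble = sum(rng.standard_normal(16000) for _ in range(num_speakers))
babble = babble / num_speakers

# Add the babble at an SNR drawn from [13, 20] dB
snr_db = int(rng.integers(13, 21))
scale = np.sqrt(np.mean(signal**2) / (np.mean(babble**2) * 10 ** (snr_db / 10)))
augmented = signal + scale * babble
```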
Telephone¶
Telephone transmission is mainly characterised by the applied transmission codec, compare Vu et al. 2019. With auglib we can use the Adaptive Multi-Rate audio codec in its narrow-band version (AMR-NB). Here, we select from three different codec bitrates, add the possibility of clipping at the beginning, and the possibility of additive noise at the end of the processing chain. The AMR-NB codec requires a sampling rate of 8000 Hz, which auglib.Augment can take care of.
auglib.seed(0)
transform = auglib.transform.Compose(
    [
        auglib.transform.ClipByRatio(
            auglib.observe.FloatUni(0, 0.01),
            normalize=True,
        ),
        auglib.transform.AMRNB(
            auglib.observe.List([4750, 5900, 7400]),
        ),
        auglib.transform.WhiteNoiseGaussian(
            gain_db=auglib.observe.FloatUni(-35, -30),
            bypass_prob=0.7,
        ),
    ]
)
augment = auglib.Augment(
    transform,
    sampling_rate=8000,
    resample=True,
)
signal_augmented = augment(signal, sampling_rate)
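As a rough illustration of what ClipByRatio does, the sketch below clips the loudest fraction of samples and, as with normalize=True, rescales the result back to full amplitude; auglib's exact definition may differ in detail:

```python
import numpy as np

rng = np.random.default_rng(0)

signal = rng.standard_normal(16000)
ratio = 0.01  # clip the loudest 1 % of samples

# Threshold below which (1 - ratio) of the absolute values lie
threshold = np.quantile(np.abs(signal), 1 - ratio)
clipped = np.clip(signal, -threshold, threshold)

# normalize=True: rescale the clipped signal to peak 1
clipped = clipped / np.max(np.abs(clipped))
```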
Random Crop¶
To target machine learning models with a fixed signal input length, random cropping of the signals is often used. The following example uses auglib.transform.Trim to randomly crop the input to a length of 0.5 s. If you are training with torch and want to apply the transform during every epoch, you might consider audtorch.transforms.RandomCrop instead.
auglib.seed(0)
transform = auglib.transform.Trim(
    start_pos=auglib.Time(auglib.observe.FloatUni(0, 1), unit="relative"),
    duration=0.5,
    fill="loop",
    unit="seconds",
)
augment = auglib.Augment(transform)
signal_augmented = augment(signal, sampling_rate)
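The same cropping logic can be sketched directly in numpy: draw a random relative start position, take 0.5 s, and wrap around the end of the signal to fill (fill="loop"):

```python
import numpy as np

rng = np.random.default_rng(0)
sampling_rate = 16000

signal = rng.standard_normal(12000)  # 0.75 s input

duration = int(0.5 * sampling_rate)  # 8000 samples

# Random start position, relative to the signal length
start = int(rng.uniform(0, 1) * len(signal))

# fill="loop": wrap around when the crop runs past the end
indices = (start + np.arange(duration)) % len(signal)
cropped = signal[indices]
```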
Gated Noise¶
You might want to add temporally changing background noise to your signal. The direct approach is to simply switch the noise on and off, generating gated background noise. In the example, we select a single noise file from the noise table of musan, which includes 930 different files; in a real application you should augment with all of them. A combination of auglib.transform.Mask and auglib.transform.Mix reads the noise starting from a random position and adds it every 0.5 s to the target signal.
auglib.seed(0)
db = audb.load(
    "musan",
    tables="noise",
    media="noise/free-sound/noise-free-sound-0003.wav",
    version="1.0.0",
    verbose=False,
)
transform = auglib.transform.Mask(
    auglib.transform.Mix(
        auglib.observe.List(db.files, draw=True),
        gain_aux_db=auglib.observe.IntUni(-15, 0),
        read_pos_aux=auglib.observe.FloatUni(0, 1),
        unit="relative",
        loop_aux=True,
    ),
    step=0.5,
)
augment = auglib.Augment(transform)
signal_augmented = augment(signal, sampling_rate)
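The gating itself can be pictured as an on/off mask that toggles every step seconds; whether the first window is on or off depends on Mask's settings. A hypothetical numpy sketch with a noise placeholder:

```python
import numpy as np

rng = np.random.default_rng(0)
sampling_rate = 16000

signal = rng.standard_normal(32000)  # 2 s
noise = rng.standard_normal(32000)

# On/off mask that toggles every 0.5 s (step=0.5)
step = int(0.5 * sampling_rate)
mask = ((np.arange(len(signal)) // step) % 2 == 0).astype(float)

# Add noise only inside the "on" windows
gated = signal + 0.1 * mask * noise
```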
Pitch Shift¶
You might want to change the pitch of a speaker or singer in your signal. We use praat here with the help of the parselmouth Python package; to install it, use the package name praat-parselmouth. Internally, it extracts the pitch contour, changes the pitch, and re-synthesises the audio signal.
import parselmouth
from parselmouth.praat import call as praat
auglib.seed(2)
def pitch_shift(signal, sampling_rate, semitones):
    sound = parselmouth.Sound(signal, sampling_rate)
    manipulation = praat(sound, "To Manipulation", 0.01, 75, 600)
    pitch_tier = praat(manipulation, "Extract pitch tier")
    factor = 2 ** (semitones / 12)
    praat(pitch_tier, "Multiply frequencies", sound.xmin, sound.xmax, factor)
    praat([pitch_tier, manipulation], "Replace pitch tier")
    sound_transposed = praat(manipulation, "Get resynthesis (overlap-add)")
    return sound_transposed.values.flatten()
transform = auglib.transform.Function(
    function=pitch_shift,
    function_args={"semitones": auglib.observe.IntUni(-4, 4)},
)
augment = auglib.Augment(transform)
signal_augmented = augment(signal, sampling_rate)
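The factor inside pitch_shift follows the equal-temperament relation: each semitone multiplies the frequency by 2**(1/12). As a quick sanity check:

```python
def semitone_factor(semitones):
    """Frequency ratio for a pitch shift of the given number of semitones."""
    return 2 ** (semitones / 12)


up_octave = semitone_factor(12)     # one octave up doubles the frequency
down_octave = semitone_factor(-12)  # one octave down halves it
```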
Constant Pitch¶
You might want to equalize the pitch of the speakers in your database. We again use praat to achieve this, as mentioned in Pitch Shift.
The first approach calculates the average pitch of the input signal and adjusts it to the desired pitch, given as f0 in Hz, by re-synthesising the signal with a shifted pitch contour. This preserves the natural pitch fluctuations of each speaker.
import numpy as np
import parselmouth
from parselmouth.praat import call as praat
def constant_pitch(signal, sampling_rate, desired_pitch):
    sound = parselmouth.Sound(signal, sampling_rate)
    # Estimate average pitch of signal
    pitch = sound.to_pitch()
    pitch = pitch.selected_array["frequency"]
    pitch[pitch == 0] = np.nan
    pitch = np.nanmean(pitch)
    # Adjust signal to desired pitch
    manipulation = praat(sound, "To Manipulation", 0.01, 75, 600)
    pitch_tier = praat(manipulation, "Extract pitch tier")
    factor = desired_pitch / pitch
    praat(pitch_tier, "Multiply frequencies", sound.xmin, sound.xmax, factor)
    praat([pitch_tier, manipulation], "Replace pitch tier")
    sound_transposed = praat(manipulation, "Get resynthesis (overlap-add)")
    return sound_transposed.values.flatten()
transform = auglib.transform.Function(
    function=constant_pitch,
    function_args={"desired_pitch": 100},
)
augment = auglib.Augment(transform)
signal_augmented = augment(signal, sampling_rate)
The second approach specifies a constant pitch contour representing the desired pitch, which removes any pitch fluctuations from the signal after re-synthesis.
import parselmouth
from parselmouth.praat import call as praat
def constant_pitch(signal, sampling_rate, desired_pitch):
    sound = parselmouth.Sound(signal, sampling_rate)
    manipulation = praat(sound, "To Manipulation", 0.01, 75, 600)
    pitch_tier = praat(manipulation, "Create PitchTier", "Name", sound.xmin, sound.xmax)
    praat(pitch_tier, "Add point", sound.xmax / 2, desired_pitch)
    praat([pitch_tier, manipulation], "Replace pitch tier")
    sound_transposed = praat(manipulation, "Get resynthesis (overlap-add)")
    return sound_transposed.values.flatten()
transform = auglib.transform.Function(
    function=constant_pitch,
    function_args={"desired_pitch": 100},
)
augment = auglib.Augment(transform)
signal_augmented = augment(signal, sampling_rate)