Usage

auglib lets you augment audio data. It provides transformations in its sub-module auglib.transform. auglib.Augment can then apply those transformations to a signal, a file, or a whole dataset.

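See the section on external solutions for how to include them, and look at the examples for further inspiration. Or continue reading to see how to apply augmentations to different inputs. We start by creating an augmentation object that chains a high-pass filter, clipping, and peak normalization.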

import auglib

transform = auglib.transform.Compose(
    [
        auglib.transform.HighPass(cutoff=5000),
        auglib.transform.Clip(),
        auglib.transform.NormalizeByPeak(peak_db=-3),
    ]
)
augment = auglib.Augment(transform)
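
To make the idea of a transform chain concrete, here is a minimal NumPy sketch of what clipping followed by peak normalization does to a signal. The helpers clip, normalize_by_peak, and compose are hypothetical names chosen for this illustration and are not part of auglib; the high-pass filter is left out because it would need a filter design routine. It shows the concept of sequential processing under these assumptions, not how auglib implements it.

```python
import numpy as np


def clip(signal, threshold_db=0.0):
    """Hard-clip samples to +/- the linear value of threshold_db."""
    limit = 10 ** (threshold_db / 20)
    return np.clip(signal, -limit, limit)


def normalize_by_peak(signal, peak_db=0.0):
    """Scale the signal so its absolute peak equals peak_db (dB full scale)."""
    peak = np.max(np.abs(signal))
    if peak == 0:
        return signal  # silence stays silence
    return signal * (10 ** (peak_db / 20) / peak)


def compose(*steps):
    """Chain steps so each one receives the output of the previous one."""
    def apply(signal):
        for step in steps:
            signal = step(signal)
        return signal
    return apply


# Hypothetical chain: clip at full scale, then normalize the peak to -3 dB
chain = compose(clip, lambda x: normalize_by_peak(x, peak_db=-3))

# One second of a 440 Hz sine that overshoots full scale
t = np.arange(16000) / 16000
signal = 1.5 * np.sin(2 * np.pi * 440 * t)
augmented = chain(signal)
peak_db = 20 * np.log10(np.max(np.abs(augmented)))
print(round(peak_db, 2))
```

The order of the steps matters: normalizing before clipping would leave nothing to clip, which is why a composed transform applies its steps in the order given.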

Augment a signal

We now load a signal from emodb and apply our augmentation to it.

import audb
import audiofile

files = audb.load_media(
    "emodb",
    "wav/03a01Fa.wav",
    version="1.4.1",
    verbose=False,
)
signal, sampling_rate = audiofile.read(files[0])
signal_augmented = augment(signal, sampling_rate)

Augment files in memory

auglib.Augment can apply the augmentation to a list of files. We load three files from emodb, and augment them using auglib.Augment.process_files().

files = audb.load_media(
    "emodb",
    ["wav/03a01Fa.wav", "wav/03a01Nc.wav", "wav/03a01Wa.wav"],
    version="1.4.1",
    verbose=False,
)
y_augmented = augment.process_files(files)
y_augmented
file                                                    start   end
/home/runner/audb/emodb/1.4.1/d3b62a9b/wav/03a01Fa.wav  0 days  0 days 00:00:01.898250     [[0.0012283714, 0.0041666315, -0.0018338256, -...
/home/runner/audb/emodb/1.4.1/d3b62a9b/wav/03a01Nc.wav  0 days  0 days 00:00:01.611250     [[0.0003378813, -0.00029246297, 0.00043359815,...
/home/runner/audb/emodb/1.4.1/d3b62a9b/wav/03a01Wa.wav  0 days  0 days 00:00:01.877812500  [[0.0, 2.8740513e-05, -0.00017815993, 0.000150...

All process_*() methods return a pd.Series that holds the augmented signals. Its index is a pd.MultiIndex with the levels file, start, and end (a segmented index).
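
As an illustration of how such a result can be consumed, the following sketch builds a small series with a segmented index by hand and then reads it back. The file names, durations, and zero-valued signals are made up for this example; only the level names file, start, and end are taken from the text above.

```python
import numpy as np
import pandas as pd

# Hypothetical file names and segment boundaries, for illustration only
files = ["a.wav", "b.wav"]
starts = pd.to_timedelta([0, 0], unit="s")
ends = pd.to_timedelta([1.5, 2.0], unit="s")

index = pd.MultiIndex.from_arrays(
    [files, starts, ends],
    names=["file", "start", "end"],
)

# Store one signal of shape (channels, samples) per segment.
# Filling an object array element by element keeps each signal intact.
values = np.empty(len(index), dtype=object)
values[0] = np.zeros((1, 24000), dtype="float32")
values[1] = np.zeros((1, 32000), dtype="float32")
y = pd.Series(values, index=index)

# Iterate over the segments together with their signals
for (file, start, end), signal in y.items():
    print(file, (end - start).total_seconds(), signal.shape)

# Select all segments that belong to one file
y_a = y.xs("a.wav", level="file")
```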

Augment a dataset to disk

auglib.Augment.augment() augments a dataset and stores the result in a cache folder on disk. As input it takes an index, a series, or a dataframe. The index needs either a single level named file that holds the file paths (filewise index), or the three levels file, start, and end, where start and end give the segment boundaries within the file as pd.Timedelta (segmented index). auglib.Augment.augment() returns an index, series, or dataframe with a segmented index that points to the augmented files.
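
The two accepted index layouts can be told apart by their level names alone. The sketch below shows this with plain pandas; index_type is a hypothetical helper written for this example, and the file names are placeholders.

```python
import pandas as pd


def index_type(index):
    """Return "filewise" or "segmented" based on the index level names."""
    names = list(index.names)
    if names == ["file"]:
        return "filewise"
    if names == ["file", "start", "end"]:
        return "segmented"
    raise ValueError(f"Unsupported index levels: {names}")


# Filewise index: one entry per file
filewise = pd.Index(["a.wav", "b.wav"], name="file")

# Segmented index: two segments of the same file, boundaries as pd.Timedelta
segmented = pd.MultiIndex.from_arrays(
    [
        ["a.wav", "a.wav"],
        pd.to_timedelta([0.0, 0.5], unit="s"),
        pd.to_timedelta([0.5, 1.0], unit="s"),
    ],
    names=["file", "start", "end"],
)

print(index_type(filewise), index_type(segmented))
```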

The next example loads three files of the emodb dataset. It then uses audformat.Database.files to get a filewise index pointing to the files of the dataset.

db = audb.load(
    "emodb",
    version="1.4.1",
    media=["wav/03a01Fa.wav", "wav/03a01Nc.wav", "wav/03a01Wa.wav"],
    verbose=False,
)
index = db.files
index_augmented = augment.augment(index, cache_root="cache")
index_augmented
   file                                               start   end
0  /home/runner/work/auglib/auglib/cache/bd491f1a...  0 days  0 days 00:00:01.898250
1  /home/runner/work/auglib/auglib/cache/bd491f1a...  0 days  0 days 00:00:01.611250
2  /home/runner/work/auglib/auglib/cache/bd491f1a...  0 days  0 days 00:00:01.877812500

The augmented files are stored inside the cache_root folder. If auglib.Augment.augment() is called again on the same index, it detects the requested augmentation in the cache and returns its result directly. If you don’t specify cache_root, the default value of $HOME/auglib is used.
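
The general idea behind such a cache can be sketched with the standard library: derive a folder name from what was requested, and skip the work if that folder is already complete. Everything in this sketch is an assumption made for illustration, including the helper names cached_augment and fake_process, the MD5-based key, and the "done" marker file. It does not reproduce how auglib names or validates its cache folders.

```python
import hashlib
import os
import tempfile


def cached_augment(files, transform_id, cache_root, process):
    """Process files once per (transform, file list) and reuse the result."""
    # Folder name derived from the request, so equal requests share a folder
    key = hashlib.md5("\n".join([transform_id, *files]).encode()).hexdigest()[:8]
    cache_dir = os.path.join(cache_root, key)
    marker = os.path.join(cache_dir, "done")
    if not os.path.exists(marker):
        os.makedirs(cache_dir, exist_ok=True)
        for file in files:
            process(file, os.path.join(cache_dir, os.path.basename(file)))
        open(marker, "w").close()  # written last, marks the folder as complete
    return [os.path.join(cache_dir, os.path.basename(f)) for f in files]


calls = []


def fake_process(src, dst):
    """Stand-in for the augmentation: record the call, write a placeholder."""
    calls.append(src)
    with open(dst, "w") as fp:
        fp.write("augmented")


with tempfile.TemporaryDirectory() as cache_root:
    files = ["a.wav", "b.wav"]
    first = cached_augment(files, "highpass-5000", cache_root, fake_process)
    second = cached_augment(files, "highpass-5000", cache_root, fake_process)
    calls_after_second = len(calls)

print(first == second, calls_after_second)
```

The second call returns the same paths without invoking the processing function again, which is the behavior described above for repeated calls on the same index.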

If we pass a series instead of an index, a series is returned:

y = db["files"]["speaker"].get()
y_augmented = augment.augment(y, cache_root="cache")
y_augmented
file                                                                                                                        start   end
/home/runner/work/auglib/auglib/cache/bd491f1a/8464945703565270270/0/home/runner/audb/emodb/1.4.1/d3b62a9b/wav/03a01Fa.wav  0 days  0 days 00:00:01.898250     3
/home/runner/work/auglib/auglib/cache/bd491f1a/8464945703565270270/0/home/runner/audb/emodb/1.4.1/d3b62a9b/wav/03a01Nc.wav  0 days  0 days 00:00:01.611250     3
/home/runner/work/auglib/auglib/cache/bd491f1a/8464945703565270270/0/home/runner/audb/emodb/1.4.1/d3b62a9b/wav/03a01Wa.wav  0 days  0 days 00:00:01.877812500  3

Finally, we augment a dataframe, this time keeping the original files in the result and augmenting every file twice.

df = db["files"].get()
df_augmented = augment.augment(
    df,
    cache_root="cache",
    modified_only=False,
    num_variants=2,
)
df_augmented
file                                                                                                                        start   end                        duration                   speaker  transcription
/home/runner/audb/emodb/1.4.1/d3b62a9b/wav/03a01Fa.wav                                                                      0 days  0 days 00:00:01.898250     0 days 00:00:01.898250     3        a01
/home/runner/audb/emodb/1.4.1/d3b62a9b/wav/03a01Nc.wav                                                                      0 days  0 days 00:00:01.611250     0 days 00:00:01.611250     3        a01
/home/runner/audb/emodb/1.4.1/d3b62a9b/wav/03a01Wa.wav                                                                      0 days  0 days 00:00:01.877812500  0 days 00:00:01.877812500  3        a01
/home/runner/work/auglib/auglib/cache/bd491f1a/8464945703565270270/0/home/runner/audb/emodb/1.4.1/d3b62a9b/wav/03a01Fa.wav  0 days  0 days 00:00:01.898250     0 days 00:00:01.898250     3        a01
/home/runner/work/auglib/auglib/cache/bd491f1a/8464945703565270270/0/home/runner/audb/emodb/1.4.1/d3b62a9b/wav/03a01Nc.wav  0 days  0 days 00:00:01.611250     0 days 00:00:01.611250     3        a01
/home/runner/work/auglib/auglib/cache/bd491f1a/8464945703565270270/0/home/runner/audb/emodb/1.4.1/d3b62a9b/wav/03a01Wa.wav  0 days  0 days 00:00:01.877812500  0 days 00:00:01.877812500  3        a01
/home/runner/work/auglib/auglib/cache/bd491f1a/8464945703565270270/1/home/runner/audb/emodb/1.4.1/d3b62a9b/wav/03a01Fa.wav  0 days  0 days 00:00:01.898250     0 days 00:00:01.898250     3        a01
/home/runner/work/auglib/auglib/cache/bd491f1a/8464945703565270270/1/home/runner/audb/emodb/1.4.1/d3b62a9b/wav/03a01Nc.wav  0 days  0 days 00:00:01.611250     0 days 00:00:01.611250     3        a01
/home/runner/work/auglib/auglib/cache/bd491f1a/8464945703565270270/1/home/runner/audb/emodb/1.4.1/d3b62a9b/wav/03a01Wa.wav  0 days  0 days 00:00:01.877812500  0 days 00:00:01.877812500  3        a01
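
The number of rows in this result follows from the two arguments: the three original files plus two variants for each of them gives nine rows. The sketch below reproduces that bookkeeping and the path layout visible in the output (a variant folder followed by the original path). The helper expected_files, its default values, the "cache/<id>" placeholder, and the file paths are hypothetical and only serve the illustration.

```python
import posixpath


def expected_files(files, cache_dir, num_variants=1, modified_only=True):
    """List the files a result would reference, originals first."""
    result = [] if modified_only else list(files)
    for variant in range(num_variants):
        for file in files:
            # Variant folder, then the original path without its leading "/"
            result.append(
                posixpath.join(cache_dir, str(variant), file.lstrip("/"))
            )
    return result


files = ["/data/a.wav", "/data/b.wav", "/data/c.wav"]
rows = expected_files(files, "cache/<id>", num_variants=2, modified_only=False)
print(len(rows))
```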

Serialize

It’s possible to serialize an auglib.Augment object to YAML.

print(augment.to_yaml_s())
$auglib.core.interface.Augment==1.0.4:
  transform:
    $auglib.core.transform.Compose==1.0.4:
      transforms:
      - $auglib.core.transform.HighPass==1.0.4:
          cutoff: 5000
          order: 1
          design: butter
          preserve_level: false
          bypass_prob: null
      - $auglib.core.transform.Clip==1.0.4:
          threshold: 0.0
          soft: false
          normalize: false
          preserve_level: false
          bypass_prob: null
      - $auglib.core.transform.NormalizeByPeak==1.0.4:
          peak_db: -3
          clip: false
          preserve_level: false
          bypass_prob: null
      preserve_level: false
      bypass_prob: null
  sampling_rate: null
  resample: false
  channels: null
  mixdown: false
  seed: null
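
Judging from the output above, every object is written as a key of the form $package.module.Class==version, followed by its arguments. As a small illustration of that layout, the sketch below splits such a key into its parts with a regular expression. The pattern and the helper parse_object_key are assumptions made for this example and are not part of audobject or auglib.

```python
import re

# "$" + dotted module path + "." + class name + "==" + version
KEY_PATTERN = re.compile(
    r"^\$(?P<module>[\w.]+)\.(?P<name>\w+)==(?P<version>[\w.]+)$"
)


def parse_object_key(key):
    """Split a "$package.module.Class==version" key into its parts."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"Not an object key: {key!r}")
    return match.group("module"), match.group("name"), match.group("version")


module, name, version = parse_object_key(
    "$auglib.core.transform.HighPass==1.0.4"
)
print(module, name, version)
```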

We can save it to a file and re-instantiate it from there.

import audobject

file = "transform.yaml"
augment.to_yaml(file)
augment_from_yaml = audobject.from_yaml(file)
augment_from_yaml(signal, sampling_rate)
array([[ 1.2283714e-03,  4.1666315e-03, -1.8338256e-03, ...,
        -4.1018693e-06,  8.3834183e-04, -1.6675657e-04]], dtype=float32)

The new object creates the exact same augmentation. To make an augmentation that includes random behavior reproducible, we have to set the seed argument.

transform = auglib.transform.PinkNoise(gain_db=-5)
augment = auglib.Augment(transform, seed=0)
augment(signal, sampling_rate)
array([[0.2510293 , 0.19091995, 0.23076648, ..., 0.12397236, 0.15249531,
        0.16985646]], dtype=float32)

When we serialize the object, the seed will be stored to YAML and used to re-initialize the random number generator when the object is loaded.

augment.to_yaml(file)
augment_from_yaml = audobject.from_yaml(file)
augment_from_yaml(signal, sampling_rate)
array([[0.2510293 , 0.19091995, 0.23076648, ..., 0.12397236, 0.15249531,
        0.16985646]], dtype=float32)

If we want a different random seed, we can override the value when loading the object.

augment_other_seed = audobject.from_yaml(file, override_args={"seed": 1})
augment_other_seed(signal, sampling_rate)
array([[-0.01383871, -0.10363714, -0.12082221, ..., -0.21219613,
        -0.08782648, -0.14412443]], dtype=float32)
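
The effect of the seed can be reproduced in isolation with NumPy's random generator. The sketch below uses white noise instead of pink noise to stay short, and the helper noise is a hypothetical function written for this example. It only demonstrates the principle that equal seeds give equal samples, not how auglib generates its noise.

```python
import numpy as np


def noise(num_samples, gain_db, seed=None):
    """White noise scaled by gain_db; a fixed seed makes it repeatable."""
    rng = np.random.default_rng(seed)
    return 10 ** (gain_db / 20) * rng.standard_normal(num_samples)


a = noise(1000, gain_db=-5, seed=0)
b = noise(1000, gain_db=-5, seed=0)
c = noise(1000, gain_db=-5, seed=1)

same_seed_matches = np.array_equal(a, b)
other_seed_differs = not np.array_equal(a, c)
print(same_seed_matches, other_seed_differs)
```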