audtorch.transforms
The transforms can be passed to audtorch.datasets as an argument and are applied to the data before it is returned.
Note
Currently, all transforms work only on numpy.ndarray inputs, not on torch.Tensor.
Compose

class audtorch.transforms.Compose(transforms, *, fix_randomization=False)

Compose several transforms together.
- Parameters
transforms (list of object) – list of transforms to compose
fix_randomization (bool, optional) – controls randomization of underlying transforms. Default: False
Example
>>> a = np.array([[1, 2], [3, 4]])
>>> t = Compose([Crop(-1), Pad(1)])
>>> print(t)
Compose(
    Crop(idx=-1, axis=-1)
    Pad(padding=1, value=0, axis=-1)
)
>>> t(a)
array([[0, 2, 0],
       [0, 4, 0]])
Crop

class audtorch.transforms.Crop(idx, *, axis=-1)

Crop along an axis.

- Parameters
idx – controls the index for cropping
axis – controls the axis of cropping. Default: -1
Note
Indexing from the end with -1, -2, … is allowed. However, you cannot use -1 in the second entry of the tuple to refer to the last element. Instead, write (-2, signal.shape[axis]) to get the last two entries of the axis, or simply -1 if you only want the last entry.
- Shape:
Input: \((*, N_\text{in}, *)\)
Output: \((*, N_\text{out}, *)\), where \(N_\text{in}\) is the input length of the axis to crop and \(N_\text{out}\) is the output length, which is \(1\) for an integer as idx and \(\text{idx[1]} - \text{idx[0]}\) for a tuple with positive entries as idx. \(*\) can be any additional number of dimensions.
Example
>>> a = np.array([[1, 2], [3, 4]])
>>> t = Crop(1, axis=1)
>>> print(t)
Crop(idx=1, axis=1)
>>> t(a)
array([[2],
       [4]])
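As a complementary sketch of the tuple form with a negative start described in the note above (the output shown is what that note implies, assuming the crop behaves like a plain slice along the axis):

>>> a = np.array([[1, 2, 3], [4, 5, 6]])
>>> t = Crop((-2, a.shape[-1]))
>>> t(a)
array([[2, 3],
       [5, 6]])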
RandomCrop

class audtorch.transforms.RandomCrop(size, *, method='pad', axis=-1, fix_randomization=False)

Random crop of specified width along an axis.

If the signal is shorter than the specified size, it is first expanded by one of these methods:

'pad' – expand the signal by adding trailing zeros
'replicate' – replicate the signal so that it matches or exceeds the specified size

- Parameters
size – controls the size of the output signal
method – controls the expansion method. Default: 'pad'
axis – controls the axis of cropping. Default: -1
fix_randomization – controls the randomness. Default: False
- Shape:
Input: \((*, N_\text{in}, *)\)
Output: \((*, N_\text{out}, *)\), where \(N_\text{in}\) is the input length of the axis to crop and \(N_\text{out}\) is the output length as given by size. \(*\) can be any additional number of dimensions.
Example
>>> random.seed(0)
>>> a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
>>> t = RandomCrop(2)
>>> print(t)
RandomCrop(size=2, method=pad, axis=-1)
>>> t(a)
array([[2, 3],
       [6, 7]])
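A minimal sketch of the 'replicate' expansion for a signal shorter than size; only the output shape is shown here, since the cropped values depend on the random offset:

>>> a = np.array([1, 2, 3])
>>> t = RandomCrop(6, method='replicate')
>>> t(a).shape
(6,)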
Pad

class audtorch.transforms.Pad(padding, *, value=0, axis=-1)

Pad along an axis.

If padding is an integer, it pads equally on the left and right of the signal. If padding is a tuple with two entries, the first is used for the left side and the second for the right side.

- Parameters
padding – controls the padding to be applied
value – controls the value used for padding. Default: 0
axis – controls the axis of padding. Default: -1
- Shape:
Input: \((*, N_\text{in}, *)\)
Output: \((*, N_\text{out}, *)\), where \(N_\text{in}\) is the input length of the axis to pad and \(N_\text{out} = N_\text{in} + \sum \text{padding}\) is the output length. \(*\) can be any additional number of dimensions.
Example
>>> a = np.array([[1, 2], [3, 4]])
>>> t = Pad((0, 1))
>>> print(t)
Pad(padding=(0, 1), value=0, axis=-1)
>>> t(a)
array([[1, 2, 0],
       [3, 4, 0]])
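For the integer form described above, a brief sketch; the output follows from the symmetric-padding rule and matches the behaviour shown in the Compose example:

>>> t = Pad(1)
>>> t(np.array([[1, 2], [3, 4]]))
array([[0, 1, 2, 0],
       [0, 3, 4, 0]])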
RandomPad

class audtorch.transforms.RandomPad(padding, *, value=0, axis=-1, fix_randomization=False)

Random pad along an axis.

The padding is split randomly between the left and right of the signal along the specified axis.

- Parameters
padding – controls the size of the padding to be applied
value – controls the value used for padding. Default: 0
axis – controls the axis of padding. Default: -1
fix_randomization – controls the randomness. Default: False
- Shape:
Input: \((*, N_\text{in}, *)\)
Output: \((*, N_\text{out}, *)\), where \(N_\text{in}\) is the input length of the axis to pad and \(N_\text{out} = N_\text{in} + \sum \text{padding}\) is the output length. \(*\) can be any additional number of dimensions.
Example
>>> random.seed(0)
>>> a = np.array([[1, 2], [3, 4]])
>>> t = RandomPad(1)
>>> print(t)
RandomPad(padding=1, value=0, axis=-1)
>>> t(a)
array([[0, 1, 2],
       [0, 3, 4]])
Replicate

class audtorch.transforms.Replicate(repetitions, *, axis=-1)

Replicate along an axis.

- Parameters
repetitions – controls the number of signal replications
axis – controls the axis of replication. Default: -1
- Shape:
Input: \((*, N_\text{in}, *)\)
Output: \((*, N_\text{out}, *)\), where \(N_\text{in}\) is the input length of the axis to replicate and \(N_\text{out} = N_\text{in} \cdot \text{repetitions}\) is the output length. \(*\) can be any additional number of dimensions.
Example
>>> a = np.array([[1, 2, 3]])
>>> t = Replicate(3)
>>> print(t)
Replicate(repetitions=3, axis=-1)
>>> t(a)
array([[1, 2, 3, 1, 2, 3, 1, 2, 3]])
RandomReplicate

class audtorch.transforms.RandomReplicate(*, max_repetitions=100, axis=-1, fix_randomization=False)

Replicate a random number of times along an axis.

repetitions – holds the number of times the signal is replicated

- Parameters
max_repetitions – controls the maximum number of replications. Default: 100
axis – controls the axis of replication. Default: -1
fix_randomization – controls the randomness. Default: False
- Shape:
Input: \((*, N_\text{in}, *)\)
Output: \((*, N_\text{out}, *)\), where \(N_\text{in}\) is the input length of the axis to replicate and \(N_\text{out} = N_\text{in} \cdot \text{repetitions}\) is the output length. \(*\) can be any additional number of dimensions.
Example
>>> random.seed(0)
>>> a = np.array([1, 2, 3])
>>> t = RandomReplicate(max_repetitions=3)
>>> print(t)
RandomReplicate(max_repetitions=3, repetitions=None, axis=-1)
>>> t(a)
array([1, 2, 3, 1, 2, 3, 1, 2, 3])
Expand

class audtorch.transforms.Expand(size, *, method='pad', axis=-1)

Expand signal.

Ensures that the signal matches the desired output size by padding or replicating it. The expansion is done by one of these methods:

'pad' – expand the signal by adding trailing zeros
'replicate' – replicate the signal to match the specified size; if the result exceeds the specified size after replication, the signal is then cropped

- Parameters
size – controls the size of the output signal
method – controls whether to replicate the signal or pad it. Default: 'pad'
axis – controls the axis of expansion. Default: -1
- Shape:
Input: \((*, N_\text{in}, *)\)
Output: \((*, N_\text{out}, *)\), where \(N_\text{in}\) is the input length of the axis to expand and \(N_\text{out}\) is the output length as given by size. \(*\) can be any additional number of dimensions.
Example
>>> a = np.array([[1, 2, 3]])
>>> t = Expand(6)
>>> print(t)
Expand(size=6, method=pad, axis=-1)
>>> t(a)
array([[1, 2, 3, 0, 0, 0]])
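And a minimal sketch of the 'replicate' method; only the shape is shown, since how the replicated signal is cropped back to size is not spelled out above:

>>> t = Expand(5, method='replicate')
>>> t(np.array([[1, 2, 3]])).shape
(1, 5)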
RandomMask

class audtorch.transforms.RandomMask(coverage, max_width, value, axis)

Randomly mask the signal along an axis.

The signal is masked by multiple blocks (i.e. runs of consecutive units) whose size is uniformly sampled given an upper limit on the block size. The algorithm for a single block is as follows:

\(\text{width} \sim U[0, \text{max\_width}]\)

\(\text{start} \sim U[0, \text{signal\_size} - \text{width})\)
The number of blocks is approximated by the specified coverage of the masking and the average size of a block.
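As a rough back-of-the-envelope (an assumption about this approximation, not taken from the implementation): with blocks drawn as above, the expected block width is about \(\text{max\_width}/2\), so the number of blocks is roughly \(\text{coverage} \cdot N / (\text{max\_width}/2)\) and about a fraction \(\text{coverage}\) of the \(N\) units along the signal ends up masked, consistent with the 4 masked elements out of 40 in the example below.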
- Parameters
coverage – controls how large the proportion of masking is relative to the signal size
max_width – controls the maximum size of a masked block
value – controls the value to mask the signal with
axis – controls the axis to mask the signal along
Example
>>> a = torch.empty((1, 4, 10)).uniform_(1, 2)
>>> t = RandomMask(0.1, max_width=1, value=0, axis=2)
>>> print(t)
RandomMask(coverage=0.1, max_width=1, value=0, axis=2)
>>> len((t(a) == 0).nonzero())  # number of 0 elements
4
MaskSpectrogramTime

class audtorch.transforms.MaskSpectrogramTime(coverage, *, max_width=11, value=0)

Randomly masks the spectrogram along the time axis.

See RandomMask for more details.

Note
The time axis is derived from Spectrogram’s output shape.

- Parameters
coverage – controls how large the proportion of masking is relative to the signal size
max_width – controls the maximum size of a masked block. Default: 11
value – controls the value to mask the signal with. Default: 0
Example
>>> from librosa.display import specshow
>>> import matplotlib.pyplot as plt
>>> a = torch.empty(65000).uniform_(-1, 1)
>>> t = Compose([Spectrogram(320, 160), MaskSpectrogramTime(0.1)])
>>> magnitude = t(a).squeeze().numpy()
>>> specshow(np.log10(np.abs(magnitude) + 1e-4))
>>> plt.show()
MaskSpectrogramFrequency

class audtorch.transforms.MaskSpectrogramFrequency(coverage, *, max_width=8, value=0)

Randomly masks the spectrogram along the frequency axis.

See RandomMask for more details.

Note
The frequency axis is derived from Spectrogram’s output shape.

- Parameters
coverage – controls how large the proportion of masking is relative to the signal size
max_width – controls the maximum size of a masked block. Default: 8
value – controls the value to mask the signal with. Default: 0
Example
>>> from librosa.display import specshow
>>> import matplotlib.pyplot as plt
>>> a = torch.empty(65000).uniform_(-1, 1)
>>> t = Compose([Spectrogram(320, 160), MaskSpectrogramFrequency(0.1)])
>>> magnitude = t(a).squeeze().numpy()
>>> specshow(np.log10(np.abs(magnitude) + 1e-4))
>>> plt.show()
Downmix

class audtorch.transforms.Downmix(channels, *, method='mean', axis=-2)

Downmix to the provided number of channels.

The downmix is done by one of these methods:

'mean' – replace the last desired channel by the mean across itself and all remaining channels
'crop' – drop all remaining channels

- Parameters
channels – controls the number of desired channels
method – controls the downmixing method. Default: 'mean'
axis – controls the axis of the downmix. Default: -2
- Shape:
Input: \((*, C_\text{in}, *)\)
Output: \((*, C_\text{out}, *)\), where \(C_\text{in}\) is the number of input channels and \(C_\text{out}\) is the number of output channels as given by channels. \(*\) can be any additional number of dimensions.
Example
>>> a = np.array([[1, 2], [3, 4]])
>>> t = Downmix(1, axis=0)
>>> print(t)
Downmix(channels=1, method=mean, axis=0)
>>> t(a)
array([[2, 3]])
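A short sketch of the 'crop' method, which simply drops the remaining channels; the output is shown as implied by the description above:

>>> t = Downmix(1, method='crop', axis=0)
>>> t(np.array([[1, 2], [3, 4]]))
array([[1, 2]])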
Upmix

class audtorch.transforms.Upmix(channels, *, method='mean', axis=-2)

Upmix to the provided number of channels.

The upmix is achieved by adding the same signal in the additional channels. This signal is calculated by one of the following methods:

'mean' – mean across all input channels
'zero' – zeros
'repeat' – last input channel

- Parameters
channels – controls the number of desired channels
method – controls the upmixing method. Default: 'mean'
axis – controls the axis of the upmix. Default: -2
- Shape:
Input: \((*, C_\text{in}, *)\)
Output: \((*, C_\text{out}, *)\), where \(C_\text{in}\) is the number of input channels and \(C_\text{out}\) is the number of output channels as given by channels. \(*\) can be any additional number of dimensions.
Example
>>> a = np.array([[1, 2], [3, 4]])
>>> t = Upmix(3, axis=0)
>>> print(t)
Upmix(channels=3, method=mean, axis=0)
>>> t(a)
array([[1., 2.],
       [3., 4.],
       [2., 3.]])
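A short sketch of the 'repeat' method, where the additional channel holds a copy of the last input channel; the output is shown as implied by the description above:

>>> t = Upmix(3, method='repeat', axis=0)
>>> t(np.array([[1., 2.], [3., 4.]]))
array([[1., 2.],
       [3., 4.],
       [3., 4.]])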
Remix

class audtorch.transforms.Remix(channels, *, method='mean', axis=-2)

Remix to the provided number of channels.

The remix is achieved either by repeating the mean of all other channels (upmix) or by replacing the last desired channel with the mean across all channels (downmix). Internally this is done by running Upmix or Downmix with method 'mean'.

- Parameters
channels – controls the number of desired channels
axis – controls the axis of the remix. Default: -2
- Shape:
Input: \((*, C_\text{in}, *)\)
Output: \((*, C_\text{out}, *)\), where \(C_\text{in}\) is the number of input channels and \(C_\text{out}\) is the number of output channels as given by channels. \(*\) can be any additional number of dimensions.
Example
>>> a = np.array([[1, 2], [3, 4]])
>>> t = Remix(3, axis=0)
>>> print(t)
Remix(channels=3, axis=0)
>>> t(a)
array([[1., 2.],
       [3., 4.],
       [2., 3.]])
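In the other direction, requesting fewer channels than the input has is equivalent to the Downmix example above (a sketch; the output mirrors that example):

>>> t = Remix(1, axis=0)
>>> t(np.array([[1, 2], [3, 4]]))
array([[2, 3]])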
Normalize

class audtorch.transforms.Normalize(*, axis=-1)

Normalize signal.

Ensure the maximum of the absolute value of the signal is 1.

axis – controls the axis for normalization
- Parameters
axis (int, optional) – axis for normalization. Default: -1
- Shape:
Input: \((*)\)
Output: \((*)\), where \(*\) can be any number of dimensions.
Example
>>> a = np.array([1, 2, 3, 4])
>>> t = Normalize()
>>> print(t)
Normalize(axis=-1)
>>> t(a)
array([0.25, 0.5 , 0.75, 1.  ])
Standardize

class audtorch.transforms.Standardize(*, mean=True, std=True, axis=-1)

Standardize signal.

Ensure the signal has a mean value of 0 and a variance of 1.

- Parameters
mean – controls whether mean centering is applied. Default: True
std – controls whether standard deviation normalization is applied. Default: True
axis – controls the axis for standardization. Default: -1
- Shape:
Input: \((*)\)
Output: \((*)\), where \(*\) can be any number of dimensions.
Example
>>> a = np.array([1, 2, 3, 4])
>>> t = Standardize()
>>> print(t)
Standardize(axis=-1, mean=True, std=True)
>>> t(a)
array([-1.34164079, -0.4472136 ,  0.4472136 ,  1.34164079])
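A brief sketch of switching off the standard-deviation normalization so that only mean centering remains; the values follow from subtracting the mean 2.5:

>>> t = Standardize(std=False)
>>> t(np.array([1., 2., 3., 4.]))
array([-1.5, -0.5,  0.5,  1.5])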
Resample

class audtorch.transforms.Resample(input_sampling_rate, output_sampling_rate, *, method='kaiser_best', axis=-1)

Resample to a new sampling rate.

The signal is resampled by one of the following methods:

'kaiser_best' – as implemented by resampy
'kaiser_fast' – as implemented by resampy
'scipy' – uses scipy for resampling

- Parameters
input_sampling_rate – controls the input sample rate in Hz
output_sampling_rate – controls the output sample rate in Hz
method – controls the resample method. Default: 'kaiser_best'
axis – controls the axis for resampling. Default: -1
Note
If the default method kaiser_best is too slow for your purposes, you should try scipy instead. scipy is the fastest method, but might crash for very long signals.
- Shape:
Input: \((*)\)
Output: \((*)\), where \(*\) can be any number of dimensions.
Example
>>> a = np.array([1, 2, 3, 4])
>>> t = Resample(4, 2)
>>> print(t)
Resample(input_sampling_rate=4, output_sampling_rate=2, method=kaiser_best, axis=-1)
>>> t(a)
array([0, 2])
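A minimal sketch of the scipy method mentioned in the note; only the shape is shown, since the resampled values depend on the chosen method:

>>> t = Resample(4, 2, method='scipy')
>>> t(np.array([1., 2., 3., 4.])).shape
(2,)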
Spectrogram

class audtorch.transforms.Spectrogram(window_size, hop_size, *, fft_size=None, window='hann', axis=-1)

Spectrogram of an audio signal.

The spectrogram is calculated by librosa and its magnitude is returned as a real-valued matrix.

window_size – controls the FFT window size in samples
hop_size – controls the STFT window hop size in samples
fft_size – controls the number of frequency bins in the STFT
window – controls the window function of the spectrogram computation
axis – controls the axis of the spectrogram computation
phase – holds the phase of the spectrogram
- Parameters
window_size (int) – size of STFT window in samples
hop_size (int) – size of STFT window hop in samples
fft_size (int, optional) – number of frequency bins in STFT. If None, then it defaults to window_size. Default: None
window (str, tuple, number, function, or numpy.ndarray, optional) – type of STFT window. Default: hann
axis (int, optional) – axis of STFT calculation. Default: -1
- Shape:
Input: \((*, N_\text{in}, *)\)
Output: \((*, N_f, N_t, *)\), where \(N_\text{in}\) is the number of input samples and \(N_f = {\text{window\_size} \over 2} + 1\) is the number of output samples along the frequency axis of the spectrogram, and \(N_t = \lceil {1 \over \text{hop\_size}} (N_\text{in} + {\text{window\_size} \over 2}) \rceil\) is the number of output samples along the time axis of the spectrogram. \(*\) can be any additional number of dimensions.
Example
>>> a = np.array([1., 2., 3., 4.])
>>> t = Spectrogram(2, 2)
>>> print(t)
Spectrogram(window_size=2, hop_size=2, axis=-1)
>>> t(a)
array([[1., 3., 3.],
       [1., 3., 3.]])
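As a quick check of the shape formula against this example: \(N_f = \frac{2}{2} + 1 = 2\) and \(N_t = \lceil \frac{1}{2}(4 + \frac{2}{2}) \rceil = \lceil 2.5 \rceil = 3\), which matches the returned \(2 \times 3\) matrix.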
Log

class audtorch.transforms.Log(*, magnitude_boost=1e-07)

Logarithmic transform of an input signal.

magnitude_boost – controls the non-negative value added to the magnitude of the signal before applying the logarithm

- Parameters
magnitude_boost (float, optional) – positive value added to the magnitude of the signal before applying the logarithm. Default: 1e-7
- Shape:
Input: \((*)\)
Output: \((*)\), where \(*\) can be any additional number of dimensions.
Example
>>> a = np.array([1., 2., 3., 4.])
>>> spect = Spectrogram(window_size=2, hop_size=2)
>>> t = Log()
>>> print(t)
Log(magnitude_boost=1e-07)
>>> np.set_printoptions(precision=5)
>>> t(spect(a))
array([[1.00000e-07, 1.09861e+00, 1.09861e+00],
       [1.00000e-07, 1.09861e+00, 1.09861e+00]])
RandomAdditiveMix

class audtorch.transforms.RandomAdditiveMix(dataset, *, ratios=[0, 15, 30], normalize=False, expand_method='pad', crop_method='random', percentage_silence=0, time_axis=-1, channel_axis=-2, fix_randomization=False)

Mix two signals additively at a randomly picked ratio.

Randomly pick a signal from an augmentation data set and mix it with the actual signal at a signal-to-noise ratio in dB randomly selected from a list of possible ratios. If necessary, the signal from the augmentation data set is expanded, cropped, or has its number of channels adjusted by a downmix or upmix using Remix.

The signal can be expanded by:

'multiple' – load multiple files from the augmentation data set and concatenate them along the time axis
'pad' – expand the signal by adding trailing zeros
'replicate' – replicate the signal to match the specified size; if the result exceeds the specified size after replication, the signal is then cropped

The signal can be cropped by:

'start' – crop the signal from the beginning of the file up to the necessary length
'random' – start at a random offset from the beginning of the file

dataset – controls the data set used for augmentation
ratio – holds the ratio in dB between the mixed signals
ratios – controls the ratios to be randomly picked from
normalize – controls whether the mixed signal is normalized
expand_method – controls how the signal from the augmentation data set is automatically expanded. Default: 'pad'
crop_method – controls how the signal is cropped; only relevant if the augmentation signal is longer than the input one, or if expand_method is set to 'multiple'. Default: 'random'
percentage_silence – controls the percentage of the input data that is mixed with silence; should be between 0 and 1. Default: 0
time_axis – controls the time axis for automatic signal adjustment
channel_axis – controls the channel axis for automatic signal adjustment
fix_randomization – controls the randomness of the ratio selection

Note
fix_randomization covers only the selection of the ratio. The selection of a signal from the augmentation data set and its signal-length adjustment are always random.

- Parameters
dataset (torch.utils.data.Dataset) – data set for augmentation
ratios (list of int, optional) – mix ratios in dB to randomly pick from (e.g. SNRs). Default: [0, 15, 30]
normalize (bool, optional) – normalize mixture. Default: False
expand_method (str, optional) – controls the length adjustment of the augmentation signal that is added to the input signal. Default: pad
crop_method (str, optional) – controls the crop transform that will be called on the mix signal if it is longer than the input signal. Default: random
percentage_silence (float, optional) – controls the percentage of input data that should be augmented with silence. Default: 0
time_axis (int, optional) – length axis of both data sets. Default: -1
channel_axis (int, optional) – channels axis of both data sets. Default: -2
fix_randomization (bool, optional) – freeze random selection between different calls of transform. Default: False
- Shape:
Input: \((*, C, N, *)\)
Output: \((*, C, N, *)\), where \(C\) is the number of channels and \(N\) is the number of samples. They don’t have to be placed in the order shown here, but the order is preserved during transformation. \(*\) can be any additional number of dimensions.
Example
>>> from audtorch import datasets
>>> np.random.seed(0)
>>> a = np.array([[1, 2], [3, 4]])
>>> noise = datasets.WhiteNoise(duration=1, sampling_rate=2)
>>> t = RandomAdditiveMix(noise, ratios=[3], expand_method='pad')
>>> print(t)
RandomAdditiveMix(dataset=WhiteNoise, ratios=[3], ratio=None, percentage_silence=0, expand_method=pad, crop_method=random, time_axis=-1, channel_axis=-2)
>>> np.set_printoptions(precision=8)
>>> t(a)
array([[3.67392992, 2.60655362],
       [5.67392992, 4.60655362]])
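A minimal sketch of the other expansion and crop options, reusing noise and a from the example above; only the shape is shown, since the mixed values depend on the randomly drawn noise:

>>> t = RandomAdditiveMix(noise, ratios=[0, 15], expand_method='replicate', crop_method='start')
>>> t(a).shape
(2, 2)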
RandomConvolutionalMix

class audtorch.transforms.RandomConvolutionalMix(dataset, *, normalize=False, axis=-1)

Convolve the signal with an impulse response from an augmentation data set.

Randomly pick an impulse response from an augmentation data set and convolve it with the signal. The impulse responses have to be one-dimensional.

dataset – controls the data set used for augmentation
normalize – controls the normalization of the convolved signal
axis – controls the axis of the convolution
- Parameters
dataset (torch.utils.data.Dataset) – data set for augmentation
normalize (bool, optional) – normalize mixture. Default: False
axis (int, optional) – axis of convolution. Default: -1
- Shape:
Input: \((*, N, *)\)
Output: \((*, N + M - 1, *)\), where \(N\) is the number of samples of the signal and \(M\) the number of samples of the impulse response. \(*\) can be any additional number of dimensions.
Example
>>> from audtorch import datasets
>>> np.random.seed(0)
>>> a = np.array([[1, 2], [3, 4]])
>>> noise = datasets.WhiteNoise(duration=1, sampling_rate=2, transform=np.squeeze)
>>> t = RandomConvolutionalMix(noise, normalize=True)
>>> print(t)
RandomConvolutionalMix(dataset=WhiteNoise, axis=-1, normalize=True)
>>> np.set_printoptions(precision=8)
>>> t(a)
array([[0.21365151, 0.47576767, 0.09692931],
       [0.64095452, 1.        , 0.19385863]])
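The output length in this example matches the shape formula above: the input has \(N = 2\) samples per channel and the white-noise impulse response has \(M = 2\) samples, so \(N + M - 1 = 3\).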