audtorch.transforms.functional

The goal of the transform functionals is to provide functions that work independently of the dimensions of the input signal and can easily be used to create the actual transforms.

Note

Currently, the transforms work only with numpy.ndarray inputs, not with torch.Tensor. The exception is mask, which expects a torch.Tensor.

crop

audtorch.transforms.functional.crop(signal, idx, *, axis=-1)

Crop signal along an axis.

Parameters
  • signal (numpy.ndarray) – audio signal

  • idx (int or tuple) – index of the single entry to return, or a tuple of the first and last index, where the last index is exclusive

  • axis (int, optional) – axis along which to crop. Default: -1

Note

Indexing from the end with -1, -2, … is allowed. However, -1 cannot be used as the second entry of the tuple to include the last entry. Instead, write (-2, signal.shape[axis]) to get the last two entries along axis, or simply -1 if you only want the last entry.

Returns

cropped signal

Return type

numpy.ndarray

Example

>>> a = np.array([[1, 2], [3, 4]])
>>> crop(a, 1)
array([[2],
       [4]])
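The indexing rules above can be sketched with numpy.split. crop_sketch is a hypothetical helper written for illustration, not audtorch's implementation; it assumes, as the note implies, that the second index of a tuple is exclusive.

```python
import numpy as np


def crop_sketch(signal, idx, *, axis=-1):
    """Hypothetical sketch of the crop semantics (end index exclusive)."""
    if isinstance(idx, int):
        idx = (idx,)
    length = signal.shape[axis]
    # Translate negative indices such as -1, -2 into positive ones
    idx = [length + i if i < 0 else i for i in idx]
    # A single index selects exactly one entry
    if len(idx) == 1:
        idx = [idx[0], idx[0] + 1]
    # Split into three parts and keep the middle one
    return np.split(signal, idx, axis=axis)[1]


a = np.array([[1, 2], [3, 4]])
print(crop_sketch(a, 1).tolist())                  # [[2], [4]]
print(crop_sketch(a, (-2, a.shape[-1])).tolist())  # [[1, 2], [3, 4]]
```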

pad

audtorch.transforms.functional.pad(signal, padding, *, value=0, axis=-1)

Pad signal along an axis.

If padding is an integer, the signal is padded equally on the left and right. If padding is a tuple with two entries, the first entry is used for the left side and the second for the right side.

Parameters
  • signal (numpy.ndarray) – audio signal

  • padding (int or tuple) – padding to apply on the left and right

  • value (float, optional) – value to pad with. Default: 0

  • axis (int, optional) – axis along which to pad. Default: -1

Returns

padded signal

Return type

numpy.ndarray

Example

>>> a = np.array([[1, 2], [3, 4]])
>>> pad(a, (0, 1))
array([[1, 2, 0],
       [3, 4, 0]])
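A minimal sketch of the padding rules using numpy.pad. pad_sketch is a hypothetical helper, and filling with value via mode='constant' is an assumption about how the padding is done.

```python
import numpy as np


def pad_sketch(signal, padding, *, value=0, axis=-1):
    """Hypothetical sketch: constant padding along a single axis."""
    # An integer pads equally on both sides
    if isinstance(padding, int):
        padding = (padding, padding)
    # Pad only along `axis`; all other axes get (0, 0)
    pad_width = [(0, 0)] * signal.ndim
    pad_width[axis] = tuple(padding)
    return np.pad(signal, pad_width, mode='constant', constant_values=value)


a = np.array([[1, 2], [3, 4]])
print(pad_sketch(a, (0, 1)).tolist())             # [[1, 2, 0], [3, 4, 0]]
print(pad_sketch(a, 1, value=-1, axis=0).tolist())
# [[-1, -1], [1, 2], [3, 4], [-1, -1]]
```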

replicate

audtorch.transforms.functional.replicate(signal, repetitions, *, axis=-1)

Replicate signal along an axis.

Parameters
  • signal (numpy.ndarray) – audio signal

  • repetitions (int) – number of times to replicate signal

  • axis (int, optional) – axis along which to replicate. Default: -1

Returns

replicated signal

Return type

numpy.ndarray

Example

>>> a = np.array([1, 2, 3])
>>> replicate(a, 3)
array([1, 2, 3, 1, 2, 3, 1, 2, 3])
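Replicating along an axis amounts to concatenating copies of the signal. replicate_sketch is a hypothetical illustration built on numpy.concatenate, not the audtorch code.

```python
import numpy as np


def replicate_sketch(signal, repetitions, *, axis=-1):
    """Hypothetical sketch: concatenate `repetitions` copies along `axis`."""
    return np.concatenate([signal] * repetitions, axis=axis)


print(replicate_sketch(np.array([1, 2, 3]), 3).tolist())
# [1, 2, 3, 1, 2, 3, 1, 2, 3]
a = np.array([[1, 2], [3, 4]])
print(replicate_sketch(a, 2, axis=0).tolist())
# [[1, 2], [3, 4], [1, 2], [3, 4]]
```

Unlike numpy.tile, which repeats along the trailing axes unless a repetition count is given for every axis, concatenation only needs the single axis argument.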

downmix

audtorch.transforms.functional.downmix(signal, channels, *, method='mean', axis=-2)

Downmix signal to the provided number of channels.

The downmix is done by one of these methods:

  • 'mean': replace the last desired channel by the mean across itself and all remaining channels

  • 'crop': drop all remaining channels

Parameters
  • signal (numpy.ndarray) – audio signal

  • channels (int) – number of desired channels

  • method (str, optional) – downmix method. Default: ‘mean’

  • axis (int, optional) – axis to downmix. Default: -2

Returns

downmixed signal

Return type

numpy.ndarray

Example

>>> a = np.array([[1, 2], [3, 4]])
>>> downmix(a, 1)
array([[2., 3.]])
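The two downmix methods can be sketched with numpy.split and numpy.mean. downmix_sketch is hypothetical; returning the input unchanged when it already has at most channels channels is an assumption.

```python
import numpy as np


def downmix_sketch(signal, channels, *, method='mean', axis=-2):
    """Hypothetical sketch of the 'mean' and 'crop' downmix methods."""
    if signal.shape[axis] <= channels:
        return signal  # nothing to downmix
    if method == 'crop':
        # Keep the first `channels` channels, drop the rest
        return np.split(signal, [channels], axis=axis)[0]
    if method == 'mean':
        keep, rest = np.split(signal, [channels - 1], axis=axis)
        # The last desired channel becomes the mean of itself and all remaining ones
        last = np.mean(rest, axis=axis, keepdims=True)
        return np.concatenate((keep, last), axis=axis)
    raise ValueError(f'unknown method: {method}')


a = np.array([[1, 2], [3, 4]])
print(downmix_sketch(a, 1).tolist())                 # [[2.0, 3.0]]
print(downmix_sketch(a, 1, method='crop').tolist())  # [[1, 2]]
```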

upmix

audtorch.transforms.functional.upmix(signal, channels, *, method='mean', axis=-2)

Upmix signal to the provided number of channels.

The upmix fills every additional channel with the same signal, which is calculated by one of the following methods:

  • 'mean': mean across all input channels

  • 'zero': zeros

  • 'repeat': last input channel

Parameters
  • signal (numpy.ndarray) – audio signal

  • channels (int) – number of desired channels

  • method (str, optional) – upmix method. Default: ‘mean’

  • axis (int, optional) – axis to upmix. Default: -2

Returns

upmixed signal

Return type

numpy.ndarray

Example

>>> a = np.array([[1, 2], [3, 4]])
>>> upmix(a, 3)
array([[1., 2.],
       [3., 4.],
       [2., 3.]])
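A sketch of the three fill methods, assuming the fill signal is computed once and then repeated for every missing channel. upmix_sketch is a hypothetical helper, not the library function.

```python
import numpy as np


def upmix_sketch(signal, channels, *, method='mean', axis=-2):
    """Hypothetical sketch: fill all additional channels with the same signal."""
    missing = channels - signal.shape[axis]
    if missing <= 0:
        return signal  # nothing to upmix
    if method == 'mean':
        fill = np.mean(signal, axis=axis, keepdims=True)
    elif method == 'zero':
        shape = list(signal.shape)
        shape[axis] = 1
        fill = np.zeros(shape)
    elif method == 'repeat':
        fill = np.take(signal, [-1], axis=axis)  # last input channel
    else:
        raise ValueError(f'unknown method: {method}')
    # Repeat the fill signal once for every missing channel
    fill = np.repeat(fill, missing, axis=axis)
    return np.concatenate((signal, fill), axis=axis)


a = np.array([[1, 2], [3, 4]])
print(upmix_sketch(a, 3).tolist())  # [[1.0, 2.0], [3.0, 4.0], [2.0, 3.0]]
```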

additive_mix

audtorch.transforms.functional.additive_mix(signal1, signal2, ratio)

Mix two signals additively by given ratio.

If the power of one of the signals is below 1e-7, the signals are added without adjusting the signal-to-noise ratio.

Parameters
  • signal1 (numpy.ndarray) – audio signal

  • signal2 (numpy.ndarray) – audio signal

  • ratio (int or float) – ratio in dB of the second signal compared to the first one

Returns

mixture

Return type

numpy.ndarray

Example

>>> a = np.array([[1, 2], [3, 4]])
>>> additive_mix(a, a, -10 * np.log10(0.5 ** 2))
array([[1.5, 3. ],
       [4.5, 6. ]])
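A sketch of the dB-based mixing under two assumptions not stated above: power is the mean of the squared samples, and signal1 is the signal that gets rescaled so that signal2 lies ratio dB above it. That reading matches both the parameter description and the example, which cannot tell the two signals apart because both inputs are identical. additive_mix_sketch is hypothetical.

```python
import numpy as np


def additive_mix_sketch(signal1, signal2, ratio):
    """Hypothetical sketch: rescale signal1 so signal2 lies `ratio` dB above it."""
    # Assumption: power is the mean of the squared samples
    power1 = np.mean(np.square(signal1))
    power2 = np.mean(np.square(signal2))
    # Near-silent input: add without adjusting the signal-to-noise ratio
    if power1 < 1e-7 or power2 < 1e-7:
        return signal1 + signal2
    # Choose scale so that 10 * log10(power2 / (scale**2 * power1)) == ratio
    scale = np.sqrt(power2 / power1 * 10 ** (-ratio / 10))
    return scale * signal1 + signal2


a = np.array([[1, 2], [3, 4]])
mix = additive_mix_sketch(a, a, -10 * np.log10(0.5 ** 2))
print(mix)  # approximately [[1.5, 3.0], [4.5, 6.0]]
```

Rescaling signal2 instead would be an equally valid design that produces the same output for this example.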

mask

audtorch.transforms.functional.mask(signal, num_blocks, max_width, *, value=0.0, axis=-1)

Randomly mask signal along axis.

Parameters
  • signal (torch.Tensor) – audio signal

  • num_blocks (int) – number of mask blocks

  • max_width (int) – maximum width of a block

  • value (float, optional) – mask value. Default: 0.

  • axis (int, optional) – axis along which to mask. Default: -1

Returns

masked signal

Return type

torch.Tensor
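mask is documented for torch.Tensor inputs; the NumPy sketch below only illustrates the idea of random block masking. mask_sketch and its rng argument are hypothetical, and the sampling details (widths drawn uniformly from 1 to max_width, start positions anywhere along the axis, blocks allowed to overlap or run past the end) are assumptions.

```python
import numpy as np


def mask_sketch(signal, num_blocks, max_width, *, value=0., axis=-1, rng=None):
    """Hypothetical NumPy sketch of random block masking along `axis`."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.array(signal, dtype=float)  # work on a float copy
    size = signal.shape[axis]
    # Move the masked axis to the front; the result is a view, so writes reach `signal`
    view = np.moveaxis(signal, axis, 0)
    for _ in range(num_blocks):
        width = rng.integers(1, max_width + 1)  # block width in [1, max_width]
        start = rng.integers(0, size)
        view[start:start + width] = value  # slicing clips at the end of the axis
    return signal


rng = np.random.default_rng(0)
print(mask_sketch(np.ones((2, 8)), 2, 3, rng=rng))
```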

normalize

audtorch.transforms.functional.normalize(signal, *, axis=None)

Normalize signal.

Ensure the maximum of the absolute value of the signal is 1.

Note

The signal is never divided by a number smaller than 1e-7. This avoids a division by zero for silent signals and limits how much nearly silent signals are amplified.

Parameters
  • signal (numpy.ndarray) – audio signal

  • axis (int, optional) – normalize only along the given axis. Default: None

Returns

normalized signal

Return type

numpy.ndarray

Example

>>> a = np.array([[1, 2], [3, 4]])
>>> normalize(a)
array([[0.25, 0.5 ],
       [0.75, 1.  ]])
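A minimal sketch of peak normalization with the 1e-7 floor from the note. normalize_sketch is hypothetical, and interpreting axis as normalizing every slice along that axis by its own peak is an assumption.

```python
import numpy as np


def normalize_sketch(signal, *, axis=None):
    """Hypothetical sketch: divide by the peak, but never by less than 1e-7."""
    peak = np.max(np.abs(signal), axis=axis, keepdims=True)
    return signal / np.maximum(peak, 1e-7)


a = np.array([[1, 2], [3, 4]])
print(normalize_sketch(a).tolist())            # [[0.25, 0.5], [0.75, 1.0]]
print(normalize_sketch(a, axis=-1).tolist())   # [[0.5, 1.0], [0.75, 1.0]]
print(normalize_sketch(np.zeros(3)).tolist())  # [0.0, 0.0, 0.0]
```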

standardize

audtorch.transforms.functional.standardize(signal, *, mean=True, std=True, axis=None)

Standardize signal.

Ensure the signal has a mean value of 0 and a variance of 1.

Note

The signal will never be divided by a standard deviation smaller than 1e-7.

Parameters
  • signal (numpy.ndarray) – audio signal

  • mean (bool, optional) – apply mean centering. Default: True

  • std (bool, optional) – normalize by standard deviation. Default: True

  • axis (int, optional) – standardize only along the given axis. Default: None

Returns

standardized signal

Return type

numpy.ndarray

Example

>>> a = np.array([[1, 2], [3, 4]])
>>> standardize(a)
array([[-1.34164079, -0.4472136 ],
       [ 0.4472136 ,  1.34164079]])
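A sketch of mean centering followed by division by the standard deviation with the 1e-7 floor. standardize_sketch is hypothetical; it uses numpy.mean and numpy.std (population standard deviation), which reproduces the example values above.

```python
import numpy as np


def standardize_sketch(signal, *, mean=True, std=True, axis=None):
    """Hypothetical sketch: zero mean, unit standard deviation."""
    signal = np.asarray(signal, dtype=float)
    if mean:
        signal = signal - np.mean(signal, axis=axis, keepdims=True)
    if std:
        # Never divide by a standard deviation smaller than 1e-7
        signal = signal / np.maximum(np.std(signal, axis=axis, keepdims=True), 1e-7)
    return signal


a = np.array([[1, 2], [3, 4]])
print(standardize_sketch(a))
# approximately [[-1.34164079, -0.4472136], [0.4472136, 1.34164079]]
```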

stft

audtorch.transforms.functional.stft(signal, window_size, hop_size, *, fft_size=None, window='hann', axis=-1)

Short-time Fourier transform.

The Short-time Fourier transform (STFT) is calculated using librosa. The returned array has the same shape as the input array, except that the axis chosen for the STFT calculation is replaced by the two new axes of the spectrogram (frequency bins and time bins).

If fft_size is None (the default), the FFT size is set identical to window_size.

Parameters
  • signal (numpy.ndarray) – audio signal

  • window_size (int) – size of STFT window in samples

  • hop_size (int) – size of STFT window hop in samples

  • fft_size (int, optional) – FFT size in samples. Default: None, which uses window_size

  • window (str, tuple, number, function, or numpy.ndarray, optional) – type of STFT window. Default: hann

  • axis (int, optional) – axis of STFT calculation. Default: -1

Returns

complex spectrogram, in which the STFT axis is replaced by two axes of size window_size/2 + 1 (frequency bins) and 1 + signal.shape[axis] // hop_size (time bins)

Return type

numpy.ndarray

Example

>>> a = np.array([1., 2., 3., 4.])
>>> stft(a, 2, 1)
array([[ 1.+0.j,  2.+0.j,  3.+0.j,  4.+0.j,  3.+0.j],
       [-1.+0.j, -2.+0.j, -3.+0.j, -4.+0.j, -3.+0.j]])
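A NumPy-only sketch of a one-dimensional STFT that reproduces the example above. It assumes centered frames obtained by reflect-padding window_size // 2 samples on each side, a periodic Hann window, and an even window_size; these choices reproduce the example output, but stft_sketch is hypothetical and not the audtorch or librosa code. It yields window_size // 2 + 1 frequency bins and 1 + len(signal) // hop_size time bins.

```python
import numpy as np


def stft_sketch(signal, window_size, hop_size):
    """Hypothetical 1-D STFT sketch: centered frames, periodic Hann window."""
    # Periodic Hann window of length window_size
    n = np.arange(window_size)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / window_size)
    # Center the frames by reflect-padding window_size // 2 samples on each side
    pad = window_size // 2
    padded = np.pad(signal, pad, mode='reflect')
    num_frames = 1 + len(signal) // hop_size
    frames = np.stack([padded[i * hop_size:i * hop_size + window_size]
                       for i in range(num_frames)], axis=-1)
    # One-sided FFT of every windowed frame -> (window_size // 2 + 1, num_frames)
    return np.fft.rfft(frames * window[:, None], axis=0)


a = np.array([1., 2., 3., 4.])
print(stft_sketch(a, 2, 1).real.tolist())
# [[1.0, 2.0, 3.0, 4.0, 3.0], [-1.0, -2.0, -3.0, -4.0, -3.0]]
```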

istft

audtorch.transforms.functional.istft(spectrogram, window_size, hop_size, *, window='hann', axis=-2)

Inverse Short-time Fourier transform.

The inverse Short-time Fourier transform (iSTFT) is calculated using librosa. It handles multi-dimensional inputs, but assumes that the two spectrogram axes are adjacent, starting with the axis corresponding to frequency bins. The returned audio signal has one dimension less than the spectrogram.

Parameters
  • spectrogram (numpy.ndarray) – complex spectrogram

  • window_size (int) – size of STFT window in samples

  • hop_size (int) – size of STFT window hop in samples

  • window (str, tuple, number, function, or numpy.ndarray, optional) – type of STFT window. Default: hann

  • axis (int, optional) – axis of frequency bins of the spectrogram. Time bins are expected at axis + 1. Default: -2

Returns

signal whose length along the former spectrogram axes is hop_size * (number_of_time_bins - 1)

Return type

numpy.ndarray

Example

>>> a = np.array([1., 2., 3., 4.])
>>> D = stft(a, 4, 1)
>>> istft(D, 4, 1)
array([1., 2., 3., 4.])
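Inverting the STFT can be sketched as weighted overlap-add: transform every frame back, multiply it by the window again, sum the overlapping frames, divide by the summed squared windows, and trim the centering pad, which leaves hop_size * (number_of_time_bins - 1) samples. The helpers hann, stft_sketch and istft_sketch are hypothetical; they assume an even window_size, reflect-padded centered frames, a periodic Hann window, and one-dimensional signals, and they do not reproduce librosa's implementation details.

```python
import numpy as np


def hann(window_size):
    """Periodic Hann window."""
    n = np.arange(window_size)
    return 0.5 - 0.5 * np.cos(2 * np.pi * n / window_size)


def stft_sketch(signal, window_size, hop_size):
    """Hypothetical 1-D STFT: reflect-padded, centered frames."""
    pad = window_size // 2
    padded = np.pad(signal, pad, mode='reflect')
    num_frames = 1 + len(signal) // hop_size
    frames = np.stack([padded[i * hop_size:i * hop_size + window_size]
                       for i in range(num_frames)], axis=-1)
    return np.fft.rfft(frames * hann(window_size)[:, None], axis=0)


def istft_sketch(spectrogram, window_size, hop_size):
    """Hypothetical inverse: weighted overlap-add, then remove the centering pad."""
    window = hann(window_size)
    frames = np.fft.irfft(spectrogram, n=window_size, axis=0) * window[:, None]
    num_frames = spectrogram.shape[1]
    length = window_size + hop_size * (num_frames - 1)
    signal = np.zeros(length)
    norm = np.zeros(length)  # sum of squared windows at every sample
    for i in range(num_frames):
        start = i * hop_size
        signal[start:start + window_size] += frames[:, i]
        norm[start:start + window_size] += window ** 2
    # Divide by the window envelope wherever it is not (close to) zero
    nonzero = norm > 1e-10
    signal[nonzero] /= norm[nonzero]
    pad = window_size // 2
    # Trimming the centering pad leaves hop_size * (num_frames - 1) samples
    return signal[pad:length - pad]


x = np.array([1., 2., 3., 4.])
D = stft_sketch(x, 4, 1)
print(D.shape)                                     # (3, 5)
print(np.round(istft_sketch(D, 4, 1), 6).tolist())  # [1.0, 2.0, 3.0, 4.0]
```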