Feature

class audinterface.Feature(feature_names, *, name=None, params=None, process_func=None, process_func_args=None, process_func_is_mono=False, process_func_applies_sliding_window=False, sampling_rate=None, resample=False, channels=0, mixdown=False, win_dur=None, hop_dur=None, min_signal_dur=None, max_signal_dur=None, segment=None, keep_nat=False, num_workers=1, multiprocessing=False, verbose=False)

Feature extraction interface.

The features are returned as a pandas.DataFrame. If your input signal is of size (num_channels, num_time_steps), the returned object has num_channels * num_features columns. It will have one row per file or signal.
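As a rough illustration of this column layout (not the library's own code), the following sketch lists the column labels for a given channel selection. The helper name `column_labels` is hypothetical; the two-level layout for several channels follows the description of feature_names in the parameter list.

```python
from itertools import product


def column_labels(channels, feature_names):
    """Return the column labels of the feature table (hypothetical helper)."""
    if len(channels) > 1:
        # Several channels: channel ID first, feature name second
        return list(product(channels, feature_names))
    # Single channel: feature names only
    return list(feature_names)


labels = column_labels([0, 1], ["mean", "std"])
print(labels)  # [(0, 'mean'), (0, 'std'), (1, 'mean'), (1, 'std')]
print(len(labels))  # 4, i.e. num_channels * num_features
```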

If features are extracted using a sliding window, each window is stored as one row. If win_dur is specified, the start and end indices are inferred from the original start and end arguments and the window positions. If win_dur is None, the original start and end indices are kept. If process_func_applies_sliding_window is set to True, the processing function is responsible for applying the sliding window. Otherwise, the sliding window is applied before the processing function is called.
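To make the window positions concrete, here is a minimal sketch that computes them in samples. It assumes that only windows fitting completely into the signal are kept, which is consistent with the sliding-window output in the Examples section. The function name `window_positions` and the 16 kHz sampling rate are assumptions for illustration, not part of the library.

```python
def window_positions(num_samples, win, hop):
    """Return (start, end) sample positions of all complete windows."""
    if num_samples < win:
        return []
    num_windows = (num_samples - win) // hop + 1
    return [(i * hop, i * hop + win) for i in range(num_windows)]


# Assumed values: a 16 kHz signal with 30372 samples (1.89825 s),
# win_dur=1.0 s and hop_dur=0.25 s expressed in samples
sampling_rate = 16000
positions = window_positions(30372, win=16000, hop=4000)
for start, end in positions:
    # Prints 0.0 1.0, 0.25 1.25, 0.5 1.5 and 0.75 1.75
    print(start / sampling_rate, end / sampling_rate)
```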

If the arguments win_dur and hop_dur are not specified in process_func_args, but process_func expects them, they are passed on automatically.
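One way such automatic forwarding could work is to inspect the signature of the processing function. The sketch below is an assumption about the general technique, not the library's implementation; `call_with_expected_args` and `count_windows` are hypothetical names.

```python
import inspect


def call_with_expected_args(func, signal, sampling_rate, user_args, defaults):
    """Call func, adding defaults only for parameters it declares."""
    expected = inspect.signature(func).parameters
    kwargs = dict(user_args)
    for key, value in defaults.items():
        # Forward a value only if the function expects it
        # and the user did not set it explicitly
        if key in expected and key not in kwargs:
            kwargs[key] = value
    return func(signal, sampling_rate, **kwargs)


def count_windows(signal, sampling_rate, win_dur, hop_dur):
    # Toy processing function that needs the window settings
    win = round(win_dur * sampling_rate)
    hop = round(hop_dur * sampling_rate)
    return (len(signal) - win) // hop + 1


result = call_with_expected_args(
    count_windows,
    [0.0] * 12,
    4,
    user_args={},
    defaults={"win_dur": 1.0, "hop_dur": 0.5, "file": "a.wav"},
)
print(result)  # 5
```

The "file" entry is not forwarded because `count_windows` does not declare it, while a value given in `user_args` would take precedence over the default.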

Parameters

feature_names (str | Sequence[str]) – features are stored as columns in a data frame, where feature_names defines the names of the columns. If len(channels) > 1, the data frame has a multi-column index with channel ID as first level and feature_names as second level
name (Optional[str]) – name of the feature set, e.g. 'stft'
params (Optional[dict]) – parameters that describe the feature set, e.g. {'win_size': 512, 'hop_size': 256, 'num_fft': 512}. With the parameters you can differentiate different flavors of the same feature set
process_func (Optional[Callable[..., object]]) – feature extraction function, which expects the two positional arguments signal and sampling_rate and any number of additional keyword arguments (see process_func_args). There are the following special arguments: 'idx', 'file', 'root'. If expected by the function, but not specified in process_func_args, they will be replaced with: a running index, the currently processed file, the root folder. The function must return features in the shape of (num_features), (num_channels, num_features), (num_features, num_frames), or (num_channels, num_features, num_frames)
process_func_args (Optional[dict[str, object]]) – (keyword) arguments passed on to the processing function
process_func_is_mono (bool) – apply process_func to every channel individually
process_func_applies_sliding_window (bool) – if True the processing function receives whole files or segments and is responsible for applying a sliding window itself. If False, the sliding window is applied internally and the processing function receives individual frames instead. Applies only if features are extracted in a framewise manner (see win_dur and hop_dur)
sampling_rate (Optional[int]) – sampling rate in Hz. If None it will call process_func with the actual sampling rate of the signal
resample (bool) – if True enforces given sampling rate by resampling
channels (int | Sequence[int]) – channel selection, see audresample.remix()
win_dur (Union[float, int, str, Timedelta, None]) – window duration, if features are extracted with a sliding window. If value is a float or integer it is treated as seconds. See audinterface.utils.to_timedelta() for further options
hop_dur (Union[float, int, str, Timedelta, None]) – hop duration, if features are extracted with a sliding window. This defines the shift between two windows. If value is a float or integer it is treated as seconds. See audinterface.utils.to_timedelta() for further options. Defaults to win_dur / 2
min_signal_dur (Union[float, int, str, Timedelta, None]) – minimum signal duration required by process_func. If value is a float or integer it is treated as seconds. See audinterface.utils.to_timedelta() for further options. If provided signal is shorter, it will be zero padded at the end
max_signal_dur (Union[float, int, str, Timedelta, None]) – maximum signal duration required by process_func. If value is a float or integer it is treated as seconds. See audinterface.utils.to_timedelta() for further options. If provided signal is longer, it will be cut at the end
mixdown (bool) – apply mono mix-down on selection
segment (Optional[Segment]) – when an audinterface.Segment object is provided, it will be used to find a segmentation of the input signal. Afterwards processing is applied to each segment
keep_nat (bool) – if the end of a segment is set to NaT, do not replace it with the file duration in the result
num_workers (Optional[int]) – number of parallel jobs or 1 for sequential processing. If None will be set to the number of processors on the machine multiplied by 5 in case of multithreading and number of processors in case of multiprocessing
multiprocessing (bool) – use multiprocessing instead of multithreading
verbose (bool) – show debug messages
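The effect of min_signal_dur and max_signal_dur described above can be pictured with plain Python lists. This is a minimal sketch under the assumption that durations are given in seconds; `fit_duration` is a hypothetical helper and lists stand in for the signal arrays.

```python
def fit_duration(signal, sampling_rate, min_dur=None, max_dur=None):
    """Pad or cut a list of samples at the end (durations in seconds)."""
    if max_dur is not None:
        # Longer signals are cut at the end
        max_samples = round(max_dur * sampling_rate)
        signal = signal[:max_samples]
    if min_dur is not None:
        # Shorter signals are zero padded at the end
        min_samples = round(min_dur * sampling_rate)
        signal = signal + [0.0] * max(0, min_samples - len(signal))
    return signal


print(fit_duration([1.0, 2.0], 4, min_dur=1))  # [1.0, 2.0, 0.0, 0.0]
print(fit_duration([1.0] * 6, 4, max_dur=1))  # [1.0, 1.0, 1.0, 1.0]
```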

Raises

ValueError – if win_dur or hop_dur are given in samples and sampling_rate is None
ValueError – if hop_dur is specified, but not win_dur
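The relation between win_dur and hop_dur (the default of win_dur / 2 and the error for a hop duration without a window duration) can be sketched as follows. The function `resolve_durations` is hypothetical and handles only numeric durations in seconds.

```python
def resolve_durations(win_dur=None, hop_dur=None):
    """Return (win_dur, hop_dur) in seconds after applying the defaults."""
    if win_dur is None:
        if hop_dur is not None:
            raise ValueError("hop_dur is specified, but not win_dur")
        return None, None
    if hop_dur is None:
        # Default shift between two windows is half the window duration
        hop_dur = win_dur / 2
    return win_dur, hop_dur


print(resolve_durations(1.0))  # (1.0, 0.5)
print(resolve_durations(1.0, 0.25))  # (1.0, 0.25)
```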

Examples

>>> import numpy as np
>>> from audinterface import Feature
>>> def mean_std(signal, sampling_rate):
...     return [signal.mean(), signal.std()]
>>> interface = Feature(["mean", "std"], process_func=mean_std)
>>> signal = np.array([1.0, 2.0, 3.0])
>>> interface(signal, sampling_rate=3)
array([[[2.        ],
        [0.81649658]]])
>>> interface.process_signal(signal, sampling_rate=3)
                        mean       std
start  end
0 days 0 days 00:00:01   2.0  0.816497
>>> # Apply interface to the index of a data frame that conforms to audformat
>>> import audb
>>> db = audb.load(
...     "emodb",
...     version="1.3.0",
...     media="wav/03a01Fa.wav",
...     full_path=False,
...     verbose=False,
... )
>>> index = db["emotion"].index
>>> interface.process_index(index, root=db.root)
                                                   mean       std
file            start  end
wav/03a01Fa.wav 0 days 0 days 00:00:01.898250 -0.000311  0.082317
>>> interface.process_index(index, root=db.root, preserve_index=True)
                mean       std
file
wav/03a01Fa.wav -0.000311  0.082317
>>> # Apply interface with a sliding window
>>> interface = Feature(
...     ["mean", "std"],
...     process_func=mean_std,
...     win_dur=1.0,
...     hop_dur=0.25,
... )
>>> interface.process_index(index, root=db.root)
                                                                   mean       std
file            start                  end
wav/03a01Fa.wav 0 days 00:00:00        0 days 00:00:01        -0.000329  0.098115
                0 days 00:00:00.250000 0 days 00:00:01.250000 -0.000405  0.087917
                0 days 00:00:00.500000 0 days 00:00:01.500000 -0.000285  0.067042
                0 days 00:00:00.750000 0 days 00:00:01.750000 -0.000187  0.063677
>>> # Apply the same process function on all channels
>>> # of a multi-channel signal
>>> import audeer
>>> import audiofile
>>> signal, sampling_rate = audiofile.read(
...     audeer.path(db.root, db.files[0]),
...     always_2d=True,
... )
>>> signal_multi_channel = np.concatenate(
...     [
...         signal - 0.5,
...         signal + 0.5,
...     ],
... )
>>> interface = Feature(
...     ["mean", "std"],
...     process_func=mean_std,
...     process_func_is_mono=True,
...     channels=[0, 1],
... )
>>> interface.process_signal(
...     signal_multi_channel,
...     sampling_rate,
... )
                                      0                   1
                                   mean       std      mean       std
start  end
0 days 0 days 00:00:01.898250 -0.500311  0.082317  0.499689  0.082317

__call__()

Feature.__call__(signal, sampling_rate)

Apply processing to signal.

This function processes the signal without transforming the output into a pandas.DataFrame. Instead, it returns the raw processed signal. If channel selection, mixdown, or resampling is enabled, the signal is first remixed, and it is resampled if the input sampling rate does not match the expected sampling rate.

Parameters

signal (ndarray) – signal values
sampling_rate (int) – sampling rate in Hz

Return type

ndarray

Returns

feature array with shape (num_channels, num_features, num_frames)

Raises

RuntimeError – if sampling rates do not match
RuntimeError – if channel selection is invalid
RuntimeError – if multiple frames are returned, but win_dur is not set

column_names

Feature.column_names: Feature column names.

feature_names

Feature.feature_names: Feature names.

hop_dur

Feature.hop_dur: Hop duration.

name

Feature.name: Name of the feature set.

num_channels

Feature.num_channels: Expected number of channels.

num_features

Feature.num_features: Number of features.

params

Feature.params: Dictionary of parameters describing the feature set.

process

Feature.process: Processing object.

process_file()

Feature.process_file(file, *, start=None, end=None, root=None, process_func_args=None)

Extract features from an audio file.

Parameters

file (str) – file path
start (Union[float, int, str, Timedelta, None]) – start processing at this position. If value is a float or integer it is treated as seconds. See audinterface.utils.to_timedelta() for further options
end (Union[float, int, str, Timedelta, None]) – end processing at this position. If value is a float or integer it is treated as seconds. See audinterface.utils.to_timedelta() for further options
root (Optional[str]) – root folder to expand relative file path
process_func_args (Optional[dict[str, object]]) – (keyword) arguments passed on to the processing function. They will temporarily overwrite the ones stored in audinterface.Feature.process.process_func_args

Raises

RuntimeError – if sampling rates do not match
RuntimeError – if channel selection is invalid
RuntimeError – if multiple frames are returned, but win_dur is not set

Return type

DataFrame
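The temporary override of process_func_args can be pictured as merging two dictionaries for the duration of one call. This is only a sketch of the idea, not the library's code; `merged_args` and the argument names "scale" and "offset" are hypothetical.

```python
def merged_args(stored_args, call_args=None):
    """Return the keyword arguments used for one call."""
    merged = dict(stored_args)
    # Arguments given with the call win over the stored ones
    merged.update(call_args or {})
    return merged


stored = {"scale": 1.0, "offset": 0.0}
print(merged_args(stored, {"scale": 2.0}))  # {'scale': 2.0, 'offset': 0.0}
print(stored)  # {'scale': 1.0, 'offset': 0.0}, the stored arguments are unchanged
```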

process_files()

Feature.process_files(files, *, starts=None, ends=None, root=None, process_func_args=None)

Extract features for a list of files.

Parameters

files (Sequence[str]) – list of file paths
starts (Union[float, int, str, Timedelta, Sequence[Union[float, int, str, Timedelta]], None]) – segment start positions. Time values given as float or integers are treated as seconds. See audinterface.utils.to_timedelta() for further options. If a scalar is given, it is applied to all files
ends (Union[float, int, str, Timedelta, Sequence[Union[float, int, str, Timedelta]], None]) – segment end positions. Time values given as float or integers are treated as seconds. See audinterface.utils.to_timedelta() for further options. If a scalar is given, it is applied to all files
root (Optional[str]) – root folder to expand relative file paths
process_func_args (Optional[dict[str, object]]) – (keyword) arguments passed on to the processing function. They will temporarily overwrite the ones stored in audinterface.Feature.process.process_func_args

Raises

RuntimeError – if sampling rates do not match
RuntimeError – if channel selection is invalid
RuntimeError – if multiple frames are returned, but win_dur is not set

Return type

DataFrame
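The rule that a scalar start or end position applies to all files can be sketched as a small expansion step. The helper `expand_positions` is hypothetical, and datetime.timedelta stands in for pandas.Timedelta to keep the example free of third-party imports.

```python
from datetime import timedelta


def expand_positions(positions, num_files):
    """Return one position per file, repeating a scalar if necessary."""
    scalar_types = (int, float, str, timedelta)
    if positions is None or isinstance(positions, scalar_types):
        # A single value (or None) is applied to every file
        return [positions] * num_files
    return list(positions)


files = ["a.wav", "b.wav", "c.wav"]
print(expand_positions(0.5, len(files)))  # [0.5, 0.5, 0.5]
print(expand_positions([0, 1, 2], len(files)))  # [0, 1, 2]
```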

process_folder()

Feature.process_folder(root, *, filetype='wav', include_root=True, process_func_args=None)

Extract features from files in a folder.

Note

Sub-folders are currently not scanned.

Parameters

root (str) – root folder
filetype (str) – file extension
include_root (bool) – if True the file paths are absolute in the index of the returned result
process_func_args (Optional[dict[str, object]]) – (keyword) arguments passed on to the processing function. They will temporarily overwrite the ones stored in audinterface.Feature.process.process_func_args

Raises

FileNotFoundError – if folder does not exist
RuntimeError – if sampling rates do not match
RuntimeError – if channel selection is invalid
RuntimeError – if multiple frames are returned, but win_dur is not set

Return type

DataFrame
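The file selection described for process_folder() (a single folder, one file extension, optional absolute paths) can be approximated with the standard library. This is a sketch of the described behaviour, not the library's implementation; `list_files` is a hypothetical helper.

```python
import os
import tempfile


def list_files(root, filetype="wav", include_root=True):
    """List files with the given extension in root, ignoring sub-folders."""
    if not os.path.isdir(root):
        raise FileNotFoundError(root)
    names = sorted(
        name
        for name in os.listdir(root)
        if name.endswith("." + filetype)
        and os.path.isfile(os.path.join(root, name))
    )
    if include_root:
        # Return absolute paths that include the root folder
        return [os.path.join(os.path.abspath(root), name) for name in names]
    return names


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for path in ["a.wav", "b.txt", os.path.join("sub", "c.wav")]:
        open(os.path.join(root, path), "w").close()
    print(list_files(root, include_root=False))  # ['a.wav']
```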

process_func_applies_sliding_window

Feature.process_func_applies_sliding_window: Controls if processing function applies sliding window.

process_index()

Feature.process_index(index, *, preserve_index=False, root=None, cache_root=None, process_func_args=None)

Extract features from an index that conforms to audformat.

If cache_root is not None, a hash value is created from the index using audformat.utils.hash() and the result is stored as <cache_root>/<hash>.pkl. When called again with the same index, features will be read from the cached file.
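The caching scheme can be sketched with the standard library. In this sketch hashlib stands in for audformat.utils.hash() and a plain list of file names stands in for the index, so the hash values differ from the ones the library would produce; `cached_process` and `slow_features` are hypothetical names.

```python
import hashlib
import os
import pickle
import tempfile


def cached_process(index, process, cache_root):
    """Run process(index) once per distinct index and cache the result."""
    # Stand-in for audformat.utils.hash(): hash the pickled index entries
    digest = hashlib.md5(pickle.dumps(list(index))).hexdigest()
    path = os.path.join(cache_root, digest + ".pkl")
    if os.path.exists(path):
        # Same index as before: read the result from the cache file
        with open(path, "rb") as fp:
            return pickle.load(fp)
    result = process(index)
    os.makedirs(cache_root, exist_ok=True)
    with open(path, "wb") as fp:
        pickle.dump(result, fp)
    return result


calls = []


def slow_features(index):
    # Toy feature extractor that records how often it is called
    calls.append(1)
    return {file: len(file) for file in index}


with tempfile.TemporaryDirectory() as cache_root:
    first = cached_process(["a.wav", "b.wav"], slow_features, cache_root)
    second = cached_process(["a.wav", "b.wav"], slow_features, cache_root)

print(first == second, len(calls))  # True 1
```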

Parameters

index (Index) – index with segment information
preserve_index (bool) – if True and audinterface.Feature.process.segment is None, the returned index will be of the same type as the original one; otherwise a segmented index is always returned
root (Optional[str]) – root folder to expand relative file paths
cache_root (Optional[str]) – cache folder (see description)
process_func_args (Optional[dict[str, object]]) – (keyword) arguments passed on to the processing function. They will temporarily overwrite the ones stored in audinterface.Feature.process.process_func_args

Raises

RuntimeError – if sampling rates do not match
RuntimeError – if channel selection is invalid
RuntimeError – if multiple frames are returned, but win_dur is not set
ValueError – if index does not conform to audformat

Return type

DataFrame

process_signal()

Feature.process_signal(signal, sampling_rate, *, file=None, start=None, end=None, process_func_args=None)

Extract features for an audio signal.

Note

If a file is given, the index of the returned frame has levels file, start and end. Otherwise, it consists only of start and end.

Parameters

signal (ndarray) – signal values
sampling_rate (int) – sampling rate in Hz
file (Optional[str]) – file path
start (Union[float, int, str, Timedelta, None]) – start processing at this position. If value is a float or integer it is treated as seconds. See audinterface.utils.to_timedelta() for further options
end (Union[float, int, str, Timedelta, None]) – end processing at this position. If value is a float or integer it is treated as seconds. See audinterface.utils.to_timedelta() for further options
process_func_args (Optional[dict[str, object]]) – (keyword) arguments passed on to the processing function. They will temporarily overwrite the ones stored in audinterface.Feature.process.process_func_args

Raises

RuntimeError – if sampling rates do not match
RuntimeError – if channel selection is invalid
RuntimeError – if dimension of extracted features is greater than three
RuntimeError – if feature extractor uses sliding window, but self.win_dur is not specified
RuntimeError – if number of features does not match number of feature names
RuntimeError – if multiple frames are returned, but win_dur is not set

Return type

DataFrame

process_signal_from_index()

Feature.process_signal_from_index(signal, sampling_rate, index, process_func_args=None)

Split a signal into segments and extract features for each segment.

Parameters

signal (ndarray) – signal values
sampling_rate (int) – sampling rate in Hz
index (MultiIndex) – a pandas.MultiIndex with two levels named start and end that hold start and end positions as pandas.Timedelta objects. See also audinterface.utils.signal_index()
process_func_args (Optional[dict[str, object]]) – (keyword) arguments passed on to the processing function. They will temporarily overwrite the ones stored in audinterface.Feature.process.process_func_args

Raises

RuntimeError – if sampling rates do not match
RuntimeError – if channel selection is invalid
RuntimeError – if multiple frames are returned, but win_dur is not set
ValueError – if index contains duplicates

Return type

DataFrame
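Splitting a signal by start and end positions, as process_signal_from_index() does before extracting features, can be sketched as follows. Here datetime.timedelta stands in for pandas.Timedelta, a list of (start, end) tuples stands in for the pandas.MultiIndex, and `split_signal` is a hypothetical helper.

```python
from datetime import timedelta


def split_signal(signal, sampling_rate, index):
    """Return (start, end, samples) for each (start, end) pair in index."""
    if len(set(index)) != len(index):
        raise ValueError("index contains duplicates")
    segments = []
    for start, end in index:
        # Convert the time positions to sample positions
        first = round(start.total_seconds() * sampling_rate)
        last = round(end.total_seconds() * sampling_rate)
        segments.append((start, end, signal[first:last]))
    return segments


signal = list(range(8))  # 2 s of audio at an assumed rate of 4 Hz
index = [
    (timedelta(seconds=0), timedelta(seconds=1)),
    (timedelta(seconds=0.5), timedelta(seconds=2)),
]
for start, end, samples in split_signal(signal, 4, index):
    print(start, end, samples)
```

Each returned segment would then be passed to the processing function, giving one row (or several rows with a sliding window) per segment.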

to_numpy()

Feature.to_numpy(frame)

Return feature values as a numpy array.

The returned numpy.ndarray has the original shape, i.e. (channels, features, time).

Parameters: frame (DataFrame) – feature frame
Return type: ndarray
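The rearrangement into (channels, features, time) can be illustrated with nested lists. The sketch assumes that each row of the table is one time step and that columns are ordered channel by channel, as in the multi-channel output in the Examples section; `frame_to_nested` is a hypothetical helper and the values are made up.

```python
def frame_to_nested(rows, num_channels, num_features):
    """Rearrange table rows into [channel][feature][time] order."""
    return [
        [
            [row[channel * num_features + feature] for row in rows]
            for feature in range(num_features)
        ]
        for channel in range(num_channels)
    ]


# Two frames (rows) of a 2-channel signal with features "mean" and "std"
rows = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
]
values = frame_to_nested(rows, num_channels=2, num_features=2)
print(values[1][0])  # channel 1, "mean" over time: [0.3, 0.7]
```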

verbose

Feature.verbose: Show debug messages.

win_dur

Feature.win_dur: Window duration.
