SegmentWithFeature

class audinterface.SegmentWithFeature(feature_names, *, name=None, params=None, process_func=None, process_func_args=None, sampling_rate=None, resample=False, channels=0, mixdown=False, min_signal_dur=None, max_signal_dur=None, keep_nat=False, num_workers=1, multiprocessing=False, verbose=False)

Segmentation with feature interface.

Interface for functions that apply a segmentation to the input signal, and also compute features for those segments at the same time, e.g. a speech recognition model that recognizes speech and also provides the time stamps of that speech.

The features are returned as a pandas.DataFrame with num_features columns and one row per detected segment.

Parameters

feature_names (str | Sequence[str]) – features are stored as columns in a data frame, where feature_names defines the names of the columns.
name (Optional[str]) – name of the feature set, e.g. 'stft'
params (Optional[dict[str, object]]) – parameters that describe the feature set, e.g. {'win_size': 512, 'hop_size': 256, 'num_fft': 512}. With the parameters you can differentiate different flavors of the same feature set
process_func (Optional[Callable[..., Series]]) – segmentation with feature function, which expects the two positional arguments signal and sampling_rate and any number of additional keyword arguments (see process_func_args). The argument names 'idx', 'file', and 'root' are special: if the function expects them and they are not specified in process_func_args, they are replaced with a running index, the currently processed file, and the root folder, respectively. The function must return a pandas.Series with a pandas.MultiIndex with two levels named start and end that hold start and end positions as pandas.Timedelta objects, and with elements of shape (num_features).
process_func_args (Optional[dict[str, object]]) – (keyword) arguments passed on to the processing function
sampling_rate (Optional[int]) – sampling rate in Hz. If None it will call process_func with the actual sampling rate of the signal
resample (bool) – if True enforces given sampling rate by resampling
channels (int | Sequence[int]) – channel selection, see audresample.remix()
mixdown (bool) – apply mono mix-down on selection
min_signal_dur (Union[float, int, str, Timedelta, None]) – minimum signal length required by process_func. If the value is a float or integer, it is treated as seconds. See audinterface.utils.to_timedelta() for further options. If the provided signal is shorter, it is zero-padded at the end
max_signal_dur (Union[float, int, str, Timedelta, None]) – maximum signal length required by process_func. If the value is a float or integer, it is treated as seconds. See audinterface.utils.to_timedelta() for further options. If the provided signal is longer, it is cut at the end
keep_nat (bool) – if True, a segment end that is set to NaT is kept as NaT in the result instead of being replaced with the file duration
num_workers (Optional[int]) – number of parallel jobs, or 1 for sequential processing. If None, it is set to the number of processors on the machine multiplied by 5 in case of multithreading, and to the number of processors in case of multiprocessing
multiprocessing (bool) – use multiprocessing instead of multithreading
verbose (bool) – show debug messages
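
The effect of min_signal_dur and max_signal_dur can be pictured with a small standalone sketch. It is not the library's implementation: the helper name fit_duration is hypothetical, the signal is a plain single-channel list, durations are given in seconds, and rounding to the nearest sample is an assumption.

```python
def fit_duration(samples, sampling_rate, min_signal_dur=None, max_signal_dur=None):
    """Zero-pad or cut a signal at the end (hypothetical helper, durations in seconds)."""
    samples = list(samples)
    if min_signal_dur is not None:
        # Assumption: durations are rounded to the nearest whole sample
        min_samples = int(round(min_signal_dur * sampling_rate))
        if len(samples) < min_samples:
            # Signal too short: append zeros at the end
            samples = samples + [0.0] * (min_samples - len(samples))
    if max_signal_dur is not None:
        max_samples = int(round(max_signal_dur * sampling_rate))
        if len(samples) > max_samples:
            # Signal too long: drop samples at the end
            samples = samples[:max_samples]
    return samples


# 3 samples at 2 Hz last 1.5 s
padded = fit_duration([1.0, 2.0, 3.0], sampling_rate=2, min_signal_dur=2.0)
cut = fit_duration([1.0, 2.0, 3.0], sampling_rate=2, max_signal_dur=1.0)
```

Here padded has four samples, the last one being zero, and cut keeps only the first two samples.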

Raises

ValueError – if resample = True, but sampling_rate = None

Examples

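>>> import numpy as np
>>> import pandas as pd
>>> from audinterface import SegmentWithFeature, utils
>>> def segment_with_mean_std(signal, sampling_rate, *, win_size=1.0, hop_size=1.0):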
...     size = signal.shape[1] / sampling_rate
...     starts = pd.to_timedelta(
...         np.arange(0, size - win_size + (1 / sampling_rate), hop_size),
...         unit="s",
...     )
...     ends = pd.to_timedelta(
...         np.arange(win_size, size + (1 / sampling_rate), hop_size), unit="s"
...     )
...     # Get windows of shape (channels, samples, frames)
...     frames = utils.sliding_window(signal, sampling_rate, win_size, hop_size)
...     means = frames.mean(axis=(0, 1))
...     stds = frames.std(axis=(0, 1))
...     index = pd.MultiIndex.from_tuples(zip(starts, ends), names=["start", "end"])
...     features = list(np.stack((means, stds), axis=-1))
...     return pd.Series(data=features, index=index)
>>> interface = SegmentWithFeature(
...     feature_names=["mean", "std"], process_func=segment_with_mean_std
... )
>>> signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
>>> interface(signal, sampling_rate=2)
start            end
0 days 00:00:00  0 days 00:00:01    [1.5, 0.5]
0 days 00:00:01  0 days 00:00:02    [3.5, 0.5]
0 days 00:00:02  0 days 00:00:03    [5.5, 0.5]
dtype: object
>>> interface.process_signal(signal, sampling_rate=2)
                                mean  std
start           end
0 days 00:00:00 0 days 00:00:01   1.5  0.5
0 days 00:00:01 0 days 00:00:02   3.5  0.5
0 days 00:00:02 0 days 00:00:03   5.5  0.5
>>> # Apply the interface to an audformat-conform index of a data frame
>>> import audb
>>> db = audb.load(
...     "emodb",
...     version="1.3.0",
...     media="wav/03a01Fa.wav",
...     full_path=False,
...     verbose=False,
... )
>>> index = db["emotion"].index
>>> interface.process_index(index, root=db.root)
                                            mean       std
file            start  end
wav/03a01Fa.wav 0 days 0 days 00:00:01 -0.000329  0.098115

__call__()

SegmentWithFeature.__call__(signal, sampling_rate)

Apply processing to signal.

This function processes the signal without converting the output into a pandas.DataFrame; it returns the raw processing result instead. If channel selection, mixdown, and/or resampling is enabled, the signal is first remixed, and it is resampled if the input sampling rate does not match the expected sampling rate.

Parameters

signal (ndarray) – signal values
sampling_rate (int) – sampling rate in Hz

Return type

Series

Returns

Processed signal

Raises

RuntimeError – if sampling rates do not match
RuntimeError – if channel selection is invalid
ValueError – if the process function doesn’t return a pd.Series with index conform to audformat and elements of shape (num_features)

feature_names

SegmentWithFeature.feature_names: Feature names.

name

SegmentWithFeature.name: Name of the feature set.

num_features

SegmentWithFeature.num_features: Number of features.

params

SegmentWithFeature.params: Dictionary of parameters describing the feature set.

process

SegmentWithFeature.process: Processing object.

process_file()

SegmentWithFeature.process_file(file, *, start=None, end=None, root=None, process_func_args=None)

Segment the content of an audio file and extract features.

Parameters

file (str) – file path
start (Union[float, int, str, Timedelta, None]) – start processing at this position. If value is a float or integer it is treated as seconds. See audinterface.utils.to_timedelta() for further options
end (Union[float, int, str, Timedelta, None]) – end processing at this position. If value is a float or integer it is treated as seconds. See audinterface.utils.to_timedelta() for further options
root (Optional[str]) – root folder to expand relative file path
process_func_args (Optional[dict[str, object]]) – (keyword) arguments passed on to the processing function. They will temporarily overwrite the ones stored in audinterface.SegmentWithFeature.process.process_func_args
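
The rule that float and integer values for start and end are treated as seconds can be sketched as follows. This is only an illustration: to_timedelta_sketch is a hypothetical stand-in for audinterface.utils.to_timedelta(), it uses datetime.timedelta in place of pandas.Timedelta, and it does not cover string values, which the parameter type also allows.

```python
from datetime import timedelta


def to_timedelta_sketch(value):
    """Convert a start/end position to a timedelta (hypothetical helper)."""
    if value is None:
        # No position given
        return None
    if isinstance(value, timedelta):
        # Already a duration: pass through unchanged
        return value
    if isinstance(value, (int, float)):
        # Floats and integers are treated as seconds
        return timedelta(seconds=value)
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


start = to_timedelta_sketch(0.5)
end = to_timedelta_sketch(2)
```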

Return type

DataFrame

Returns

pandas.DataFrame with segmented index conform to audformat

Raises

RuntimeError – if sampling rates do not match
RuntimeError – if channel selection is invalid
ValueError – if the process function doesn’t return a pd.Series with index conform to audformat and elements of shape (num_features)

process_files()

SegmentWithFeature.process_files(files, *, starts=None, ends=None, root=None, process_func_args=None)

Segment and extract features for a list of files.

Parameters

files (Sequence[str]) – list of file paths
starts (Union[float, int, str, Timedelta, Sequence[Union[float, int, str, Timedelta]], None]) – segment start positions. Time values given as float or integers are treated as seconds. See audinterface.utils.to_timedelta() for further options. If a scalar is given, it is applied to all files
ends (Union[float, int, str, Timedelta, Sequence[Union[float, int, str, Timedelta]], None]) – segment end positions. Time values given as float or integers are treated as seconds. See audinterface.utils.to_timedelta() for further options. If a scalar is given, it is applied to all files
root (Optional[str]) – root folder to expand relative file paths
process_func_args (Optional[dict[str, object]]) – (keyword) arguments passed on to the processing function. They will temporarily overwrite the ones stored in audinterface.SegmentWithFeature.process.process_func_args
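
How a scalar starts or ends value is applied to all files can be illustrated with a short sketch. The helper broadcast_positions is hypothetical, and raising a ValueError on a length mismatch is an assumption of this sketch rather than documented behavior.

```python
def broadcast_positions(value, num_files):
    """Expand a scalar position to one entry per file (hypothetical helper)."""
    if isinstance(value, (list, tuple)):
        values = list(value)
        # Assumption: a sequence must provide exactly one entry per file
        if len(values) != num_files:
            raise ValueError("expected one position per file")
        return values
    # Scalars (including None) are repeated for every file
    return [value] * num_files


files = ["a.wav", "b.wav", "c.wav"]
starts = broadcast_positions(0.5, len(files))
ends = broadcast_positions([1.0, 2.0, 3.0], len(files))
```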

Return type

DataFrame

Returns

pandas.DataFrame with segmented index conform to audformat

Raises

RuntimeError – if sampling rates do not match
RuntimeError – if channel selection is invalid
ValueError – if the process function doesn’t return a pd.Series with index conform to audformat and elements of shape (num_features)

process_folder()

SegmentWithFeature.process_folder(root, *, filetype='wav', include_root=True, process_func_args=None)

Segment and extract features for files in a folder.

Note

Sub-folders are currently not scanned.

Parameters

root (str) – root folder
filetype (str) – file extension
include_root (bool) – if True the file paths are absolute in the index of the returned result
process_func_args (Optional[dict[str, object]]) – (keyword) arguments passed on to the processing function. They will temporarily overwrite the ones stored in audinterface.SegmentWithFeature.process.process_func_args
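
The file selection described here (one folder level only, filtered by extension, absolute or relative paths depending on include_root) can be approximated with the standard library. The helper list_audio_files is hypothetical and only mimics the documented behavior; how the library lists files internally is not shown in this section.

```python
import os
import tempfile


def list_audio_files(root, filetype="wav", include_root=True):
    """List files with a given extension in root, ignoring sub-folders (hypothetical helper)."""
    if not os.path.isdir(root):
        raise FileNotFoundError(root)
    names = sorted(
        name
        for name in os.listdir(root)  # top level only, no recursion
        if os.path.isfile(os.path.join(root, name)) and name.endswith("." + filetype)
    )
    if include_root:
        # Absolute paths, as they would appear in the index of the result
        return [os.path.join(os.path.abspath(root), name) for name in names]
    return names


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.wav", "b.txt"):
        open(os.path.join(tmp, name), "w").close()
    os.mkdir(os.path.join(tmp, "sub"))
    open(os.path.join(tmp, "sub", "c.wav"), "w").close()
    relative = list_audio_files(tmp, include_root=False)
    absolute = list_audio_files(tmp)
```

In this sketch, sub/c.wav is skipped because it sits in a sub-folder, and b.txt is skipped because of its extension.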

Return type

DataFrame

Returns

pandas.DataFrame with segmented index conform to audformat

Raises

FileNotFoundError – if folder does not exist
RuntimeError – if sampling rates do not match
RuntimeError – if channel selection is invalid
ValueError – if the process function doesn’t return a pd.Series with index conform to audformat and elements of shape (num_features)

process_index()

SegmentWithFeature.process_index(index, *, root=None, cache_root=None, process_func_args=None)

Segment and extract features for files or segments from an index.

If cache_root is not None, a hash value is created from the index using audformat.utils.hash() and the result is stored as <cache_root>/<hash>.pkl. When called again with the same index, results will be read from the cached file.
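
The caching scheme can be sketched with the standard library. This is an illustration under stated assumptions, not the library's code: process_with_cache and compute are hypothetical names, and an MD5 digest of repr(index) stands in for audformat.utils.hash().

```python
import hashlib
import os
import pickle
import tempfile


def process_with_cache(index, compute, cache_root=None):
    """Return compute(index), reusing <cache_root>/<hash>.pkl if present (hypothetical helper)."""
    if cache_root is None:
        return compute(index)
    # Stand-in hash; the documented method uses audformat.utils.hash()
    digest = hashlib.md5(repr(index).encode("utf-8")).hexdigest()
    cache_path = os.path.join(cache_root, digest + ".pkl")
    if os.path.exists(cache_path):
        # Same index seen before: read the stored result
        with open(cache_path, "rb") as fp:
            return pickle.load(fp)
    result = compute(index)
    os.makedirs(cache_root, exist_ok=True)
    with open(cache_path, "wb") as fp:
        pickle.dump(result, fp)
    return result


calls = []


def compute(index):
    """Toy processing function that records how often it runs."""
    calls.append(index)
    return {file: len(file) for file in index}


with tempfile.TemporaryDirectory() as cache_root:
    index = ("a.wav", "bb.wav")
    first = process_with_cache(index, compute, cache_root)
    second = process_with_cache(index, compute, cache_root)
```

The second call returns the stored result without running compute again, so calls holds a single entry.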

Parameters

index (Index) – index conform to audformat
root (Optional[str]) – root folder to expand relative file paths
cache_root (Optional[str]) – cache folder (see description)
process_func_args (Optional[dict[str, object]]) – (keyword) arguments passed on to the processing function. They will temporarily overwrite the ones stored in audinterface.SegmentWithFeature.process.process_func_args

Return type

DataFrame

Returns

pandas.DataFrame with segmented index conform to audformat

Raises

RuntimeError – if sampling rates do not match
RuntimeError – if channel selection is invalid
ValueError – if the process function doesn’t return a pd.Series with index conform to audformat and elements of shape (num_features)

process_signal()

SegmentWithFeature.process_signal(signal, sampling_rate, *, file=None, start=None, end=None, process_func_args=None)

Segment and extract features for audio signal.

Note

If a file is given, the index of the returned frame has levels file, start and end. Otherwise, it consists only of start and end.

Parameters

signal (ndarray) – signal values
sampling_rate (int) – sampling rate in Hz
file (Optional[str]) – file path
start (Union[float, int, str, Timedelta, None]) – start processing at this position. If value is a float or integer it is treated as seconds. See audinterface.utils.to_timedelta() for further options
end (Union[float, int, str, Timedelta, None]) – end processing at this position. If value is a float or integer it is treated as seconds. See audinterface.utils.to_timedelta() for further options
process_func_args (Optional[dict[str, object]]) – (keyword) arguments passed on to the processing function. They will temporarily overwrite the ones stored in audinterface.SegmentWithFeature.process.process_func_args

Return type

DataFrame

Returns

pandas.DataFrame with segmented index conform to audformat

Raises

RuntimeError – if sampling rates do not match
RuntimeError – if channel selection is invalid
ValueError – if the process function doesn’t return a pd.Series with index conform to audformat and elements of shape (num_features)

process_signal_from_index()

SegmentWithFeature.process_signal_from_index(signal, sampling_rate, index, process_func_args=None)

Segment and extract features for parts of a signal.

Parameters

signal (ndarray) – signal values
sampling_rate (int) – sampling rate in Hz
index (Index) – a segmented index conform to audformat or a pandas.MultiIndex with two levels named start and end that hold start and end positions as pandas.Timedelta objects. See also audinterface.utils.signal_index()
process_func_args (Optional[dict[str, object]]) – (keyword) arguments passed on to the processing function. They will temporarily overwrite the ones stored in audinterface.SegmentWithFeature.process.process_func_args

Return type

DataFrame

Returns

pandas.DataFrame with segmented index conform to audformat

Raises

RuntimeError – if sampling rates do not match
RuntimeError – if channel selection is invalid
ValueError – if index contains duplicates
ValueError – if the process function doesn’t return a pd.Series with index conform to audformat and elements of shape (num_features)

process_table()

SegmentWithFeature.process_table(table, *, root=None, cache_root=None, process_func_args=None)

Segment and extract features for files or segments from a table.

The labels of the table are reassigned to the new segments. The columns of the table must not overlap with the audinterface.SegmentWithFeature.feature_names.
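
The label reassignment and the overlap check can be pictured with plain dictionaries. This is a simplified sketch, not the library's implementation: reassign_labels is a hypothetical helper, rows are keyed by (file, start, end) tuples with times in seconds, and the order of label and feature columns in the real result is not specified here.

```python
def reassign_labels(table, segments_per_row, feature_names):
    """Copy the labels of each table row to the segments found in it (hypothetical helper).

    table: maps a row key to a dict of label columns.
    segments_per_row: maps a row key to a list of (segment key, feature dict) pairs.
    """
    for labels in table.values():
        overlap = set(labels) & set(feature_names)
        if overlap:
            # Label columns and feature columns must be disjoint
            raise ValueError(f"overlapping columns: {sorted(overlap)}")
    result = {}
    for key, labels in table.items():
        for segment_key, features in segments_per_row[key]:
            # Every new segment inherits the labels of its original row
            result[segment_key] = {**labels, **features}
    return result


table = {("a.wav", 0.0, 2.0): {"emotion": "happy"}}
segments = {
    ("a.wav", 0.0, 2.0): [
        (("a.wav", 0.0, 1.0), {"mean": 0.1}),
        (("a.wav", 1.0, 2.0), {"mean": 0.3}),
    ]
}
result = reassign_labels(table, segments, feature_names=["mean"])
```

Both new segments carry the label emotion='happy' of the row they were derived from, next to their own feature values.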

If cache_root is not None, a hash value is created from the index using audformat.utils.hash() and the result is stored as <cache_root>/<hash>.pkl. When called again with the same index, results will be read from the cached file.

Parameters

table (Series | DataFrame) – pandas.Series or pandas.DataFrame with an index conform to audformat
root (Optional[str]) – root folder to expand relative file paths
cache_root (Optional[str]) – cache folder (see description)
process_func_args (Optional[dict[str, object]]) – (keyword) arguments passed on to the processing function. They will temporarily overwrite the ones stored in audinterface.SegmentWithFeature.process.process_func_args

Return type

DataFrame

Returns

pandas.DataFrame with segmented index conform to audformat

Raises

ValueError – if table is not a pandas.Series or a pandas.DataFrame
ValueError – if the table columns and the extracted feature columns overlap
ValueError – if the process function doesn’t return a pd.Series with index conform to audformat and elements of shape (num_features)
RuntimeError – if sampling rates do not match
RuntimeError – if channel selection is invalid
