Segment

class audinterface.Segment(*, process_func=None, process_func_args=None, invert=False, sampling_rate=None, resample=False, channels=None, mixdown=False, min_signal_dur=None, max_signal_dur=None, keep_nat=False, num_workers=1, multiprocessing=False, verbose=False)

Segmentation interface.

Interface for models that apply a segmentation to the input signal, e.g. a voice activity model that detects speech regions.

Parameters
  • process_func (Optional[Callable[..., MultiIndex]]) – segmentation function, which expects the two positional arguments signal and sampling_rate and any number of additional keyword arguments (see process_func_args). The argument names 'idx', 'file' and 'root' are special: if the function expects them but they are not specified in process_func_args, they are replaced with a running index, the currently processed file, and the root folder, respectively. The function must return a pandas.MultiIndex with two levels named start and end that hold the start and end positions as pandas.Timedelta objects

  • process_func_args (Optional[Dict[str, Any]]) – (keyword) arguments passed on to the processing function

  • invert (bool) – Invert the segmentation

  • sampling_rate (Optional[int]) – sampling rate in Hz. If None, process_func is called with the actual sampling rate of the signal

  • resample (bool) – if True enforces given sampling rate by resampling

  • channels (Union[int, Sequence[int], None]) – channel selection, see audresample.remix()

  • mixdown (bool) – apply mono mix-down on selection

  • min_signal_dur (Union[float, int, str, Timedelta, None]) – minimum signal length required by process_func. If the value is a float or integer, it is treated as seconds. See audinterface.utils.to_timedelta() for further options. If the provided signal is shorter, it is zero-padded at the end

  • max_signal_dur (Union[float, int, str, Timedelta, None]) – maximum signal length required by process_func. If the value is a float or integer, it is treated as seconds. See audinterface.utils.to_timedelta() for further options. If the provided signal is longer, it is cut at the end

  • keep_nat (bool) – if True, a segment end that is set to NaT is not replaced with the file duration in the result

  • num_workers (Optional[int]) – number of parallel jobs, or 1 for sequential processing. If None, it is set to the number of processors on the machine multiplied by 5 in the case of multithreading, and to the number of processors in the case of multiprocessing

  • multiprocessing (bool) – use multiprocessing instead of multithreading

  • verbose (bool) – show debug messages
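
The effect of min_signal_dur and max_signal_dur can be pictured with a small NumPy sketch. The helper fit_duration below is hypothetical and not part of audinterface; it only illustrates zero-padding at the end and cutting at the end for a signal of shape (channels, samples), with durations given in seconds. The order in which the two limits are applied is an assumption.

```python
import numpy as np


def fit_duration(signal, sampling_rate, min_dur=None, max_dur=None):
    """Pad or cut a (channels, samples) signal; hypothetical helper."""
    signal = np.atleast_2d(signal)
    if max_dur is not None:
        # Cut everything beyond the maximum duration at the end.
        max_samples = int(max_dur * sampling_rate)
        signal = signal[:, :max_samples]
    if min_dur is not None:
        # Append zeros at the end until the minimum duration is reached.
        min_samples = int(min_dur * sampling_rate)
        missing = min_samples - signal.shape[1]
        if missing > 0:
            signal = np.pad(signal, ((0, 0), (0, missing)))
    return signal


short = fit_duration(np.ones((1, 3)), sampling_rate=4, min_dur=1.0)
long = fit_duration(np.ones((1, 10)), sampling_rate=4, max_dur=2.0)
```

Here short ends up with 4 samples, the last of which is a padded zero, and long is reduced to 8 samples.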

Raises

ValueError – if resample = True, but sampling_rate = None

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from audinterface import Segment
>>> def segment(signal, sampling_rate, *, win_size=0.2, hop_size=0.1):
...     size = signal.shape[1] / sampling_rate
...     starts = pd.to_timedelta(np.arange(0, size - win_size, hop_size), unit="s")
...     ends = pd.to_timedelta(np.arange(win_size, size, hop_size), unit="s")
...     return pd.MultiIndex.from_tuples(zip(starts, ends), names=["start", "end"])
>>> interface = Segment(process_func=segment)
>>> signal = np.array([1.0, 2.0, 3.0])
>>> interface(signal, sampling_rate=3)
MultiIndex([(       '0 days 00:00:00', '0 days 00:00:00.200000'),
            ('0 days 00:00:00.100000', '0 days 00:00:00.300000'),
            ('0 days 00:00:00.200000', '0 days 00:00:00.400000'),
            ('0 days 00:00:00.300000', '0 days 00:00:00.500000'),
            ('0 days 00:00:00.400000', '0 days 00:00:00.600000'),
            ('0 days 00:00:00.500000', '0 days 00:00:00.700000'),
            ('0 days 00:00:00.600000', '0 days 00:00:00.800000'),
            ('0 days 00:00:00.700000', '0 days 00:00:00.900000')],
           names=['start', 'end'])
>>> # Apply the interface to an audformat-conformant index of a dataframe
>>> import audb
>>> db = audb.load(
...     "emodb",
...     version="1.3.0",
...     media="wav/03a01Fa.wav",
...     full_path=False,
...     verbose=False,
... )
>>> interface = Segment(
...     process_func=segment,
...     process_func_args={"win_size": 0.5, "hop_size": 0.25},
... )
>>> interface.process_index(db["emotion"].index, root=db.root)
MultiIndex([('wav/03a01Fa.wav',        '0 days 00:00:00', ...),
            ('wav/03a01Fa.wav', '0 days 00:00:00.250000', ...),
            ('wav/03a01Fa.wav', '0 days 00:00:00.500000', ...),
            ('wav/03a01Fa.wav', '0 days 00:00:00.750000', ...),
            ('wav/03a01Fa.wav',        '0 days 00:00:01', ...),
            ('wav/03a01Fa.wav', '0 days 00:00:01.250000', ...)],
           names=['file', 'start', 'end'])
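
The examples above use fixed sliding windows. For the voice-activity style use case mentioned in the class description, a process_func might instead return only the regions where the signal is active. The following is a minimal sketch under simplifying assumptions: the function name active_regions and the plain amplitude threshold are illustrative choices, not a model shipped with audinterface, and only the first channel is inspected.

```python
import numpy as np
import pandas as pd


def active_regions(signal, sampling_rate, *, threshold=0.5):
    """Return regions where the first channel exceeds threshold in magnitude."""
    mono = np.atleast_2d(signal)[0]
    active = np.abs(mono) > threshold
    # Pad with False so every region has a rising and a falling edge.
    padded = np.concatenate(([False], active, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    # Even positions are region starts, odd positions are (exclusive) ends.
    starts = pd.to_timedelta(edges[0::2] / sampling_rate, unit="s")
    ends = pd.to_timedelta(edges[1::2] / sampling_rate, unit="s")
    return pd.MultiIndex.from_arrays([starts, ends], names=["start", "end"])


signal = np.array([[0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0]])
segments = active_regions(signal, sampling_rate=4)
```

Because the function takes signal and sampling_rate as positional arguments and returns a MultiIndex with the levels start and end, it has the shape that process_func is documented to expect; threshold could then be set through process_func_args.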

__call__()

Segment.__call__(signal, sampling_rate)

Apply processing to signal.

This function processes the signal without transforming the output into a pd.MultiIndex. Instead, it will return the raw processed signal. However, if channel selection, mixdown and/or resampling is enabled, the signal will be first remixed and resampled if the input sampling rate does not fit the expected sampling rate.

Parameters
  • signal (ndarray) – signal values

  • sampling_rate (int) – sampling rate in Hz

Return type

Index

Returns

Processed signal

Raises

invert

Segment.invert

Invert segmentation.
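
One way to picture an inverted segmentation is as the set of gaps that the detected segments leave uncovered. The sketch below assumes exactly that interpretation; invert_segments is a hypothetical helper written for illustration and is not the library's implementation. It expects non-overlapping or overlapping segments in a MultiIndex with the levels start and end, plus the total signal duration.

```python
import pandas as pd


def invert_segments(segments, duration):
    """Return the regions not covered by segments; hypothetical helper."""
    gaps = []
    cursor = pd.Timedelta(0)
    for start, end in segments.sort_values():
        if start > cursor:
            # Uncovered region between the previous end and this start.
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        # Uncovered region between the last segment and the signal end.
        gaps.append((cursor, duration))
    return pd.MultiIndex.from_tuples(gaps, names=["start", "end"])


speech = pd.MultiIndex.from_tuples(
    [
        (pd.Timedelta("1s"), pd.Timedelta("2s")),
        (pd.Timedelta("3s"), pd.Timedelta("4s")),
    ],
    names=["start", "end"],
)
silence = invert_segments(speech, duration=pd.Timedelta("5s"))
```

For a 5 second signal with speech from 1 s to 2 s and from 3 s to 4 s, the sketch yields the three gaps 0 s to 1 s, 2 s to 3 s, and 4 s to 5 s.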

process

Segment.process

Processing object.

process_file()

Segment.process_file(file, *, start=None, end=None, root=None, process_func_args=None)

Segment the content of an audio file.

Parameters
Return type

Index

Returns

Segmented index conforming to audformat

Raises

process_files()

Segment.process_files(files, *, starts=None, ends=None, root=None, process_func_args=None)

Segment a list of files.

Parameters
Return type

Index

Returns

Segmented index conforming to audformat

Raises

process_folder()

Segment.process_folder(root, *, filetype='wav', include_root=True, process_func_args=None)

Segment files in a folder.

Note

Sub-folders are currently not scanned.

Parameters
  • root (str) – root folder

  • filetype (str) – file extension

  • include_root (bool) – if True the file paths are absolute in the index of the returned result

  • process_func_args (Optional[Dict[str, Any]]) – (keyword) arguments passed on to the processing function. They will temporarily overwrite the ones stored in audinterface.Segment.process.process_func_args
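
The combination of a non-recursive scan and the include_root switch can be sketched with the standard library alone. The helper list_files is hypothetical and only mirrors the documented behaviour: files in sub-folders are ignored, and paths are either absolute (joined with root) or relative file names.

```python
import os
import tempfile


def list_files(root, filetype="wav", include_root=True):
    """List matching files directly inside root; hypothetical helper."""
    names = sorted(
        name
        for name in os.listdir(root)
        if name.endswith("." + filetype)
        and os.path.isfile(os.path.join(root, name))
    )
    if include_root:
        return [os.path.join(root, name) for name in names]
    return names


with tempfile.TemporaryDirectory() as root:
    # Create one matching file, one non-matching file, and one nested file.
    os.makedirs(os.path.join(root, "sub"))
    for path in ("a.wav", "b.txt", os.path.join("sub", "c.wav")):
        open(os.path.join(root, path), "w").close()
    relative = list_files(root, include_root=False)
    absolute = list_files(root, include_root=True)
```

Only a.wav is picked up: b.txt has the wrong extension and sub/c.wav lives in a sub-folder.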

Return type

Index

Returns

Segmented index conforming to audformat

Raises

process_index()

Segment.process_index(index, *, root=None, cache_root=None, process_func_args=None)

Segment files or segments from an index.

If cache_root is not None, a hash value is created from the index using audformat.utils.hash() and the result is stored as <cache_root>/<hash>.pkl. When called again with the same index, results will be read from the cached file.
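
The caching scheme described above follows a common pattern: derive a file name from a hash of the index, and reuse the stored result when that file already exists. The sketch below illustrates the pattern only. The helper cached_result is hypothetical, and the hash is a stand-in built from pandas.util.hash_pandas_object() and hashlib, so the file names it produces are not expected to match those created with audformat.utils.hash().

```python
import hashlib
import os
import tempfile

import pandas as pd


def cached_result(index, cache_root, compute):
    """Load the cached result for index, or compute and store it."""
    # Stand-in hash; the library documents audformat.utils.hash() here.
    row_hashes = pd.util.hash_pandas_object(index)
    digest = hashlib.md5(row_hashes.values.tobytes()).hexdigest()
    path = os.path.join(cache_root, digest + ".pkl")
    if os.path.exists(path):
        return pd.read_pickle(path)
    result = compute(index)
    os.makedirs(cache_root, exist_ok=True)
    pd.to_pickle(result, path)
    return result


calls = []


def compute(index):
    calls.append(1)
    return index  # placeholder for the real segmentation


index = pd.Index(["a.wav", "b.wav"], name="file")
with tempfile.TemporaryDirectory() as cache_root:
    first = cached_result(index, cache_root, compute)
    second = cached_result(index, cache_root, compute)
```

The second call finds the pickle written by the first one, so compute runs only once.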

Parameters
  • index (Index) – index conforming to audformat

  • root (Optional[str]) – root folder to expand relative file paths

  • cache_root (Optional[str]) – cache folder (see description)

  • process_func_args (Optional[Dict[str, Any]]) – (keyword) arguments passed on to the processing function. They will temporarily overwrite the ones stored in audinterface.Segment.process.process_func_args

Return type

Index

Returns

Segmented index conforming to audformat

Raises

process_signal()

Segment.process_signal(signal, sampling_rate, *, file=None, start=None, end=None, process_func_args=None)

Segment audio signal.

Note

If a file is given, the returned index has the levels file, start and end. Otherwise, it consists only of start and end.
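
The two index layouts mentioned in the note can be reproduced with plain pandas. The helper build_index and the file name speech.wav are made up for illustration; the sketch only shows how the level names differ with and without a file.

```python
import pandas as pd


def build_index(starts, ends, file=None):
    """Build a segment index with or without a file level."""
    starts = pd.to_timedelta(starts, unit="s")
    ends = pd.to_timedelta(ends, unit="s")
    if file is None:
        return pd.MultiIndex.from_arrays(
            [starts, ends],
            names=["start", "end"],
        )
    # Repeat the file name once per segment.
    files = [file] * len(starts)
    return pd.MultiIndex.from_arrays(
        [files, starts, ends],
        names=["file", "start", "end"],
    )


without_file = build_index([0.0, 1.0], [0.5, 1.5])
with_file = build_index([0.0, 1.0], [0.5, 1.5], file="speech.wav")
```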

Parameters
Return type

Index

Returns

Segmented index conforming to audformat

Raises

process_signal_from_index()

Segment.process_signal_from_index(signal, sampling_rate, index, process_func_args=None)

Segment parts of a signal.

Parameters
Return type

Index

Returns

Segmented index conforming to audformat

Raises