Segment

class audinterface.Segment(*, process_func=None, process_func_args=None, invert=False, sampling_rate=None, resample=False, channels=None, mixdown=False, min_signal_dur=None, max_signal_dur=None, keep_nat=False, num_workers=1, multiprocessing=False, verbose=False)

Segmentation interface.

Interface for models that apply a segmentation to the input signal, e.g. a voice activity model that detects speech regions.

Parameters:
  • process_func (Optional[Callable[..., MultiIndex]]) – segmentation function, which expects the two positional arguments signal and sampling_rate and any number of additional keyword arguments (see process_func_args). The special arguments 'idx', 'file' and 'root' are filled in automatically if the function expects them but they are not specified in process_func_args: 'idx' receives a running index, 'file' the currently processed file, and 'root' the root folder. The function must return a pandas.MultiIndex with two levels named start and end that hold start and end positions as pandas.Timedelta objects

  • process_func_args (Optional[dict[str, object]]) – (keyword) arguments passed on to the processing function

  • invert (bool) – Invert the segmentation

  • sampling_rate (Optional[int]) – sampling rate in Hz. If None, process_func is called with the actual sampling rate of the signal

  • resample (bool) – if True enforces given sampling rate by resampling

  • channels (Union[int, Sequence[int], None]) – channel selection, see audresample.remix()

  • mixdown (bool) – apply mono mix-down on selection

  • min_signal_dur (Union[float, int, str, Timedelta, None]) – minimum signal length required by process_func. If value is a float or integer it is treated as seconds. See audinterface.utils.to_timedelta() for further options. If provided signal is shorter, it will be zero padded at the end

  • max_signal_dur (Union[float, int, str, Timedelta, None]) – maximum signal length required by process_func. If value is a float or integer it is treated as seconds. See audinterface.utils.to_timedelta() for further options. If provided signal is longer, it will be cut at the end

  • keep_nat (bool) – if True, a segment end that is set to NaT is kept as NaT in the result instead of being replaced with the file duration

  • num_workers (int | None) – number of parallel jobs or 1 for sequential processing. If None, it is set to the number of processors on the machine multiplied by 5 in case of multithreading, and to the number of processors in case of multiprocessing

  • multiprocessing (bool) – use multiprocessing instead of multithreading

  • verbose (bool) – show debug messages
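
As a minimal sketch of the process_func contract described above, the following hypothetical energy-based detector takes signal and sampling_rate, accepts extra keyword arguments, and returns a pandas.MultiIndex with the levels start and end. The function name, the threshold and frame_dur arguments, and the thresholding logic are illustrative assumptions and not part of audinterface.

```python
import numpy as np
import pandas as pd


def detect_activity(signal, sampling_rate, *, threshold=0.1, frame_dur=0.1):
    """Hypothetical segmenter: merge consecutive frames whose RMS exceeds threshold."""
    signal = np.atleast_2d(signal)  # shape (channels, samples)
    frame_len = max(1, int(round(frame_dur * sampling_rate)))
    num_frames = signal.shape[1] // frame_len
    starts, ends = [], []
    seg_start = None  # first sample of the currently open segment
    for n in range(num_frames):
        frame = signal[:, n * frame_len:(n + 1) * frame_len]
        active = np.sqrt(np.mean(frame**2)) > threshold
        if active and seg_start is None:
            seg_start = n * frame_len
        elif not active and seg_start is not None:
            starts.append(seg_start / sampling_rate)
            ends.append(n * frame_len / sampling_rate)
            seg_start = None
    if seg_start is not None:  # close a segment that runs until the end
        starts.append(seg_start / sampling_rate)
        ends.append(num_frames * frame_len / sampling_rate)
    return pd.MultiIndex.from_arrays(
        [pd.to_timedelta(starts, unit="s"), pd.to_timedelta(ends, unit="s")],
        names=["start", "end"],
    )


# One second of silence, one second of activity, one second of silence at 10 Hz
sampling_rate = 10
signal = np.concatenate([np.zeros(10), np.ones(10), np.zeros(10)])
segments = detect_activity(signal, sampling_rate)
```

A function of this shape could be passed as process_func, with threshold and frame_dur supplied through process_func_args.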

Raises:

ValueError – if resample = True, but sampling_rate = None
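
The zero padding and cutting described for min_signal_dur and max_signal_dur can be illustrated with plain NumPy. This is a sketch under assumptions: the helper fit_duration is hypothetical, durations are given in seconds only, and the rounding to whole samples may differ from what the library does.

```python
import numpy as np


def fit_duration(signal, sampling_rate, *, min_dur=None, max_dur=None):
    """Hypothetical helper: zero pad or cut a (channels, samples) array at its end."""
    signal = np.atleast_2d(signal)
    if max_dur is not None:
        max_samples = int(round(max_dur * sampling_rate))
        signal = signal[:, :max_samples]  # cut at the end
    if min_dur is not None:
        min_samples = int(round(min_dur * sampling_rate))
        missing = min_samples - signal.shape[1]
        if missing > 0:
            # Default np.pad mode is "constant", which fills with zeros
            signal = np.pad(signal, ((0, 0), (0, missing)))
    return signal


short = fit_duration(np.ones(4), 8, min_dur=1.0)   # padded to 8 samples
long = fit_duration(np.ones(16), 8, max_dur=1.0)   # cut to 8 samples
```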

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from audinterface import Segment
>>> def segment(signal, sampling_rate, *, win_size=0.2, hop_size=0.1):
...     size = signal.shape[1] / sampling_rate
...     starts = pd.to_timedelta(np.arange(0, size - win_size, hop_size), unit="s")
...     ends = pd.to_timedelta(np.arange(win_size, size, hop_size), unit="s")
...     return pd.MultiIndex.from_tuples(zip(starts, ends), names=["start", "end"])
>>> interface = Segment(process_func=segment)
>>> signal = np.array([1.0, 2.0, 3.0])
>>> interface(signal, sampling_rate=3)
MultiIndex([(       '0 days 00:00:00', '0 days 00:00:00.200000'),
            ('0 days 00:00:00.100000', '0 days 00:00:00.300000'),
            ('0 days 00:00:00.200000', '0 days 00:00:00.400000'),
            ('0 days 00:00:00.300000', '0 days 00:00:00.500000'),
            ('0 days 00:00:00.400000', '0 days 00:00:00.600000'),
            ('0 days 00:00:00.500000', '0 days 00:00:00.700000'),
            ('0 days 00:00:00.600000', '0 days 00:00:00.800000'),
            ('0 days 00:00:00.700000', '0 days 00:00:00.900000')],
           names=['start', 'end'])
>>> # Apply the interface to a dataframe index that conforms to audformat
>>> import audb
>>> db = audb.load(
...     "emodb",
...     version="1.3.0",
...     media="wav/03a01Fa.wav",
...     full_path=False,
...     verbose=False,
... )
>>> interface = Segment(
...     process_func=segment,
...     process_func_args={"win_size": 0.5, "hop_size": 0.25},
... )
>>> interface.process_index(db["emotion"].index, root=db.root)
MultiIndex([('wav/03a01Fa.wav',        '0 days 00:00:00', ...),
            ('wav/03a01Fa.wav', '0 days 00:00:00.250000', ...),
            ('wav/03a01Fa.wav', '0 days 00:00:00.500000', ...),
            ('wav/03a01Fa.wav', '0 days 00:00:00.750000', ...),
            ('wav/03a01Fa.wav',        '0 days 00:00:01', ...),
            ('wav/03a01Fa.wav', '0 days 00:00:01.250000', ...)],
           names=['file', 'start', 'end'])

__call__()

Segment.__call__(signal, sampling_rate)

Apply processing to signal.

This function processes the signal without transforming the output into a pd.MultiIndex. Instead, it will return the raw processed signal. However, if channel selection, mixdown and/or resampling is enabled, the signal is first remixed, and it is resampled if the input sampling rate does not match the expected sampling rate.

Parameters:
  • signal (ndarray) – signal values

  • sampling_rate (int) – sampling rate in Hz

Return type:

Index

Returns:

Processed signal

Raises:

invert

Segment.invert

Invert segmentation.
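
What inversion produces is not spelled out here. One plausible reading, sketched below as an assumption rather than as the library's implementation, is that the result holds the gaps between the detected segments within the signal duration. The helper invert_segments and its duration argument are hypothetical.

```python
import pandas as pd


def invert_segments(index, duration):
    """Hypothetical helper: return the gaps between segments within [0, duration]."""
    gaps = []
    cursor = pd.Timedelta(0)  # end of the region covered so far
    for start, end in sorted(index):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        gaps.append((cursor, duration))
    return pd.MultiIndex.from_arrays(
        [
            pd.to_timedelta([gap[0] for gap in gaps]),
            pd.to_timedelta([gap[1] for gap in gaps]),
        ],
        names=["start", "end"],
    )


speech = pd.MultiIndex.from_arrays(
    [pd.to_timedelta([0.5, 2.0], unit="s"), pd.to_timedelta([1.0, 2.5], unit="s")],
    names=["start", "end"],
)
silence = invert_segments(speech, pd.Timedelta(seconds=3))
```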

process

Segment.process

Processing object.

process_file()

Segment.process_file(file, *, start=None, end=None, root=None, process_func_args=None)

Segment the content of an audio file.

Parameters:
  • file (str) – file path

  • start (Union[float, int, str, Timedelta, None]) – start processing at this position. If value is a float or integer it is treated as seconds. See audinterface.utils.to_timedelta() for further options

  • end (Union[float, int, str, Timedelta, None]) – end processing at this position. If value is a float or integer it is treated as seconds. See audinterface.utils.to_timedelta() for further options

  • root (Optional[str]) – root folder to expand relative file path

  • process_func_args (Optional[dict[str, object]]) – (keyword) arguments passed on to the processing function. They will temporarily overwrite the ones stored in audinterface.Segment.process.process_func_args
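
The rule that float and integer values for start and end are treated as seconds can be sketched with pandas alone. The helper as_timedelta below is a hypothetical stand-in, not audinterface.utils.to_timedelta(), which may accept further formats; mapping None to NaT is likewise an assumption made for illustration.

```python
import pandas as pd


def as_timedelta(value):
    """Hypothetical stand-in: numbers are seconds, strings are parsed by pandas."""
    if value is None:
        return pd.NaT  # assumption: no position given
    if isinstance(value, (int, float)):
        return pd.to_timedelta(value, unit="s")
    return pd.to_timedelta(value)  # strings such as "500ms" or Timedelta objects


start = as_timedelta(1.5)
end = as_timedelta("2500ms")
```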

Return type:

Index

Returns:

Segmented index conforming to audformat

Raises:

process_files()

Segment.process_files(files, *, starts=None, ends=None, root=None, process_func_args=None)

Segment a list of files.

Parameters:
  • files (Sequence[str]) – list of file paths

  • starts (Union[float, int, str, Timedelta, Sequence[Union[float, int, str, Timedelta]], None]) – segment start positions. Time values given as float or integers are treated as seconds. See audinterface.utils.to_timedelta() for further options. If a scalar is given, it is applied to all files

  • ends (Union[float, int, str, Timedelta, Sequence[Union[float, int, str, Timedelta]], None]) – segment end positions. Time values given as float or integers are treated as seconds. See audinterface.utils.to_timedelta() for further options. If a scalar is given, it is applied to all files

  • root (Optional[str]) – root folder to expand relative file paths

  • process_func_args (Optional[dict[str, object]]) – (keyword) arguments passed on to the processing function. They will temporarily overwrite the ones stored in audinterface.Segment.process.process_func_args
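
The statement that a scalar given for starts or ends is applied to all files can be pictured as a simple broadcast. The helper below is hypothetical and dependency-free, so it only recognises None, numbers and strings as scalars; Timedelta scalars are left out of this sketch.

```python
def broadcast(value, num_files):
    """Hypothetical helper: repeat a scalar for every file, keep sequences as they are."""
    if value is None or isinstance(value, (int, float, str)):
        return [value] * num_files
    values = list(value)
    if len(values) != num_files:
        raise ValueError("expected one value per file")
    return values


files = ["a.wav", "b.wav", "c.wav"]
starts = broadcast(0.5, len(files))             # same start for every file
ends = broadcast([1.0, 2.0, 3.0], len(files))   # one end per file
```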

Return type:

Index

Returns:

Segmented index conforming to audformat

Raises:

process_folder()

Segment.process_folder(root, *, filetype='wav', include_root=True, process_func_args=None)

Segment files in a folder.

Note

At the moment does not scan in sub-folders!

Parameters:
  • root (str) – root folder

  • filetype (str) – file extension

  • include_root (bool) – if True the file paths are absolute in the index of the returned result

  • process_func_args (Optional[dict[str, object]]) – (keyword) arguments passed on to the processing function. They will temporarily overwrite the ones stored in audinterface.Segment.process.process_func_args
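
How filetype, include_root and the note about sub-folders interact can be sketched with the standard library. The helper list_files is hypothetical; in particular, sorting the file names is an assumption made here to get a deterministic result.

```python
import os
import tempfile


def list_files(root, *, filetype="wav", include_root=True):
    """Hypothetical helper: list matching files directly inside root (no sub-folders)."""
    names = sorted(
        name
        for name in os.listdir(root)
        if name.endswith(f".{filetype}")
        and os.path.isfile(os.path.join(root, name))
    )
    if include_root:
        # Absolute paths, as described for include_root=True
        return [os.path.join(os.path.abspath(root), name) for name in names]
    return names


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    for path in ["a.wav", "b.txt", os.path.join("sub", "c.wav")]:
        open(os.path.join(root, path), "w").close()
    relative = list_files(root, include_root=False)
    absolute = list_files(root)
```

Only a.wav is found in this setup, because b.txt has the wrong extension and c.wav sits in a sub-folder.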

Return type:

Index

Returns:

Segmented index conforming to audformat

Raises:

process_index()

Segment.process_index(index, *, root=None, cache_root=None, process_func_args=None)

Segment files or segments from an index.

If cache_root is not None, a hash value is created from the index using audformat.utils.hash() and the result is stored as <cache_root>/<hash>.pkl. When called again with the same index, results will be read from the cached file.

Parameters:
  • index (Index) – index conforming to audformat

  • root (Optional[str]) – root folder to expand relative file paths

  • cache_root (Optional[str]) – cache folder (see description)

  • process_func_args (Optional[dict[str, object]]) – (keyword) arguments passed on to the processing function. They will temporarily overwrite the ones stored in audinterface.Segment.process.process_func_args
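
The caching scheme described above (hash the index, store the result as <cache_root>/<hash>.pkl, read it back on the next call) can be sketched as follows. The helper cached_process and its compute argument are hypothetical, and the MD5 over pandas.util.hash_pandas_object() is only a stand-in for audformat.utils.hash(), so the file names will not match those written by the library.

```python
import hashlib
import os
import tempfile

import pandas as pd


def cached_process(index, cache_root, compute):
    """Hypothetical helper: reuse <cache_root>/<hash>.pkl if it exists."""
    # Stand-in hash; the description above names audformat.utils.hash() for this step
    digest = hashlib.md5(
        pd.util.hash_pandas_object(index).values.tobytes()
    ).hexdigest()
    path = os.path.join(cache_root, f"{digest}.pkl")
    if os.path.exists(path):
        return pd.read_pickle(path)  # cache hit: skip processing
    result = compute(index)
    os.makedirs(cache_root, exist_ok=True)
    pd.to_pickle(result, path)
    return result


calls = []


def compute(index):
    """Toy processing step that records how often it runs."""
    calls.append(len(index))
    return pd.Series(range(len(index)), index=index)


index = pd.Index(["a.wav", "b.wav"], name="file")
with tempfile.TemporaryDirectory() as cache_root:
    first = cached_process(index, cache_root, compute)
    second = cached_process(index, cache_root, compute)  # read from the cache
```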

Return type:

Index

Returns:

Segmented index conforming to audformat

Raises:

process_signal()

Segment.process_signal(signal, sampling_rate, *, file=None, start=None, end=None, process_func_args=None)

Segment audio signal.

Note

If a file is given, the returned index has the levels file, start and end. Otherwise, it consists only of start and end.

Parameters:
  • signal (ndarray) – signal values

  • sampling_rate (int) – sampling rate in Hz

  • file (Optional[str]) – file path

  • start (Union[float, int, str, Timedelta, None]) – start processing at this position. If value is a float or integer it is treated as seconds. See audinterface.utils.to_timedelta() for further options

  • end (Union[float, int, str, Timedelta, None]) – end processing at this position. If value is a float or integer it is treated as seconds. See audinterface.utils.to_timedelta() for further options

  • process_func_args (Optional[dict[str, object]]) – (keyword) arguments passed on to the processing function. They will temporarily overwrite the ones stored in audinterface.Segment.process.process_func_args
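
The difference between the two index layouts mentioned in the note can be shown with pandas. The helper add_file_level is hypothetical and only illustrates the resulting level names, not how the library builds the index.

```python
import pandas as pd


def add_file_level(index, file=None):
    """Hypothetical helper: prepend a file level to a start/end index."""
    if file is None:
        return index  # levels stay start and end
    return pd.MultiIndex.from_arrays(
        [
            [file] * len(index),
            index.get_level_values("start"),
            index.get_level_values("end"),
        ],
        names=["file", "start", "end"],
    )


segments = pd.MultiIndex.from_arrays(
    [pd.to_timedelta([0.0, 1.0], unit="s"), pd.to_timedelta([0.5, 1.5], unit="s")],
    names=["start", "end"],
)
without_file = add_file_level(segments)
with_file = add_file_level(segments, file="a.wav")
```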

Return type:

Index

Returns:

Segmented index conforming to audformat

Raises:

process_signal_from_index()

Segment.process_signal_from_index(signal, sampling_rate, index, process_func_args=None)

Segment parts of a signal.
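
One way to picture segmenting parts of a signal is to cut out each (start, end) range of the index, run the segmentation function on it, and shift the resulting positions back by the start of that range. The sketch below rests on that assumption; the helpers segment_parts and first_half are hypothetical, and the library may handle offsets and rounding differently.

```python
import numpy as np
import pandas as pd


def segment_parts(signal, sampling_rate, index, process_func):
    """Hypothetical helper: segment each (start, end) part and shift results by start."""
    signal = np.atleast_2d(signal)
    starts, ends = [], []
    for start, end in index:
        first = int(round(start.total_seconds() * sampling_rate))
        last = int(round(end.total_seconds() * sampling_rate))
        part = process_func(signal[:, first:last], sampling_rate)
        # Positions returned for the part are relative to its start
        starts.extend(start + s for s in part.get_level_values("start"))
        ends.extend(start + e for e in part.get_level_values("end"))
    return pd.MultiIndex.from_arrays(
        [pd.to_timedelta(starts), pd.to_timedelta(ends)],
        names=["start", "end"],
    )


def first_half(signal, sampling_rate):
    """Toy process_func: mark the first half of whatever it receives."""
    dur = signal.shape[1] / sampling_rate
    return pd.MultiIndex.from_arrays(
        [pd.to_timedelta([0.0], unit="s"), pd.to_timedelta([dur / 2], unit="s")],
        names=["start", "end"],
    )


index = pd.MultiIndex.from_arrays(
    [pd.to_timedelta([0.0, 2.0], unit="s"), pd.to_timedelta([1.0, 4.0], unit="s")],
    names=["start", "end"],
)
result = segment_parts(np.zeros(40), 10, index, first_half)
```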

Parameters:
Return type:

Index

Returns:

Segmented index conforming to audformat

Raises:

process_table()

Segment.process_table(table, *, root=None, cache_root=None, process_func_args=None)

Segment files or segments from a table.

The labels of the table are reassigned to the new segments.

If cache_root is not None, a hash value is created from the index using audformat.utils.hash() and the result is stored as <cache_root>/<hash>.pkl. When called again with the same index, results will be read from the cached file.

Parameters:
  • table (Series | DataFrame) – pandas.Series or pandas.DataFrame with an index conforming to audformat

  • root (Optional[str]) – root folder to expand relative file paths

  • cache_root (Optional[str]) – cache folder (see description)

  • process_func_args (Optional[dict[str, object]]) – (keyword) arguments passed on to the processing function. They will temporarily overwrite the ones stored in audinterface.Segment.process.process_func_args
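
The reassignment of labels to new segments can be pictured with a small pandas example. It assumes a table with one row per file and a hand-written segmented index; both are made up for illustration, and the sketch does not cover tables that are already segmented.

```python
import pandas as pd

# Hypothetical table with one label per file
table = pd.Series(
    ["happy", "sad"],
    index=pd.Index(["a.wav", "b.wav"], name="file"),
    name="emotion",
)

# Hypothetical segmentation result for the same files
segments = pd.MultiIndex.from_arrays(
    [
        ["a.wav", "a.wav", "b.wav"],
        pd.to_timedelta([0.0, 1.0, 0.0], unit="s"),
        pd.to_timedelta([1.0, 2.0, 0.5], unit="s"),
    ],
    names=["file", "start", "end"],
)

# Every new segment inherits the label of the file it was cut from
labels = table.loc[segments.get_level_values("file")].to_numpy()
segmented = pd.Series(labels, index=segments, name=table.name)
```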

Return type:

Series | DataFrame

Returns:

Segmented table with an index conforming to audformat

Raises: