to_segmented_index()

audformat.utils.to_segmented_index(obj, *, allow_nat=True, files_duration=None, root=None, num_workers=1, verbose=False)

Convert to segmented index.

If the input is a filewise table, start and end are added as new levels to the index. By default, start is set to 0 and end to NaT.

If allow_nat is set to False, all occurrences of end=NaT are replaced with the duration of the file. This requires that the referenced file exists, or that the durations are provided with files_duration. If file names in the index are relative, the root argument can be used to provide the location where the files are stored.

Parameters

obj (Index | Series | DataFrame) – object conforming to the table specifications
allow_nat (bool) – if set to False, end=NaT is replaced with file duration
files_duration (Optional[MutableMapping[str, Timedelta]]) – mapping from file to duration. If not None, used to look up durations. If no entry is found for a file, it is added to the mapping. Expects absolute file names and durations as pd.Timedelta objects. Only relevant if allow_nat is set to False
root (Optional[str]) – root directory under which the files referenced in the index are stored
num_workers (Optional[int]) – number of parallel jobs. If None, it is set to the number of processors on the machine multiplied by 5
verbose (bool) – show progress bar

Return type

Index | Series | DataFrame

Returns

object with segmented index

Raises

ValueError – if the object does not conform to the table specifications
FileNotFoundError – if file is not found

Examples

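>>> import pandas as pd
>>> from audformat import filewise_index
>>> from audformat.utils import to_segmented_index
>>> index = filewise_index(["f1", "f2"])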
>>> to_segmented_index(index)
MultiIndex([('f1', '0 days', NaT),
            ('f2', '0 days', NaT)],
           names=['file', 'start', 'end'])
>>> to_segmented_index(
...     index,
...     allow_nat=False,
...     files_duration={
...         "f1": pd.to_timedelta(1.1, unit="s"),
...         "f2": pd.to_timedelta(2.2, unit="s"),
...     },
... )
MultiIndex([('f1', '0 days', '0 days 00:00:01.100000'),
            ('f2', '0 days', '0 days 00:00:02.200000')],
           names=['file', 'start', 'end'])
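
The conversion described above can also be pictured with plain pandas. The following is only a rough sketch of the idea, not audformat's implementation: the helper name sketch_to_segmented and its durations argument are hypothetical, and it covers only the case where durations are supplied in a mapping rather than read from files on disk.

```python
import pandas as pd


def sketch_to_segmented(files, durations=None):
    """Hypothetical helper (not part of audformat).

    Builds a (file, start, end) MultiIndex from a list of file names.
    start is always 0; end is NaT unless a duration mapping is given.
    """
    # Every segment starts at the beginning of its file
    starts = pd.to_timedelta([0] * len(files), unit="s")
    if durations is None:
        # Mirrors the default behaviour: end is left as NaT
        ends = pd.to_timedelta([pd.NaT] * len(files))
    else:
        # Mirrors allow_nat=False with durations looked up in a mapping
        ends = pd.to_timedelta([durations[f] for f in files])
    return pd.MultiIndex.from_arrays(
        [files, starts, ends],
        names=["file", "start", "end"],
    )


files = ["f1", "f2"]
with_nat = sketch_to_segmented(files)
with_durations = sketch_to_segmented(
    files,
    durations={
        "f1": pd.Timedelta(milliseconds=1100),
        "f2": pd.Timedelta(milliseconds=2200),
    },
)
```

In real code, to_segmented_index() should be preferred, since it also validates the input against the table specifications and can read missing durations from the referenced files.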