to_segmented_index()
audformat.utils.to_segmented_index(obj, *, allow_nat=True, files_duration=None, root=None, num_workers=1, verbose=False)
Convert to segmented index.
If the input is a filewise table, start and end will be added as new levels to the index. By default, start will be set to 0 and end to NaT.
If allow_nat is set to False, all occurrences of end=NaT are replaced with the duration of the file. This requires that the referenced file exists, or that the durations are provided with files_duration. If file names in the index are relative, the root argument can be used to provide the location where the files are stored.
- Parameters
  - obj (Union[Index, Series, DataFrame]) – object conforming to table specifications
  - allow_nat (bool) – if set to False, end=NaT is replaced with the file duration
  - files_duration (Optional[MutableMapping[str, Timedelta]]) – mapping from file to duration. If not None, it is used to look up durations. If no entry is found for a file, it is added to the mapping. Expects absolute file names and durations as pd.Timedelta objects. Only relevant if allow_nat is set to False
  - root (Optional[str]) – root directory under which the files referenced in the index are stored
  - num_workers (Optional[int]) – number of parallel jobs. If None, it will be set to the number of processors on the machine multiplied by 5
  - verbose (bool) – show progress bar
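The interplay of allow_nat, root and files_duration can be illustrated with a small pure-Python sketch. It assumes that a relative file name is joined with root before the lookup, and that a missing duration is computed by a caller-supplied reader. resolve_end, get_duration and fake_duration are hypothetical names, not part of audformat; audformat itself instead needs the referenced file to exist.

```python
import os
from typing import Callable, MutableMapping, Optional

import pandas as pd


def resolve_end(
    file: str,
    end,  # a pd.Timedelta or pd.NaT
    files_duration: Optional[MutableMapping[str, pd.Timedelta]],
    root: Optional[str],
    get_duration: Callable[[str], pd.Timedelta],
) -> pd.Timedelta:
    """Return ``end``, or the file duration if ``end`` is NaT.

    Hypothetical helper illustrating the documented lookup rules;
    it is not part of audformat.
    """
    if not pd.isna(end):
        return end  # explicit end times are left unchanged
    # Relative file names are combined with root (if given)
    path = os.path.join(root, file) if root is not None else file
    if files_duration is None:
        return get_duration(path)  # no mapping: always compute
    if path not in files_duration:
        # Cache miss: compute once and store it in the caller's mapping
        files_duration[path] = get_duration(path)
    return files_duration[path]


# Usage: a fake duration reader (hypothetical) that records every call
calls = []


def fake_duration(path: str) -> pd.Timedelta:
    calls.append(path)
    return pd.to_timedelta(2.2, unit="s")


cache = {os.path.join("/data", "f1"): pd.to_timedelta(1.1, unit="s")}
end1 = resolve_end("f1", pd.NaT, cache, "/data", fake_duration)  # cache hit
end2 = resolve_end("f2", pd.NaT, cache, "/data", fake_duration)  # computed
end3 = resolve_end("f2", pd.NaT, cache, "/data", fake_duration)  # now cached
```

Because the same mapping object is passed in and updated, reusing it across calls means the duration of each file only has to be determined once.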
- Return type
  Union[Index, Series, DataFrame]
- Returns
  object with segmented index
- Raises
  - ValueError – if object does not conform to table specifications
  - FileNotFoundError – if file is not found
Examples
>>> import pandas as pd
>>> from audformat import filewise_index
>>> from audformat.utils import to_segmented_index
>>> index = filewise_index(["f1", "f2"])
>>> to_segmented_index(index)
MultiIndex([('f1', '0 days', NaT),
            ('f2', '0 days', NaT)],
           names=['file', 'start', 'end'])
>>> to_segmented_index(
...     index,
...     allow_nat=False,
...     files_duration={
...         "f1": pd.to_timedelta(1.1, unit="s"),
...         "f2": pd.to_timedelta(2.2, unit="s"),
...     },
... )
MultiIndex([('f1', '0 days', '0 days 00:00:01.100000'),
            ('f2', '0 days', '0 days 00:00:02.200000')],
           names=['file', 'start', 'end'])
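For a filewise input, the results in the examples can be approximated with plain pandas. The sketch below is an illustration under stated assumptions, not audformat's implementation: it only handles filewise input (an index of file names), skips the table-specification checks, and takes durations from a mapping instead of reading files. filewise_to_segmented and convert are hypothetical helper names.

```python
from typing import Mapping, Optional

import pandas as pd


def filewise_to_segmented(
    index: pd.Index,
    files_duration: Optional[Mapping[str, pd.Timedelta]] = None,
) -> pd.MultiIndex:
    """Hypothetical pandas-only sketch; assumes a filewise index."""
    files = index.tolist()
    # Every file becomes one segment that starts at 0
    starts = pd.to_timedelta([0] * len(files), unit="s")
    if files_duration is None:
        # Default behaviour: open-ended segments (end=NaT)
        ends = pd.to_timedelta([pd.NaT] * len(files))
    else:
        # Like allow_nat=False: the end is taken from the duration mapping
        ends = pd.to_timedelta([files_duration[f] for f in files])
    return pd.MultiIndex.from_arrays(
        [files, starts, ends], names=["file", "start", "end"]
    )


def convert(obj, files_duration=None):
    """Apply the sketch to an Index, Series or DataFrame (hypothetical)."""
    if isinstance(obj, pd.Index):
        return filewise_to_segmented(obj, files_duration)
    # Series/DataFrame: keep the data, replace only the index
    return obj.set_axis(filewise_to_segmented(obj.index, files_duration))


index = pd.Index(["f1", "f2"], name="file")
seg = convert(index)
seg_dur = convert(
    index,
    {"f1": pd.to_timedelta(1.1, unit="s"), "f2": pd.to_timedelta(2.2, unit="s")},
)
series = convert(pd.Series([1, 2], index=index))
```

Series and DataFrame objects keep their values; only the index is swapped for the three-level file/start/end index.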