difference()

audformat.utils.difference(objs)

Difference of index objects.

Returns index items that are not shared by two or more objects. For two objects this is identical to their symmetric difference.

If all index objects conform to table specifications and at least one object is segmented, the output is a segmented index. Otherwise, the levels and dtypes of all objects must match, see audformat.utils.is_index_alike(). Integer dtypes don’t have to match, but the result will always be of dtype Int64. When the symmetric difference of a pandas.Index and a single-level pandas.MultiIndex is calculated, the result is a pandas.Index.

The order of the resulting index depends on the order of objs. If you require audformat.utils.difference() to be commutative, you have to sort its output.
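The selection rule and the order dependence described above can be illustrated with a minimal pure-Python sketch. It assumes that "not shared by two or more objects" means "occurs in exactly one object", and it works on plain lists instead of index objects. The helper `difference_sketch` is hypothetical and not part of audformat; the library's actual implementation and its exact ordering may differ.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def difference_sketch(objs: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Return items that occur in exactly one of the given sequences.

    Hypothetical helper for illustration only; it is not part of audformat.
    """
    # Count in how many sequences each item occurs,
    # ignoring duplicates within a single sequence.
    counts = Counter(item for obj in objs for item in dict.fromkeys(obj))
    # Keep first-seen order, so the result depends on the order of objs.
    seen = dict.fromkeys(item for obj in objs for item in obj)
    return [item for item in seen if counts[item] == 1]


a = ["f1", "f2", "f3"]
b = ["f2", "f3", "f4"]

forward = difference_sketch([a, b])   # ['f1', 'f4']
backward = difference_sketch([b, a])  # ['f4', 'f1']

# Same items, different order; sorting makes the result order-independent.
print(forward, backward, sorted(forward) == sorted(backward))
```

For two inputs the sketch returns the same items as the set symmetric difference `set(a) ^ set(b)`. With the real function, an order-independent result could likewise be obtained by sorting the returned index, for example with `pandas.Index.sort_values()`.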

Parameters: objs (Sequence[Index]) – index objects
Return type: Index
Returns: difference of index objects
Raises: ValueError – if levels and dtypes of objects do not match

Examples

>>> difference(
...     [
...         pd.Index([1, 2, 3], name="idx"),
...     ]
... )
Index([1, 2, 3], dtype='Int64', name='idx')
>>> difference(
...     [
...         pd.Index([0, 1], name="idx"),
...         pd.Index([1, np.nan], dtype="Int64", name="idx"),
...     ]
... )
Index([0, <NA>], dtype='Int64', name='idx')
>>> difference(
...     [
...         pd.Index([0, 1], name="idx"),
...         pd.MultiIndex.from_arrays([[1, 2]], names=["idx"]),
...     ]
... )
Index([0, 2], dtype='Int64', name='idx')
>>> difference(
...     [
...         pd.MultiIndex.from_arrays(
...             [["a", "b", "c"], [0, 1, 2]],
...             names=["idx1", "idx2"],
...         ),
...         pd.MultiIndex.from_arrays(
...             [["b", "c"], [1, 3]],
...             names=["idx1", "idx2"],
...         ),
...     ]
... )
MultiIndex([('a', 0),
            ('c', 2),
            ('c', 3)],
           names=['idx1', 'idx2'])
>>> difference(
...     [
...         filewise_index(["f1", "f2", "f3"]),
...         filewise_index(["f2", "f3", "f4"]),
...     ]
... )
Index(['f1', 'f4'], dtype='string', name='file')
>>> difference(
...     [
...         segmented_index(["f1"], [0], [1]),
...         segmented_index(["f1", "f2"], [0, 1], [1, 2]),
...     ]
... )
MultiIndex([('f2', '0 days 00:00:01', '0 days 00:00:02')],
           names=['file', 'start', 'end'])
>>> difference(
...     [
...         filewise_index(["f1", "f2"]),
...         segmented_index(["f1", "f2"], [0, 0], [pd.NaT, 1]),
...     ]
... )
MultiIndex([('f2', '0 days',               NaT),
            ('f2', '0 days', '0 days 00:00:01')],
           names=['file', 'start', 'end'])