difference()

audformat.utils.difference(objs)

Difference of index objects.

Returns index items that are not shared by two or more objects. For two objects this is identical to their symmetric difference.
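The rule above can be sketched with plain pandas. This is only an illustration of the semantics, not the audformat implementation: `items_in_exactly_one` is a hypothetical helper, and the dtype handling of the real function (for example the conversion to Int64) is left out. It also shows why the equivalence with the symmetric difference is stated for two objects only.

```python
import pandas as pd


def items_in_exactly_one(objs):
    """Hypothetical helper: keep items that occur in only one index."""
    # Drop duplicates inside each index so counts refer to objects, not rows
    uniques = [obj.drop_duplicates() for obj in objs]
    combined = uniques[0].append(uniques[1:])
    # keep=False flags every item that occurs more than once
    return combined[~combined.duplicated(keep=False)]


a = pd.Index([0, 1, 2], name="idx")
b = pd.Index([1, 2, 3], name="idx")
c = pd.Index([2, 3, 4], name="idx")

# Two objects: same items as the symmetric difference
two = items_in_exactly_one([a, b])
print(list(two))  # [0, 3]
print(list(a.symmetric_difference(b)))  # [0, 3]

# Three objects: 2 is shared by all of them and therefore dropped,
# whereas chaining symmetric differences would keep it
three = items_in_exactly_one([a, b, c])
chained = a.symmetric_difference(b).symmetric_difference(c)
print(list(three))  # [0, 4]
print(list(chained))  # [0, 2, 4]
```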

If all index objects conform to table specifications and at least one object is segmented, the output is a segmented index. Otherwise, the levels and dtypes of all objects must match, see audformat.utils.is_index_alike(). Integer dtypes don’t have to match, but the result will always be of dtype Int64. When the symmetric difference of a pandas.Index and a single-level pandas.MultiIndex is calculated, the result is a pandas.Index.

The order of the resulting index depends on the order of objs. If you require audformat.utils.difference() to be commutative, you have to sort its output.
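A minimal sketch of the sorting advice, using pandas' own `Index.symmetric_difference()` with `sort=False` as a stand-in for the two-object case. The stand-in is an assumption made so the snippet runs without audformat; the exact item order returned by audformat.utils.difference() may differ from it. The point is only that sorting both results with `sort_values()` makes the comparison independent of the argument order.

```python
import pandas as pd

a = pd.Index([0, 1, 2], name="idx")
b = pd.Index([1, 2, 3], name="idx")

# Stand-in for difference([a, b]) and difference([b, a]);
# sort=False leaves the items in an order that depends on the operands
forward = a.symmetric_difference(b, sort=False)
backward = b.symmetric_difference(a, sort=False)

# Sorting removes the dependence on the argument order
forward_sorted = forward.sort_values()
backward_sorted = backward.sort_values()
print(forward_sorted.equals(backward_sorted))  # True
```

`sort_values()` is also available on pandas.MultiIndex, so the same step applies to segmented results.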

Parameters

objs (Sequence[Index]) – index objects

Return type

Index

Returns

difference of index objects

Raises

ValueError – if levels and dtypes of objects do not match

Examples

>>> difference(
...     [
...         pd.Index([1, 2, 3], name="idx"),
...     ]
... )
Index([1, 2, 3], dtype='Int64', name='idx')
>>> difference(
...     [
...         pd.Index([0, 1], name="idx"),
...         pd.Index([1, np.nan], dtype="Int64", name="idx"),
...     ]
... )
Index([0, <NA>], dtype='Int64', name='idx')
>>> difference(
...     [
...         pd.Index([0, 1], name="idx"),
...         pd.MultiIndex.from_arrays([[1, 2]], names=["idx"]),
...     ]
... )
Index([0, 2], dtype='Int64', name='idx')
>>> difference(
...     [
...         pd.MultiIndex.from_arrays(
...             [["a", "b", "c"], [0, 1, 2]],
...             names=["idx1", "idx2"],
...         ),
...         pd.MultiIndex.from_arrays(
...             [["b", "c"], [1, 3]],
...             names=["idx1", "idx2"],
...         ),
...     ]
... )
MultiIndex([('a', 0),
            ('c', 2),
            ('c', 3)],
           names=['idx1', 'idx2'])
>>> difference(
...     [
...         filewise_index(["f1", "f2", "f3"]),
...         filewise_index(["f2", "f3", "f4"]),
...     ]
... )
Index(['f1', 'f4'], dtype='string', name='file')
>>> difference(
...     [
...         segmented_index(["f1"], [0], [1]),
...         segmented_index(["f1", "f2"], [0, 1], [1, 2]),
...     ]
... )
MultiIndex([('f2', '0 days 00:00:01', '0 days 00:00:02')],
           names=['file', 'start', 'end'])
>>> difference(
...     [
...         filewise_index(["f1", "f2"]),
...         segmented_index(["f1", "f2"], [0, 0], [pd.NaT, 1]),
...     ]
... )
MultiIndex([('f2', '0 days',               NaT),
            ('f2', '0 days', '0 days 00:00:01')],
           names=['file', 'start', 'end'])