difference()
- audformat.utils.difference(objs)
Difference of index objects.
Returns index items that are not shared by two or more objects. For two objects this is identical to their symmetric difference.
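As a rough illustration of this rule, the following sketch uses plain Python lists instead of index objects. It is not the library's implementation; the helper name `exclusive_items` is hypothetical, and it assumes that "not shared by two or more objects" means "occurs in exactly one object".

```python
from collections import Counter


def exclusive_items(objs):
    """Return items that occur in exactly one of the given collections.

    Hypothetical helper for illustration only, not part of audformat.
    """
    # Count in how many collections each item occurs;
    # set() makes duplicates inside one collection count once.
    counts = Counter(item for obj in objs for item in set(obj))
    result = []
    seen = set()
    # Walk the inputs in order so the result keeps first-seen order.
    for obj in objs:
        for item in obj:
            if counts[item] == 1 and item not in seen:
                seen.add(item)
                result.append(item)
    return result


a = ["f1", "f2", "f3"]
b = ["f2", "f3", "f4"]
c = ["f4", "f5"]

print(exclusive_items([a, b]))     # two inputs: the symmetric difference
print(exclusive_items([a, b, c]))  # "f4" is shared by b and c, so it is dropped
print(exclusive_items([b, a]))     # same items as [a, b], different order
```

In this sketch the order of the result follows the order of the inputs, so comparing `sorted()` results is one way to make the comparison independent of input order.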
If all index objects conform to table specifications and at least one object is segmented, the output is a segmented index. Otherwise, the levels and dtypes of all objects must match, see audformat.utils.is_index_alike(). Integer dtypes don't have to match, but the result will always be of dtype Int64. When the symmetric difference of a pandas.Index with a single-level pandas.MultiIndex is calculated, the result is a pandas.Index.
The order of the resulting index depends on the order of objs. If you require audformat.utils.difference() to be commutative, you have to sort its output.
- Parameters
- Return type
- Returns
difference of index objects
- Raises
ValueError – if levels and dtypes of the objects do not match
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from audformat import filewise_index, segmented_index
>>> from audformat.utils import difference
>>> difference(
...     [
...         pd.Index([1, 2, 3], name="idx"),
...     ]
... )
Index([1, 2, 3], dtype='Int64', name='idx')
>>> difference(
...     [
...         pd.Index([0, 1], name="idx"),
...         pd.Index([1, np.nan], dtype="Int64", name="idx"),
...     ]
... )
Index([0, <NA>], dtype='Int64', name='idx')
>>> difference(
...     [
...         pd.Index([0, 1], name="idx"),
...         pd.MultiIndex.from_arrays([[1, 2]], names=["idx"]),
...     ]
... )
Index([0, 2], dtype='Int64', name='idx')
>>> difference(
...     [
...         pd.MultiIndex.from_arrays(
...             [["a", "b", "c"], [0, 1, 2]],
...             names=["idx1", "idx2"],
...         ),
...         pd.MultiIndex.from_arrays(
...             [["b", "c"], [1, 3]],
...             names=["idx1", "idx2"],
...         ),
...     ]
... )
MultiIndex([('a', 0),
            ('c', 2),
            ('c', 3)],
           names=['idx1', 'idx2'])
>>> difference(
...     [
...         filewise_index(["f1", "f2", "f3"]),
...         filewise_index(["f2", "f3", "f4"]),
...     ]
... )
Index(['f1', 'f4'], dtype='string', name='file')
>>> difference(
...     [
...         segmented_index(["f1"], [0], [1]),
...         segmented_index(["f1", "f2"], [0, 1], [1, 2]),
...     ]
... )
MultiIndex([('f2', '0 days 00:00:01', '0 days 00:00:02')],
           names=['file', 'start', 'end'])
>>> difference(
...     [
...         filewise_index(["f1", "f2"]),
...         segmented_index(["f1", "f2"], [0, 0], [pd.NaT, 1]),
...     ]
... )
MultiIndex([('f2', '0 days', NaT),
            ('f2', '0 days', '0 days 00:00:01')],
           names=['file', 'start', 'end'])
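The last example, which mixes a filewise and a segmented index, can be read as follows. This is a sketch with plain tuples rather than the library's code; the helper `to_segment` is hypothetical. It assumes, as the output above suggests, that a filewise entry is treated like a segment starting at 0 with an undefined end (`None` stands in for `NaT` here).

```python
from datetime import timedelta


def to_segment(entry):
    """Promote a filewise entry (a file name) to a (file, start, end) tuple.

    Hypothetical helper for illustration only, not part of audformat.
    """
    if isinstance(entry, str):
        # Assumption: a filewise entry covers the whole file,
        # i.e. start 0 and an undefined end.
        return (entry, timedelta(0), None)
    return entry


filewise = ["f1", "f2"]
segmented = [
    ("f1", timedelta(0), None),
    ("f2", timedelta(0), timedelta(seconds=1)),
]

# Bring both inputs to the segmented form before comparing them.
left = [to_segment(entry) for entry in filewise]
right = [to_segment(entry) for entry in segmented]

# Keep the entries that occur in only one of the two promoted lists.
result = [s for s in left if s not in right] + [s for s in right if s not in left]
print(result)
```

Under this reading, the filewise entry "f1" matches the segment ("f1", 0, NaT) and is dropped, while both "f2" entries survive because their end times differ.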