concat()
- audformat.utils.concat(objs, *, overwrite=False, aggregate_function=None, aggregate_strategy='mismatch')
Concatenate objects.
If all objects conform to table specifications and at least one object is segmented, the output has a segmented index. Otherwise, the levels and dtypes of all objects have to match, see audformat.utils.is_index_alike(). When a pandas.Index is concatenated with a single-level pandas.MultiIndex, the result is a pandas.Index.
The new object contains the index and columns of all objects. Missing values are set to NaN.
Columns with the same identifier are combined into a single column. This requires that both columns have the same dtype and, if overwrite is set to False, that values in places where the indices overlap match or one of the columns contains NaN. If overwrite is set to True, the value of the last object in the list is kept. If overwrite is set to False, a custom aggregation function can be provided with aggregate_function that converts the overlapping values into a single value (see the sketch after the examples below).
- Parameters
objs (Sequence[Union[Series, DataFrame]]) – objects to concatenate
overwrite (bool) – overwrite values where indices overlap
aggregate_function (Optional[Callable[[Series], object]]) – function to aggregate overlapping values that cannot be joined when overwrite is False. The function gets a pandas.Series with the overlapping values as input. E.g. set to lambda y: y.mean() to average the values, or to tuple to return them as a tuple
aggregate_strategy (str) – if aggregate_function is not None, aggregate_strategy decides when aggregate_function is applied. 'overlap': apply to all samples that have an overlapping index; 'mismatch': apply to all samples that have an overlapping index and a different value
- Return type
Union[Series, DataFrame]
- Returns
concatenated objects
- Raises
ValueError – if levels and dtypes of object indices do not match
ValueError – if columns with the same name have different dtypes
ValueError – if aggregate_strategy is not one of 'overlap', 'mismatch'
ValueError – if aggregate_function is None, overwrite is False, and values in the same position do not match
Examples
>>> concat(
...     [
...         pd.Series([0], index=pd.Index([0])),
...         pd.Series([1], index=pd.Index([1])),
...     ]
... )
0    0
1    1
dtype: Int64
>>> concat(
...     [
...         pd.Series([0], index=pd.Index([0]), name="col1"),
...         pd.Series([1], index=pd.Index([0]), name="col2"),
...     ]
... )
   col1  col2
0     0     1
>>> concat(
...     [
...         pd.Series([1, 1], index=pd.Index([0, 1])),
...         pd.Series([1, 1], index=pd.Index([0, 1])),
...     ],
...     aggregate_function=np.sum,
... )
0    1
1    1
dtype: Int64
>>> concat(
...     [
...         pd.Series([1, 1], index=pd.Index([0, 1])),
...         pd.Series([1, 2], index=pd.Index([0, 1])),
...     ],
...     aggregate_function=np.sum,
... )
0    1
1    3
dtype: Int64
>>> concat(
...     [
...         pd.Series([1, 1], index=pd.Index([0, 1])),
...         pd.Series([1, 1], index=pd.Index([0, 1])),
...     ],
...     aggregate_function=np.sum,
...     aggregate_strategy="overlap",
... )
0    2
1    2
dtype: Int64
>>> concat(
...     [
...         pd.Series(
...             [0.0, 1.0],
...             index=pd.Index(
...                 [0, 1],
...                 dtype="int",
...                 name="idx",
...             ),
...             name="float",
...         ),
...         pd.DataFrame(
...             {
...                 "float": [np.nan, 2.0],
...                 "string": ["a", "b"],
...             },
...             index=pd.MultiIndex.from_arrays(
...                 [[0, 2]],
...                 names=["idx"],
...             ),
...         ),
...     ]
... )
     float string
idx
0      0.0      a
1      1.0    NaN
2      2.0      b
>>> concat(
...     [
...         pd.Series(
...             [0.0, 1.0],
...             index=filewise_index(["f1", "f2"]),
...             name="float",
...         ),
...         pd.DataFrame(
...             {
...                 "float": [1.0, 2.0],
...                 "string": ["a", "b"],
...             },
...             index=segmented_index(["f2", "f3"]),
...         ),
...     ]
... )
                      float string
file start  end
f1   0 days NaT         0.0    NaN
f2   0 days NaT         1.0      a
f3   0 days NaT         2.0      b
>>> concat(
...     [
...         pd.Series(
...             [0.0, 0.0],
...             index=filewise_index(["f1", "f2"]),
...             name="float",
...         ),
...         pd.DataFrame(
...             {
...                 "float": [1.0, 2.0],
...                 "string": ["a", "b"],
...             },
...             index=segmented_index(["f2", "f3"]),
...         ),
...     ],
...     overwrite=True,
... )
                      float string
file start  end
f1   0 days NaT         0.0    NaN
f2   0 days NaT         1.0      a
f3   0 days NaT         2.0      b
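The following snippet is a minimal sketch, not part of the rendered examples above. It assumes audformat and pandas are installed and illustrates the ValueError raised when overlapping values differ and no aggregate_function is given, and how providing one resolves the conflict.

    # Minimal sketch; assumes audformat and pandas are importable
    import pandas as pd

    from audformat.utils import concat

    y1 = pd.Series([1, 1], index=pd.Index([0, 1]))
    y2 = pd.Series([1, 2], index=pd.Index([0, 1]))

    # Values at index 1 differ, overwrite is False and no aggregate_function
    # is given, so concat() raises ValueError
    try:
        concat([y1, y2])
    except ValueError as error:
        print("mismatch:", error)

    # With an aggregate_function the conflict is resolved; under the default
    # aggregate_strategy='mismatch' it is only applied at index 1, where the
    # values differ, e.g. tuple collects the conflicting values there
    print(concat([y1, y2], aggregate_function=tuple))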