concat()
- audformat.utils.concat(objs, *, overwrite=False, aggregate_function=None, aggregate_strategy='mismatch')
Concatenate objects.
If all objects conform to table specifications and at least one object is segmented, the output has a segmented index. Otherwise, levels and dtypes of all objects must match, see audformat.utils.is_index_alike(). When a pandas.Index is concatenated with a single-level pandas.MultiIndex, the result is a pandas.Index.
The new object contains index and columns of all objects. Missing values will be set to NaN.
Columns with the same identifier are combined to a single column. This requires that both columns have the same dtype and, if overwrite is set to False, that values in places where the indices overlap match or that one column contains NaN. If overwrite is set to True, the value of the last object in the list is kept. If overwrite is set to False, a custom aggregation function can be provided with aggregate_function that converts the overlapping values into a single value.
- Parameters
overwrite (bool) – overwrite values where indices overlap
aggregate_function (Optional[Callable[[Series], Any]]) – function to aggregate overlapping values that cannot be joined when overwrite is False. The function gets a pandas.Series with overlapping values as input. E.g. set to lambda y: y.mean() to average the values or to tuple to return them as a tuple
aggregate_strategy (str) – if aggregate_function is not None, aggregate_strategy decides when aggregate_function is applied. 'overlap': apply to all samples that have an overlapping index; 'mismatch': apply to all samples that have an overlapping index and a different value
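The difference between the two aggregate_strategy values can be illustrated with a pandas-only sketch. The helper aggregate_overlap below is hypothetical and not part of audformat; it assumes exactly two Series without NaN values and only mimics the documented rule for when aggregate_function is called.

```python
import pandas as pd


def aggregate_overlap(first, second, aggregate_function, aggregate_strategy="mismatch"):
    """Hypothetical helper, not audformat's implementation.

    Combines two Series without NaN values and mimics the documented
    rule for when ``aggregate_function`` is applied.
    """
    if aggregate_strategy not in ("overlap", "mismatch"):
        raise ValueError("aggregate_strategy must be 'overlap' or 'mismatch'")
    combined = {}
    for label in first.index.union(second.index):
        # Collect the values every object provides for this index entry
        values = pd.Series(
            [obj.loc[label] for obj in (first, second) if label in obj.index]
        )
        overlapping = len(values) > 1
        differing = values.nunique() > 1
        if overlapping and (aggregate_strategy == "overlap" or differing):
            # 'overlap': always aggregate; 'mismatch': only if values differ
            combined[label] = aggregate_function(values)
        else:
            combined[label] = values.iloc[0]
    return pd.Series(combined)


a = pd.Series([1, 1], index=pd.Index([0, 1]))
b = pd.Series([1, 2], index=pd.Index([0, 1]))

# 'mismatch' only sums the entry where the values differ (index 1)
mismatch_result = aggregate_overlap(a, b, lambda y: y.sum())
# 'overlap' sums every entry that appears in both objects
overlap_result = aggregate_overlap(a, a, lambda y: y.sum(), aggregate_strategy="overlap")
```

With 'mismatch', identical overlapping values pass through unchanged, which is why summing two equal Series leaves them as they are unless 'overlap' is requested.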
- Return type
- Returns
concatenated objects
- Raises
ValueError – if level and dtypes of object indices do not match
ValueError – if columns with the same name have different dtypes
ValueError – if aggregate_strategy is not one of 'overlap', 'mismatch'
ValueError – if aggregate_function is None, overwrite is False, and values in the same position do not match
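The first error condition can be pictured as a comparison of index signatures. The sketch below uses only pandas; index_signature and check_indices are hypothetical names, and the assumption that both level names and dtypes have to agree is an illustration, not a statement about how audformat.utils.is_index_alike() is implemented.

```python
import pandas as pd


def index_signature(index):
    """Return (name, dtype) pairs for every level of an index."""
    if isinstance(index, pd.MultiIndex):
        return tuple(
            (name, str(dtype)) for name, dtype in zip(index.names, index.dtypes)
        )
    return ((index.name, str(index.dtype)),)


def check_indices(objs):
    """Hypothetical check, not audformat's implementation."""
    signatures = {index_signature(obj.index) for obj in objs}
    if len(signatures) > 1:
        raise ValueError("Levels and dtypes of all objects must match.")


ints = pd.Series([0.0, 1.0], index=pd.Index([0, 1], name="idx"))
multi = pd.DataFrame(
    {"string": ["a", "b"]},
    index=pd.MultiIndex.from_arrays([[0, 2]], names=["idx"]),
)
strings = pd.Series([0.0], index=pd.Index(["a"], name="idx"))

# A plain index and a single-level MultiIndex with the same name and dtype pass
check_indices([ints, multi])

# An integer index and a string index do not
try:
    check_indices([ints, strings])
except ValueError as error:
    message = str(error)
```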
Examples
>>> concat(
...     [
...         pd.Series([0], index=pd.Index([0])),
...         pd.Series([1], index=pd.Index([1])),
...     ]
... )
0    0
1    1
dtype: Int64
>>> concat(
...     [
...         pd.Series([0], index=pd.Index([0]), name="col1"),
...         pd.Series([1], index=pd.Index([0]), name="col2"),
...     ]
... )
   col1  col2
0     0     1
>>> concat(
...     [
...         pd.Series([1, 1], index=pd.Index([0, 1])),
...         pd.Series([1, 1], index=pd.Index([0, 1])),
...     ],
...     aggregate_function=np.sum,
... )
0    1
1    1
dtype: Int64
>>> concat(
...     [
...         pd.Series([1, 1], index=pd.Index([0, 1])),
...         pd.Series([1, 2], index=pd.Index([0, 1])),
...     ],
...     aggregate_function=np.sum,
... )
0    1
1    3
dtype: Int64
>>> concat(
...     [
...         pd.Series([1, 1], index=pd.Index([0, 1])),
...         pd.Series([1, 1], index=pd.Index([0, 1])),
...     ],
...     aggregate_function=np.sum,
...     aggregate_strategy="overlap",
... )
0    2
1    2
dtype: Int64
>>> concat(
...     [
...         pd.Series(
...             [0.0, 1.0],
...             index=pd.Index(
...                 [0, 1],
...                 dtype="int",
...                 name="idx",
...             ),
...             name="float",
...         ),
...         pd.DataFrame(
...             {
...                 "float": [np.nan, 2.0],
...                 "string": ["a", "b"],
...             },
...             index=pd.MultiIndex.from_arrays(
...                 [[0, 2]],
...                 names=["idx"],
...             ),
...         ),
...     ]
... )
     float string
idx
0      0.0      a
1      1.0    NaN
2      2.0      b
>>> concat(
...     [
...         pd.Series(
...             [0.0, 1.0],
...             index=filewise_index(["f1", "f2"]),
...             name="float",
...         ),
...         pd.DataFrame(
...             {
...                 "float": [1.0, 2.0],
...                 "string": ["a", "b"],
...             },
...             index=segmented_index(["f2", "f3"]),
...         ),
...     ]
... )
                 float string
file start  end
f1   0 days NaT    0.0    NaN
f2   0 days NaT    1.0      a
f3   0 days NaT    2.0      b
>>> concat(
...     [
...         pd.Series(
...             [0.0, 0.0],
...             index=filewise_index(["f1", "f2"]),
...             name="float",
...         ),
...         pd.DataFrame(
...             {
...                 "float": [1.0, 2.0],
...                 "string": ["a", "b"],
...             },
...             index=segmented_index(["f2", "f3"]),
...         ),
...     ],
...     overwrite=True,
... )
                 float string
file start  end
f1   0 days NaT    0.0    NaN
f2   0 days NaT    1.0      a
f3   0 days NaT    2.0      b
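The effect of overwrite=True on a single column can be approximated with plain pandas. The sketch below is an analogy, not audformat's implementation: it assumes a simple named index instead of the filewise and segmented indices used in the examples, and uses pandas.Series.combine_first() to let the last object win where the indices overlap.

```python
import pandas as pd

first = pd.Series(
    [0.0, 0.0],
    index=pd.Index(["f1", "f2"], name="file"),
    name="float",
)
last = pd.Series(
    [1.0, 2.0],
    index=pd.Index(["f2", "f3"], name="file"),
    name="float",
)

# Values of the last object are kept where the indices overlap ("f2");
# all other entries come from whichever object provides them
merged = last.combine_first(first)
```

Swapping the two objects, first.combine_first(last), would keep 0.0 for "f2" instead, which is why the order of the list passed to concat() matters when overwrite is True.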