concat()

audformat.utils.concat(objs, *, overwrite=False, aggregate_function=None, aggregate_strategy='mismatch')

Concatenate objects.

If all objects conform to table specifications and at least one object is segmented, the output has a segmented index. Otherwise, the index levels and dtypes of all objects must match, see audformat.utils.is_index_alike(). When a pandas.Index is concatenated with a single-level pandas.MultiIndex, the result is a pandas.Index.
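To make the "levels and dtypes must match" requirement concrete, the following is a rough pandas-only sketch. The helper levels_and_dtypes() is hypothetical and is not part of audformat; it only approximates the kind of comparison described above and should not be read as how audformat.utils.is_index_alike() is implemented.

```python
import pandas as pd


def levels_and_dtypes(index):
    """Hypothetical helper: list (level name, dtype) pairs of an index."""
    if isinstance(index, pd.MultiIndex):
        # One entry per level of the multi-index
        return [
            (name, str(dtype))
            for name, dtype in zip(index.names, index.dtypes)
        ]
    # A plain index has exactly one level
    return [(index.name, str(index.dtype))]


plain = pd.Index([0, 1], dtype="int64", name="idx")
single_level = pd.MultiIndex.from_arrays([[0, 2]], names=["idx"])
other = pd.Index(["a", "b"], name="idx")

# A plain index and a single-level multi-index can describe the same level
same = levels_and_dtypes(plain) == levels_and_dtypes(single_level)
# An integer level and a string level do not match
different = levels_and_dtypes(plain) != levels_and_dtypes(other)
```

Under this reading, plain and single_level would be accepted together, while plain and other would fall under the first ValueError listed under Raises.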

The new object contains the index entries and columns of all objects. Missing values are set to NaN.

Columns with the same identifier are combined into a single column. This requires that both columns have the same dtype. If overwrite is False, values at overlapping index positions must match, or one of them must be NaN. If overwrite is True, the value of the last object in the list is kept. If overwrite is False, a custom function can be provided with aggregate_function that converts the overlapping values into a single value.
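The combination rule for one shared column can be sketched in plain pandas as follows. This is only an illustration of the documented behaviour, not the code audformat uses: combine_column() is a hypothetical helper, it handles two float Series only, and it covers just the default 'mismatch' strategy. How a NaN in the last object is treated under overwrite=True is an assumption of this sketch.

```python
import pandas as pd


def combine_column(first, second, overwrite=False, aggregate_function=None):
    """Hypothetical helper: merge two Series that share a column name."""
    # Union of both indices; positions missing in one object become NaN
    index = first.index.union(second.index)
    a = first.reindex(index)
    b = second.reindex(index)
    if overwrite:
        # The later object wins wherever it provides a value
        # (assumption: a NaN in the later object does not overwrite)
        return b.combine_first(a)
    # Positions where both objects hold a value and the values differ
    mismatch = a.notna() & b.notna() & (a != b)
    if mismatch.any() and aggregate_function is None:
        raise ValueError("values in the same position do not match")
    result = a.combine_first(b)
    for key in index[mismatch.to_numpy()]:
        # Reduce the conflicting values to a single value
        result.loc[key] = aggregate_function(pd.Series([a.loc[key], b.loc[key]]))
    return result


first = pd.Series([1.0, 1.0], index=[0, 1])
second = pd.Series([2.0, 3.0], index=[1, 2])

kept_last = combine_column(first, second, overwrite=True)
averaged = combine_column(first, second, aggregate_function=lambda y: y.mean())
```

Here kept_last holds 1.0, 2.0, 3.0 and averaged holds 1.0, 1.5, 3.0, while calling combine_column(first, second) without further arguments raises a ValueError because the values at index 1 differ.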

Parameters
  • objs (Sequence[Union[Series, DataFrame]]) – objects to concatenate

  • overwrite (bool) – overwrite values where indices overlap

  • aggregate_function (Optional[Callable[[Series], Any]]) – function to aggregate overlapping values that cannot be joined when overwrite is False. The function gets a pandas.Series with the overlapping values as input. For example, set it to lambda y: y.mean() to average the values, or to tuple to return them as a tuple

  • aggregate_strategy (str) – if aggregate_function is not None, aggregate_strategy decides when aggregate_function is applied. 'overlap': apply to all samples that have an overlapping index; 'mismatch': apply to all samples that have an overlapping index and a different value
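The difference between the two aggregate_strategy values comes down to which index positions are handed to aggregate_function. The pandas-only sketch below selects those positions for two example Series; the variable names are illustrative and the selection logic is an assumption based on the parameter description, not taken from the audformat source.

```python
import pandas as pd

first = pd.Series([1, 1, 1], index=[0, 1, 2])
second = pd.Series([1, 2], index=[1, 2])

# Positions present in both objects
overlap = first.index.intersection(second.index)

# 'overlap': every shared position would be aggregated
overlap_rows = list(overlap)

# 'mismatch': only shared positions with differing values would be aggregated
differs = first.loc[overlap] != second.loc[overlap]
mismatch_rows = list(overlap[differs.to_numpy()])
```

For these inputs, overlap_rows is [1, 2] and mismatch_rows is [2]. This is why np.sum leaves identical overlapping values untouched under 'mismatch' but doubles them under 'overlap'.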

Return type

Union[Series, DataFrame]

Returns

concatenated objects

Raises
  • ValueError – if levels and dtypes of object indices do not match

  • ValueError – if columns with the same name have different dtypes

  • ValueError – if aggregate_strategy is not one of 'overlap', 'mismatch'

  • ValueError – if aggregate_function is None, overwrite is False, and values in the same position do not match

Examples

>>> concat(
...     [
...         pd.Series([0], index=pd.Index([0])),
...         pd.Series([1], index=pd.Index([1])),
...     ]
... )
0    0
1    1
dtype: Int64
>>> concat(
...     [
...         pd.Series([0], index=pd.Index([0]), name="col1"),
...         pd.Series([1], index=pd.Index([0]), name="col2"),
...     ]
... )
   col1  col2
0     0     1
>>> concat(
...     [
...         pd.Series([1, 1], index=pd.Index([0, 1])),
...         pd.Series([1, 1], index=pd.Index([0, 1])),
...     ],
...     aggregate_function=np.sum,
... )
0    1
1    1
dtype: Int64
>>> concat(
...     [
...         pd.Series([1, 1], index=pd.Index([0, 1])),
...         pd.Series([1, 2], index=pd.Index([0, 1])),
...     ],
...     aggregate_function=np.sum,
... )
0    1
1    3
dtype: Int64
>>> concat(
...     [
...         pd.Series([1, 1], index=pd.Index([0, 1])),
...         pd.Series([1, 1], index=pd.Index([0, 1])),
...     ],
...     aggregate_function=np.sum,
...     aggregate_strategy="overlap",
... )
0    2
1    2
dtype: Int64
>>> concat(
...     [
...         pd.Series(
...             [0.0, 1.0],
...             index=pd.Index(
...                 [0, 1],
...                 dtype="int",
...                 name="idx",
...             ),
...             name="float",
...         ),
...         pd.DataFrame(
...             {
...                 "float": [np.nan, 2.0],
...                 "string": ["a", "b"],
...             },
...             index=pd.MultiIndex.from_arrays(
...                 [[0, 2]],
...                 names=["idx"],
...             ),
...         ),
...     ]
... )
     float string
idx
0      0.0      a
1      1.0    NaN
2      2.0      b
>>> concat(
...     [
...         pd.Series(
...             [0.0, 1.0],
...             index=filewise_index(["f1", "f2"]),
...             name="float",
...         ),
...         pd.DataFrame(
...             {
...                 "float": [1.0, 2.0],
...                 "string": ["a", "b"],
...             },
...             index=segmented_index(["f2", "f3"]),
...         ),
...     ]
... )
                 float string
file start  end
f1   0 days NaT    0.0    NaN
f2   0 days NaT    1.0      a
f3   0 days NaT    2.0      b
>>> concat(
...     [
...         pd.Series(
...             [0.0, 0.0],
...             index=filewise_index(["f1", "f2"]),
...             name="float",
...         ),
...         pd.DataFrame(
...             {
...                 "float": [1.0, 2.0],
...                 "string": ["a", "b"],
...             },
...             index=segmented_index(["f2", "f3"]),
...         ),
...     ],
...     overwrite=True,
... )
                 float string
file start  end
f1   0 days NaT    0.0    NaN
f2   0 days NaT    1.0      a
f3   0 days NaT    2.0      b