stream()
- audb.stream(name, table, *, version=None, map=None, batch_size=16, shuffle=False, buffer_size=100000, only_metadata=False, bit_depth=None, channels=None, format=None, mixdown=False, sampling_rate=None, full_path=True, cache_root=None, num_workers=1, timeout=-1, verbose=True)
Stream table and media files of a database.
Loads only the first batch_size rows of a table into memory, and downloads only the related media files, if any media files are requested.
By setting bit_depth, channels, format, mixdown, and sampling_rate we can request a specific flavor of the database. In that case media files are automatically converted to the desired properties (see also audb.Flavor).
- Parameters
name (str) – name of database
table (str) – name of table
map (Optional[dict[str, str | Sequence[str]]]) – map scheme or scheme fields to column values. For example, if your table holds a column speaker with speaker IDs, which is assigned to a scheme that contains a dict mapping speaker IDs to age and gender entries, map={'speaker': ['age', 'gender']} will replace the column with two new columns that map ID values to age and gender, respectively. To also keep the original column with speaker IDs, you can do map={'speaker': ['speaker', 'age', 'gender']}
batch_size (int) – number of table rows to return in one iteration
shuffle (bool) – if True, it first reads buffer_size rows from the table and selects batch_size randomly from them
buffer_size (int) – number of table rows to be loaded when shuffle is True
only_metadata (bool) – load only header and tables of database
channels (Union[int, Sequence[int], None]) – channel selection, see audresample.remix(). Note that media files with too few channels will be first upsampled by repeating the existing channels. E.g. channels=[0, 1] upsamples all mono files to stereo, and channels=[1] returns the second channel of all multi-channel files and all mono files
mixdown (bool) – apply mono mix-down
sampling_rate (Optional[int]) – sampling rate in Hz, one of 8000, 16000, 22050, 24000, 44100, 48000
full_path (bool) – replace relative with absolute file paths
cache_root (Optional[str]) – cache folder where databases are stored. If not set, audb.default_cache_root() is used
num_workers (Optional[int]) – number of parallel jobs or 1 for sequential processing. If None, will be set to the number of processors on the machine multiplied by 5
timeout (float) – maximum wait time if another thread or process is already accessing the database. If timeout is reached, None is returned. If timeout < 0, the method will block until the database can be accessed
verbose (bool) – show debug messages
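The effect of the map parameter described above can be illustrated without audb. The following is only a minimal sketch in plain Python: apply_map, the row dictionaries, and the speaker labels are hypothetical and made up for illustration, and the handling of a single string value is an assumption. It is not how audb implements the mapping internally.

```python
from typing import Dict, List, Sequence, Union


def apply_map(
    rows: List[dict],
    labels: Dict[str, Dict[str, dict]],
    mapping: Dict[str, Union[str, Sequence[str]]],
) -> List[dict]:
    """Replace mapped columns by the requested scheme fields (hypothetical helper)."""
    mapped = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            if column not in mapping:
                # Columns without a mapping are passed through unchanged
                new_row[column] = value
                continue
            fields = mapping[column]
            if isinstance(fields, str):
                # Assumption: a single string is treated like a one-element list
                fields = [fields]
            for field in fields:
                if field == column:
                    # Listing the column itself keeps the original values
                    new_row[field] = value
                else:
                    new_row[field] = labels[column][value][field]
        mapped.append(new_row)
    return mapped


# Made-up table rows and scheme labels, not taken from any real database
rows = [
    {"file": "a.wav", "speaker": "spk1"},
    {"file": "b.wav", "speaker": "spk2"},
]
labels = {
    "speaker": {
        "spk1": {"age": 30, "gender": "female"},
        "spk2": {"age": 40, "gender": "male"},
    }
}

replaced = apply_map(rows, labels, {"speaker": ["age", "gender"]})
kept = apply_map(rows, labels, {"speaker": ["speaker", "age", "gender"]})
print(replaced[0])  # {'file': 'a.wav', 'age': 30, 'gender': 'female'}
print(list(kept[0]))  # ['file', 'speaker', 'age', 'gender']
```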
- Return type
- Returns
database object
- Raises
ValueError – if a table is requested that is not part of the database
ValueError – if a non-supported bit_depth, format, or sampling_rate is requested
RuntimeError – if a flavor is requested, but the database contains media files that don’t contain audio, e.g. text files
Examples
>>> import numpy as np
>>> np.random.seed(1)
>>> db = audb.stream(
...     "emodb",
...     "files",
...     version="1.4.1",
...     batch_size=4,
...     shuffle=True,
...     only_metadata=True,
...     full_path=False,
...     verbose=False,
... )
>>> next(db)
                                  duration  speaker  transcription
file
wav/14a05Fb.wav  0 days 00:00:03.128687500       14            a05
wav/15a05Eb.wav  0 days 00:00:03.993562500       15            a05
wav/12a05Nd.wav     0 days 00:00:03.185875       12            a05
wav/13a07Na.wav  0 days 00:00:01.911687500       13            a07
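How batch_size, shuffle, and buffer_size interact can be sketched in plain Python. The generator below is one plausible reading of the parameter descriptions, not audb's actual implementation: iter_batches and its seed argument are hypothetical, and it assumes that a shuffled buffer is consumed completely, batch_size rows at a time, before the next buffer_size rows are read.

```python
import itertools
import random
from typing import Iterable, Iterator, List, Optional


def iter_batches(
    rows: Iterable,
    batch_size: int = 16,
    shuffle: bool = False,
    buffer_size: int = 100_000,
    seed: Optional[int] = None,
) -> Iterator[List]:
    """Yield lists of at most batch_size rows (hypothetical sketch)."""
    rng = random.Random(seed)
    iterator = iter(rows)
    # Without shuffling, only batch_size rows need to be held in memory
    chunk_size = buffer_size if shuffle else batch_size
    while True:
        buffer = list(itertools.islice(iterator, chunk_size))
        if not buffer:
            return
        if shuffle:
            # Randomize the order inside the current buffer only
            rng.shuffle(buffer)
        for start in range(0, len(buffer), batch_size):
            yield buffer[start:start + batch_size]


ordered = list(iter_batches(range(10), batch_size=4))
print(ordered)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

shuffled = list(
    iter_batches(range(10), batch_size=4, shuffle=True, buffer_size=8, seed=1)
)
# The first two batches together hold rows 0-7 in random order
print(sorted(shuffled[0] + shuffled[1]))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Under this reading, randomness is local to each buffer: a row can only appear in a batch together with rows from the same buffer_size window, so a larger buffer_size gives better mixing at the cost of more memory.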