Usage¶
audonnx
offers a simple interface
to load and use models in ONNX format.
Models with single or multiple input and output nodes are supported.
We begin by creating some test input: a file path, a signal array, and an index in audformat.
import audiofile
import pandas as pd

file = './docs/_static/test.wav'
signal, sampling_rate = audiofile.read(
    file,
    always_2d=True,
)
index = pd.MultiIndex.from_arrays(
    [
        [file, file],
        pd.to_timedelta(['0s', '3s']),
        pd.to_timedelta(['3s', '5s']),
    ],
    names=['file', 'start', 'end'],
)
Torch model¶
Create a Torch model with a single input and output node.
import torch

class TorchModelSingle(torch.nn.Module):

    def __init__(
        self,
    ):
        super().__init__()
        self.hidden = torch.nn.Linear(18, 8)
        self.out = torch.nn.Linear(8, 2)

    def forward(self, x: torch.Tensor):
        y = self.hidden(x.mean(dim=-1))
        y = self.out(y)
        return y.squeeze()

torch_model = TorchModelSingle()
Create an openSMILE feature extractor to convert the raw audio signal into a sequence of low-level descriptors.
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.GeMAPSv01b,
    feature_level=opensmile.FeatureLevel.LowLevelDescriptors,
)
Calculate the features and run the Torch model.
y = smile(signal, sampling_rate)
with torch.no_grad():
    z = torch_model(torch.from_numpy(y))
z
tensor([ 29.3742, -214.7639])
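The exact feature values are not important here, but their layout is, because the export step below slices y.shape[1:]. As a quick sanity check, the extractor should return an array of shape (channels, features, frames), i.e. (1, 18, num_frames) for the GeMAPS low-level descriptors used here:

# Expected layout: (channels, low-level descriptors, frames),
# i.e. (1, 18, num_frames) for this configuration
y.shape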
Export model¶
To export the model to ONNX format, we pass some dummy input, which allows the function to figure out the correct input and output shapes. Since the number of extracted feature frames varies with the length of the input signal, we tell the function that the last dimension of the input has a dynamic size. And we assign meaningful names to the input and output nodes.
import audeer
import os

onnx_root = audeer.mkdir('onnx')
onnx_model_path = os.path.join(onnx_root, 'model.onnx')

dummy_input = torch.randn(y.shape[1:])
torch.onnx.export(
    torch_model,
    dummy_input,
    onnx_model_path,
    input_names=['feature'],  # assign custom name to input node
    output_names=['gender'],  # assign custom name to output node
    dynamic_axes={'feature': {1: 'time'}},  # dynamic size
    opset_version=12,
)
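Optionally, the exported file can be checked before loading it into audonnx; this is a small sketch outside the audonnx workflow shown here, assuming the onnx package is installed:

import onnx

# Load the exported graph and let ONNX run its structural checks;
# check_model() raises an exception if the graph is invalid
onnx.checker.check_model(onnx.load(onnx_model_path))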
From the exported model file
we now create an object of audonnx.Model
.
We pass the feature extractor,
so that the model can automatically convert the
input signal to the desired representation.
And we assign labels to the dimensions of the output node.
Printing the model provides a summary of
the input and output nodes.
import audonnx
onnx_model = audonnx.Model(
    onnx_model_path,
    labels=['female', 'male'],
    transform=smile,
)
onnx_model
Input:
  feature:
    shape: [18, -1]
    dtype: tensor(float)
    transform: opensmile.core.smile.Smile
Output:
  gender:
    shape: [2]
    dtype: tensor(float)
    labels: [female, male]
Get information for individual nodes.
onnx_model.inputs['feature']
{shape: [18, -1], dtype: tensor(float), transform: opensmile.core.smile.Smile}
print(onnx_model.inputs['feature'].transform)
$opensmile.core.smile.Smile:
  feature_set: GeMAPSv01b
  feature_level: LowLevelDescriptors
  options: {}
  sampling_rate: null
  channels:
  - 0
  mixdown: false
  resample: false
onnx_model.outputs['gender']
{shape: [2], dtype: tensor(float), labels: [female, male]}
onnx_model.outputs['gender'].labels
['female', 'male']
Check that the exported model gives the expected output.
onnx_model(signal, sampling_rate)
array([ 29.374172, -214.76393 ], dtype=float32)
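We can also compare the ONNX Runtime result numerically with the Torch output z from above; a minimal sketch using numpy, allowing for the small numerical differences expected between the two backends:

import numpy as np

# Both runs should agree up to floating point precision
np.testing.assert_allclose(
    onnx_model(signal, sampling_rate),
    z.numpy(),
    rtol=1e-4,
)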
Create interface¶
audonnx.Model
does not come with a fancy interface itself,
but we can use audinterface to create one.
import numpy as np
import audinterface
interface = audinterface.Feature(
    feature_names=onnx_model.outputs['gender'].labels,
    process_func=onnx_model,
)
interface.process_index(index)
                                                             female        male
file                     start            end
./docs/_static/test.wav  0 days 00:00:00  0 days 00:00:03  30.218712 -214.255234
                         0 days 00:00:03  0 days 00:00:05  28.661716 -211.555435
Or, if we are only interested in the majority class:
interface.process_index(index).idxmax(axis=1)
file                     start            end
./docs/_static/test.wav  0 days 00:00:00  0 days 00:00:03    female
                         0 days 00:00:03  0 days 00:00:05    female
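Besides process_index(), the interface offers further processing methods; for instance, we can process the in-memory signal directly (just a usage hint, the resulting values depend on the signal):

# Process the whole signal at once instead of an indexed file;
# returns a dataframe with one row covering the full duration
interface.process_signal(signal, sampling_rate)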
Save and load¶
Save the model to a YAML file.
onnx_meta_path = os.path.join(onnx_root, 'model.yaml')
onnx_model.to_yaml(onnx_meta_path)
$audonnx.core.model.Model==0.7.0:
  path: model.onnx
  labels:
  - female
  - male
  transform:
    $opensmile.core.smile.Smile==2.5.0:
      feature_set: GeMAPSv01b
      feature_level: LowLevelDescriptors
      options: {}
      sampling_rate: null
      channels:
      - 0
      mixdown: false
      resample: false
Load the model from a YAML file.
import audobject
onnx_model_2 = audobject.from_yaml(onnx_meta_path)
onnx_model_2(signal, sampling_rate)
array([ 29.374172, -214.76393 ], dtype=float32)
Or shorter:
onnx_model_3 = audonnx.load(onnx_root)
onnx_model_3(signal, sampling_rate)
array([ 29.374172, -214.76393 ], dtype=float32)
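audonnx.load() picks up the ONNX file and its YAML description from the given folder; to see what was actually written there, a quick check with the standard library is enough:

# List the files written to the model folder (model.onnx, model.yaml)
sorted(os.listdir(onnx_root))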
Quantize weights¶
To reduce the memory footprint of a model,
we can quantize it
(see the MobilenetV2 example).
For instance, we can store the model weights as 8-bit integers.
For quantization, make sure you have installed
onnx as well as onnxruntime.
import onnxruntime.quantization

onnx_infer_path = os.path.join(onnx_root, 'model_infer.onnx')
onnxruntime.quantization.quant_pre_process(
    onnx_model_path,
    onnx_infer_path,
)

onnx_quant_path = os.path.join(onnx_root, 'model_quant.onnx')
onnxruntime.quantization.quantize_dynamic(
    onnx_infer_path,
    onnx_quant_path,
    weight_type=onnxruntime.quantization.QuantType.QUInt8,
)
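To see the effect on disk, we can compare the size of the original and the quantized model file; a minimal sketch using the standard library (the exact numbers depend on the model):

# File sizes in bytes: the quantized model should be noticeably smaller
print(os.path.getsize(onnx_model_path))
print(os.path.getsize(onnx_quant_path))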
The output of the quantized model differs slightly.
onnx_model_4 = audonnx.Model(
    onnx_quant_path,
    labels=['female', 'male'],
    transform=smile,
)
onnx_model_4(signal, sampling_rate)
array([ 29.231592, -212.91867 ], dtype=float32)
Custom transform¶
So far,
we have used
opensmile.Smile
as feature extractor.
It derives from
audobject.Object
and is therefore serializable by default.
However,
using
audonnx.Function
we can turn any function
into a serializable object.
For instance,
we can define a function that extracts
Mel-frequency cepstral coefficients (MFCCs)
with librosa.
def mfcc(x, sr):
    import librosa  # import here to make function self-contained
    y = librosa.feature.mfcc(
        y=x.squeeze(),
        sr=sr,
        n_mfcc=18,
    )
    return y.reshape(1, 18, -1)
As long as the function is self-contained (i.e. it does not depend on external variables or imports), we can turn it into a serializable object.
transform = audonnx.Function(mfcc)
print(transform)
$audonnx.core.function.Function:
  func: "def mfcc(x, sr):\n    import librosa  # import here to make function self-contained\n\
    \    y = librosa.feature.mfcc(\n        y=x.squeeze(),\n        sr=sr,\n        \
    \ n_mfcc=18,\n    )\n    return y.reshape(1, 18, -1)\n"
  func_args: {}
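Before passing the transform to the model, we can call the plain function on the test signal to confirm that it produces the (1, 18, time) layout expected by the 'feature' input node (a quick sanity check, assuming librosa is installed):

# Should report (1, 18, num_frames), matching the input node shape [18, -1]
mfcc(signal, sampling_rate).shape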
And use it to initialize our model.
onnx_model_5 = audonnx.Model(
    onnx_model_path,
    labels=['female', 'male'],
    transform=transform,
)
onnx_model_5
Input:
  feature:
    shape: [18, -1]
    dtype: tensor(float)
    transform: audonnx.core.function.Function(mfcc)
Output:
  gender:
    shape: [2]
    dtype: tensor(float)
    labels: [female, male]
Then we can save and load the model as before.
onnx_model_5.to_yaml(onnx_meta_path)
onnx_model_6 = audonnx.load(onnx_root)
onnx_model_6(signal, sampling_rate)
array([-32.218803, 47.5395 ], dtype=float32)
Multiple nodes¶
Define a model that takes the raw audio as input in addition to the features and provides two additional output nodes: the output of the hidden layer and a confidence value.
class TorchModelMulti(torch.nn.Module):

    def __init__(
        self,
    ):
        super().__init__()
        self.hidden_left = torch.nn.Linear(1, 4)
        self.hidden_right = torch.nn.Linear(18, 4)
        self.out = torch.nn.ModuleDict(
            {
                'gender': torch.nn.Linear(8, 2),
                'confidence': torch.nn.Linear(8, 1),
            }
        )

    def forward(self, signal: torch.Tensor, feature: torch.Tensor):
        y_left = self.hidden_left(signal.mean(dim=-1))
        y_right = self.hidden_right(feature.mean(dim=-1))
        y_hidden = torch.cat([y_left, y_right], dim=-1)
        y_gender = self.out['gender'](y_hidden)
        y_confidence = self.out['confidence'](y_hidden)
        return (
            y_hidden.squeeze(),
            y_gender.squeeze(),
            y_confidence,
        )
Export the new model to ONNX format and load it. Note that we do not assign labels to all output nodes. In that case, they are automatically created from the name of the output node. And since the first node expects the raw audio signal, we do not set a transform for it.
onnx_multi_path = os.path.join(onnx_root, 'model.onnx')
torch.onnx.export(
    TorchModelMulti(),
    (
        torch.randn(signal.shape),
        torch.randn(y.shape[1:]),
    ),
    onnx_multi_path,
    input_names=['signal', 'feature'],
    output_names=['hidden', 'gender', 'confidence'],
    dynamic_axes={
        'signal': {1: 'time'},
        'feature': {1: 'time'},
    },
    opset_version=12,
)
onnx_model_7 = audonnx.Model(
    onnx_multi_path,
    labels={
        'gender': ['female', 'male'],
    },
    transform={
        'feature': smile,
    },
)
onnx_model_7
Input:
  signal:
    shape: [1, -1]
    dtype: tensor(float)
    transform: None
  feature:
    shape: [18, -1]
    dtype: tensor(float)
    transform: opensmile.core.smile.Smile
Output:
  hidden:
    shape: [8]
    dtype: tensor(float)
    labels: [hidden-0, hidden-1, hidden-2, (...), hidden-5, hidden-6, hidden-7]
  gender:
    shape: [2]
    dtype: tensor(float)
    labels: [female, male]
  confidence:
    shape: [1]
    dtype: tensor(float)
    labels: [confidence]
By default, the model returns a dictionary with the output of every node.
onnx_model_7(signal, sampling_rate)
{'hidden': array([ 7.6027595e-02, -8.8102180e-01, -4.5986521e-01, -2.4757692e-01,
        -3.5762204e+02, -6.2052740e+02, -6.6322235e+02, -2.5100070e+02],
       dtype=float32),
 'gender': array([-85.367584, 330.33902 ], dtype=float32),
 'confidence': array([-17.792288], dtype=float32)}
To request a specific node, use the outputs argument.
onnx_model_7(
    signal,
    sampling_rate,
    outputs='gender',
)
array([-85.367584, 330.33902 ], dtype=float32)
Or provide a list of names to request several outputs.
onnx_model_7(
    signal,
    sampling_rate,
    outputs=['gender', 'confidence'],
)
{'gender': array([-85.367584, 330.33902 ], dtype=float32),
'confidence': array([-17.792288], dtype=float32)}
To concatenate the outputs into a single array:
onnx_model_7(
    signal,
    sampling_rate,
    outputs=['gender', 'confidence'],
    concat=True,
)
array([-85.367584, 330.33902 , -17.792288], dtype=float32)
Create an interface and process a file.
outputs = ['gender', 'confidence']
interface = audinterface.Feature(
    feature_names=onnx_model_7.labels(outputs),
    process_func=onnx_model_7,
    process_func_args={
        'outputs': outputs,
        'concat': True,
    },
)
interface.process_file(file)
                                                                      female       male  confidence
file                     start   end
./docs/_static/test.wav  0 days  0 days 00:00:05.247687500       -85.367584  330.33902  -17.792288
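The same interface also works with the segmented index from the beginning of this section; this is only a usage hint, the resulting values depend on the segments:

# Process the two segments defined in the index above;
# each row then holds the gender scores plus the confidence value
interface.process_index(index)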
Run on the GPU¶
To run a model on the GPU, install onnxruntime-gpu.
Note that its version has to match the installed CUDA version;
the compatible combinations are listed in the onnxruntime documentation.
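To confirm that GPU support is actually available in the installed runtime, we can list the execution providers; a minimal sketch:

import onnxruntime

# 'CUDAExecutionProvider' should be listed if onnxruntime-gpu
# and a matching CUDA installation are set up correctly
print(onnxruntime.get_available_providers())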
Then select the CUDA device when loading the model:
import os
import audonnx
model = audonnx.load(..., device='cuda:2')
With onnxruntime-gpu<1.8
it is not possible to specify a device ID directly.
In that case do:
os.environ['CUDA_VISIBLE_DEVICES'] = '2'
model = audonnx.load(..., device='cuda')