cRnnVad2

Description

BLSTM RNN processor.

Type hierarchy

cDataProcessorcRnnVad2

Fields

  • voiceIdx (numeric) [default: 0.0]

    The index of the field which contains the 'voice' class output activation. (0 is the first field)

  • agentIdx (numeric) [default: 1.0]

    The index of the field which contains the 'agent/alien' class output activation. (0 is the first field)

  • voiceThresh (numeric) [default: 0.4]

    The threshold to apply to the 'voice' output activation.

  • agentThresh (numeric) [default: 0.3]

    The threshold to apply to the 'agent' output activation.

  • energyIdx (numeric) [default: 2.0]

    The index of the field which contains the energy/loudness/intensity/etc. value (set to -1 to disable)

  • f0Idx (numeric) [default: 3.0]

    Index of F0 input field (set to -1 to disable)

  • agentTurnPastBlock (numeric) [default: 20.0]

    time the VAD will be blocked after receiving an agent speech end message (in frames, usually 100fps) (use 20 for the SEMAINE speech2speech system, and 60 for the speech2face system).

  • alwaysRejectAgent (numeric) [default: 0.0]

    1 = never detect a speaker turn while the agent is speaking

  • smartRejectAgent (numeric) [default: 1.0]

    1 = apply different VAD strategy while agent is speaking

  • userEavgHold (numeric) [default: 500.0]

    Hold time for user energy envelope and average computation (10ms frames as unit).

  • userEavgDecay (numeric) [default: 500.0]

    Decay (linear) time for user energy envelope and average computation (10ms frames as unit).

  • agentEavgHold (numeric) [default: 200.0]

    Hold time for user energy envelope and average computation (10ms frames as unit).

  • agentEavgDecay (numeric) [default: 200.0]

    Decay (linear) time for user energy envelope and average computation (10ms frames as unit).

  • vadDebug (numeric) [default: 0.0]

    1 = output energy and VAD statistics for debugging (set to 2 to always force vad output value to 0 while debugging).

  • allowEoverride (numeric) [default: 1.0]

    1 = allow VAD output even if LSTM does not detect voice when the energy is in the range of the user's current energy envelope (NOTE: this reduces noise robustness, e.g. when moving a headset etc.)