torchaudio.prototype.models.conformer_rnnt_model
- torchaudio.prototype.models.conformer_rnnt_model(*, input_dim: int, encoding_dim: int, time_reduction_stride: int, conformer_input_dim: int, conformer_ffn_dim: int, conformer_num_layers: int, conformer_num_heads: int, conformer_depthwise_conv_kernel_size: int, conformer_dropout: float, num_symbols: int, symbol_embedding_dim: int, num_lstm_layers: int, lstm_hidden_dim: int, lstm_layer_norm: bool, lstm_layer_norm_epsilon: float, lstm_dropout: float, joiner_activation: str) -> RNNT
Builds a Conformer-based recurrent neural network transducer (RNN-T) model.
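- Parameters:
input_dim (int) – dimension of the input sequence frames passed to the transcription network.
encoding_dim (int) – dimension of the encodings generated by the transcription network and the prediction network and passed to the joint network.
time_reduction_stride (int) – factor by which to reduce the length of the input sequence.
conformer_input_dim (int) – dimension of the Conformer input.
conformer_ffn_dim (int) – hidden layer dimension of each Conformer layer's feedforward network.
conformer_num_layers (int) – number of Conformer layers to instantiate.
conformer_num_heads (int) – number of attention heads in each Conformer layer.
conformer_depthwise_conv_kernel_size (int) – kernel size of each Conformer layer's depthwise convolution layer.
conformer_dropout (float) – Conformer dropout probability.
num_symbols (int) – cardinality of the set of target tokens.
symbol_embedding_dim (int) – dimension of each target token embedding.
num_lstm_layers (int) – number of LSTM layers to instantiate.
lstm_hidden_dim (int) – output dimension of each LSTM layer.
lstm_layer_norm (bool) – if True, enables layer normalization for the LSTM layers.
lstm_layer_norm_epsilon (float) – value of epsilon to use in the LSTM layer normalization layers.
lstm_dropout (float) – LSTM dropout probability.
joiner_activation (str) – activation function to use in the joiner. Must be one of ("relu", "tanh"). (Default: "relu")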
Returns:
- RNNT
Conformer RNN-T model.
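Because every argument is keyword-only, one convenient way to call the factory is to collect the arguments in a dictionary and unpack it. The sketch below is illustrative rather than taken from the torchaudio documentation: the numeric values are arbitrary assumptions chosen to keep the model small, and `CONFIG`, `check_config`, and `build_model` are hypothetical names. The divisibility check reflects an assumption that the Conformer layers use standard multi-head attention, which requires the input dimension to be divisible by the number of heads.

```python
# Keyword arguments for conformer_rnnt_model. The values are illustrative
# assumptions for a small model, not recommended settings.
CONFIG = {
    "input_dim": 80,
    "encoding_dim": 256,
    "time_reduction_stride": 4,
    "conformer_input_dim": 64,
    "conformer_ffn_dim": 128,
    "conformer_num_layers": 2,
    "conformer_num_heads": 4,
    "conformer_depthwise_conv_kernel_size": 31,
    "conformer_dropout": 0.1,
    "num_symbols": 128,
    "symbol_embedding_dim": 32,
    "num_lstm_layers": 1,
    "lstm_hidden_dim": 64,
    "lstm_layer_norm": True,
    "lstm_layer_norm_epsilon": 1e-5,
    "lstm_dropout": 0.0,
    "joiner_activation": "relu",
}


def check_config(config):
    """Hypothetical helper: validate a config dict before building the model."""
    # Documented constraint: the joiner activation must be "relu" or "tanh".
    if config["joiner_activation"] not in ("relu", "tanh"):
        raise ValueError("joiner_activation must be 'relu' or 'tanh'")
    # Assumption: multi-head attention needs the input dimension to be
    # divisible by the number of heads.
    if config["conformer_input_dim"] % config["conformer_num_heads"] != 0:
        raise ValueError("conformer_input_dim must be divisible by conformer_num_heads")
    return config


def build_model(config):
    """Hypothetical helper: build the model from a validated config dict."""
    # Imported lazily so that check_config can be used without torchaudio.
    from torchaudio.prototype.models import conformer_rnnt_model

    return conformer_rnnt_model(**check_config(config))
```

Calling `build_model(CONFIG)` should return an `RNNT` instance, provided the installed torchaudio build still ships the `torchaudio.prototype` package.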