TACOTRON2_WAVERNN_PHONE_LJSPEECH

torchaudio.pipelines.TACOTRON2_WAVERNN_PHONE_LJSPEECH

Phoneme-based text-to-speech (TTS) pipeline that combines Tacotron2, trained on LJSpeech [Ito and Johnson, 2017] for 1,500 epochs, with a WaveRNN vocoder trained on 8-bit depth waveforms of LJSpeech [Ito and Johnson, 2017] for 10,000 epochs.

The text processor encodes the input text as phonemes. It uses DeepPhonemizer to convert graphemes to phonemes. The model (en_us_cmudict_forward) was trained on CMUDict.

The training script for Tacotron2 can be found here. The following parameters were used: win_length=1100, hop_length=275, n_fft=2048, mel_fmin=40, and mel_fmax=11025.
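To give a feel for what these spectrogram parameters mean in practice, the following sketch converts them into time and frequency units. It assumes a sample rate of 22,050 Hz, which is not stated above; it is LJSpeech's native rate and is consistent with mel_fmax=11025 being the Nyquist frequency. The helper names `frame_stats` and `num_frames` are hypothetical, and the frame-count formula assumes a centered STFT, which may differ from what the training script does.

```python
# Parameters quoted in the text above.
WIN_LENGTH = 1100
HOP_LENGTH = 275
N_FFT = 2048
MEL_FMIN = 40
MEL_FMAX = 11025

# Assumption: LJSpeech audio at its native 22,050 Hz sample rate.
SAMPLE_RATE = 22050


def frame_stats(sample_rate=SAMPLE_RATE):
    """Express the STFT settings in milliseconds, frames, and bins."""
    return {
        "window_ms": 1000 * WIN_LENGTH / sample_rate,
        "hop_ms": 1000 * HOP_LENGTH / sample_rate,
        "frames_per_second": sample_rate / HOP_LENGTH,
        "fft_bins": N_FFT // 2 + 1,  # one-sided spectrum size
        "nyquist_hz": sample_rate / 2,
    }


def num_frames(num_samples, hop_length=HOP_LENGTH):
    """Frame count under a centered-STFT convention (an assumption)."""
    return 1 + num_samples // hop_length


if __name__ == "__main__":
    for name, value in frame_stats().items():
        print(f"{name}: {value:.2f}")
    print("frames in one second of audio:", num_frames(SAMPLE_RATE))
```

Under these assumptions the window spans roughly 50 ms, the hop roughly 12.5 ms, and the window is exactly four hops long, so consecutive frames overlap by 75%.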

The training script for WaveRNN can be found here.

Refer to torchaudio.pipelines.Tacotron2TTSBundle() for usage.
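As a rough illustration of that usage, the sketch below runs the three stages of the bundle in order: text processing, spectrogram generation, and vocoding. The function name `synthesize` and the output path are hypothetical. It assumes that torch, torchaudio, and the DeepPhonemizer package are installed and that the pretrained weights can be downloaded on first use. The imports are placed inside the function so that defining it does not require those packages.

```python
def synthesize(text, output_path="output.wav"):
    """Convert `text` to speech and save the result as a WAV file.

    A minimal sketch, not a definitive implementation. `synthesize`
    and `output_path` are illustrative names.
    """
    # Imported lazily; both packages must be installed to call this.
    import torch
    import torchaudio

    bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_PHONE_LJSPEECH

    # The phoneme-based text processor relies on DeepPhonemizer.
    processor = bundle.get_text_processor()
    tacotron2 = bundle.get_tacotron2()
    vocoder = bundle.get_vocoder()

    with torch.inference_mode():
        # Text -> phoneme token IDs and their lengths.
        tokens, lengths = processor(text)
        # Token IDs -> mel spectrogram.
        spec, spec_lengths, _ = tacotron2.infer(tokens, lengths)
        # Mel spectrogram -> waveform (WaveRNN generation can be slow).
        waveforms, wave_lengths = vocoder(spec, spec_lengths)

    # Save the first item in the batch at the vocoder's sample rate.
    torchaudio.save(output_path, waveforms[0:1].cpu(), vocoder.sample_rate)
    return waveforms, vocoder.sample_rate


if __name__ == "__main__":
    synthesize("Hello world! T T S stands for Text to Speech!")
```

Each getter fetches its pretrained component when first called, so the first run needs network access and can take a while.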

Example - “Hello world! T T S stands for Text to Speech!”

Example - “The examination and testimony of the experts enabled the Commission to conclude that five shots may have been fired,”
