CTCDecoder¶

class torchaudio.models.decoder.CTCDecoder[source]¶

CTC beam search decoder from Flashlight [Kahn et al., 2022].

Note

To build the decoder, use the factory function ctc_decoder().

Tutorials using CTCDecoder: ASR Inference with CTC Decoder

Methods¶

__call__¶

CTCDecoder.__call__(emissions: FloatTensor, lengths: Optional[Tensor] = None) → List[List[CTCHypothesis]][source]¶

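Performs batched offline decoding.

Note

This method performs offline decoding in one pass. To perform incremental decoding, refer to decode_step().

Parameters:

emissions (torch.FloatTensor) – CPU tensor of shape (batch, frame, num_tokens) storing sequences of probability distributions over labels; the output of the acoustic model.
lengths (Tensor or None, optional) – CPU tensor of shape (batch, ) storing the valid length, along the time axis, of each emission in the batch.

Returns:

For each audio sequence in the batch, a list of hypotheses sorted best first.

Return type:

List[List[CTCHypothesis]]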
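The nested return value can be unpacked as in the following sketch. It assumes that decoder was built with ctc_decoder() and that a lexicon was supplied, since the words attribute is only populated in that case. best_transcripts is a hypothetical helper name, not part of torchaudio.

```python
from typing import Any, List, Optional


def best_transcripts(
    decoder: Any, emissions: Any, lengths: Optional[Any] = None
) -> List[str]:
    """Return the top-ranked transcript for every utterance in the batch."""
    # Outer list: one entry per utterance. Inner list: hypotheses, best first.
    results = decoder(emissions, lengths)
    transcripts = []
    for hypotheses in results:
        if not hypotheses:
            transcripts.append("")  # nothing was decoded for this utterance
            continue
        best = hypotheses[0]
        # `words` is filled only when the decoder was given a lexicon.
        transcripts.append(" ".join(best.words))
    return transcripts
```

For lexicon-free decoding, the tokens attribute together with idxs_to_tokens() would be used in place of words.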

decode_begin¶

CTCDecoder.decode_begin()[source]¶

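Initialize the internal state of the decoder.

See decode_step() for usage.

Note

This method is required only when performing online decoding. It is not necessary when performing batch decoding with __call__().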

decode_end¶

CTCDecoder.decode_end()[source]¶

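Finalize the internal state of the decoder.

See decode_step() for usage.

Note

This method is required only when performing online decoding. It is not necessary when performing batch decoding with __call__().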

decode_step¶

CTCDecoder.decode_step(emissions: FloatTensor)[source]¶

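Perform incremental decoding on top of the current internal state.

Note

This method is required only when performing online decoding. It is not necessary when performing batch decoding with __call__().

Parameters: emissions (torch.FloatTensor) – CPU tensor of shape (frame, num_tokens) storing sequences of probability distributions over labels; the output of the acoustic model.

Example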

>>> decoder = torchaudio.models.decoder.ctc_decoder(...)
>>> decoder.decode_begin()
>>> decoder.decode_step(emission1)
>>> decoder.decode_step(emission2)
>>> decoder.decode_end()
>>> result = decoder.get_final_hypothesis()
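
When emissions arrive as a stream of chunks, the same call sequence can be wrapped in a small helper. This is only a sketch: decode_stream is a hypothetical name, decoder is assumed to come from ctc_decoder(), and each chunk is assumed to be a CPU FloatTensor of shape (frame, num_tokens).

```python
from typing import Any, Iterable, List


def decode_stream(decoder: Any, chunks: Iterable[Any]) -> List[Any]:
    """Feed emission chunks to the decoder one at a time and return the result."""
    decoder.decode_begin()  # reset the internal state
    for chunk in chunks:
        decoder.decode_step(chunk)  # extend the beams with the new frames
    decoder.decode_end()  # finalize the internal state
    return decoder.get_final_hypothesis()  # hypotheses, best first
```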

get_final_hypothesis¶

CTCDecoder.get_final_hypothesis() → List[CTCHypothesis][source]¶

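Get the final hypothesis.

Returns: List of hypotheses sorted best first.
Return type: List[CTCHypothesis]

Note

This method is required only when performing online decoding. It is not necessary when performing batch decoding with __call__().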

idxs_to_tokens¶

CTCDecoder.idxs_to_tokens(idxs: LongTensor) → List[source]¶

Map raw token IDs to the corresponding tokens.

Parameters: idxs (LongTensor) – raw token IDs generated by the decoder
Returns: tokens corresponding to the input IDs
Return type: List
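
A common follow-up is to join the returned tokens into text. The sketch below assumes character-level tokens in which "|" marks word boundaries (the silence token). tokens_to_text is a hypothetical helper, and other token sets, such as subword units, would need a different joining rule.

```python
from typing import Iterable


def tokens_to_text(tokens: Iterable[str], sil_token: str = "|") -> str:
    """Join character tokens into words, treating `sil_token` as a space."""
    text = "".join(tokens).replace(sil_token, " ")
    # Collapse repeated separators and trim leading/trailing ones.
    return " ".join(text.split())


print(tokens_to_text(["|", "h", "i", "|", "y", "o", "u", "|"]))  # hi you
```

With a real decoder, the input would be decoder.idxs_to_tokens(hypothesis.tokens).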

Support Structures¶

CTCHypothesis¶

class torchaudio.models.decoder.CTCHypothesis(tokens: torch.LongTensor, words: List[str], score: float, timesteps: torch.IntTensor)[source]¶

Represents a hypothesis generated by the CTC beam search decoder CTCDecoder.

Tutorials using CTCHypothesis: ASR Inference with CTC Decoder

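tokens: LongTensor¶: Predicted sequence of token IDs. Shape (L, ), where L is the length of the output sequence.

words: List[str]¶: List of predicted words.

Note

This attribute is only applicable if a lexicon is provided to the decoder. When decoding without a lexicon, it is empty. Refer to tokens and idxs_to_tokens() instead.

score: float¶: Score corresponding to the hypothesis.

timesteps: IntTensor¶: Timesteps corresponding to the tokens. Shape (L, ), where L is the length of the output sequence.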
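
To place tokens on the audio timeline, timesteps can be scaled by the duration of one emission frame. The sketch below assumes that each timestep is a frame index into the emission passed to the decoder and that frames are evenly spaced over the waveform. timesteps_to_seconds and its parameters are hypothetical names.

```python
from typing import List, Sequence


def timesteps_to_seconds(
    timesteps: Sequence[int], num_frames: int, num_samples: int, sample_rate: int
) -> List[float]:
    """Convert frame indices to seconds from the start of the audio."""
    if num_frames <= 0 or sample_rate <= 0:
        raise ValueError("num_frames and sample_rate must be positive")
    # Duration of one emission frame, assuming frames evenly cover the waveform.
    seconds_per_frame = num_samples / num_frames / sample_rate
    return [int(t) * seconds_per_frame for t in timesteps]


# 2 s of 16 kHz audio that produced 100 emission frames: 0.02 s per frame.
print(timesteps_to_seconds([0, 50, 100], num_frames=100, num_samples=32000, sample_rate=16000))
```

With a hypothesis, the first argument would be hypothesis.timesteps.tolist().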

CTCDecoderLM¶

class torchaudio.models.decoder.CTCDecoderLM[source]¶

Language model base class for creating custom language models to use with the decoder.

Tutorials using CTCDecoderLM: ASR Inference with CTC Decoder

abstract start(start_with_nothing: bool) → CTCDecoderLMState[source]¶

Initialize or reset the language model.

Parameters: start_with_nothing (bool) – whether or not to start the sentence with the sil token.
Returns: starting state
Return type: CTCDecoderLMState

abstract score(state: CTCDecoderLMState, usr_token_idx: int) → Tuple[CTCDecoderLMState, float][source]¶

Evaluate the language model given the current LM state and a new word.

Parameters:

state (CTCDecoderLMState) – current LM state
usr_token_idx (int) – index of the word

Returns:

(CTCDecoderLMState, float)

CTCDecoderLMState: new LM state
float: score

abstract finish(state: CTCDecoderLMState) → Tuple[CTCDecoderLMState, float][source]¶

Evaluate the end of the sentence given the current LM state.

Parameters:

state (CTCDecoderLMState) – current LM state

Returns:

(CTCDecoderLMState, float)

CTCDecoderLMState: new LM state
float: score
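
The three abstract methods form a small contract: start() returns an initial state, score() advances the state by one index and returns a score, and finish() scores the end of the sentence. The sketch below illustrates that contract with a unigram model. To stay runnable without torchaudio it uses plain-Python stand-ins: State stands in for CTCDecoderLMState, and UnigramLM, eos_idx, and unk_log_prob are hypothetical names. A real custom LM would subclass CTCDecoderLM instead.

```python
import math
from typing import Dict, Tuple


class State:
    """Stand-in for CTCDecoderLMState: child() creates missing children."""

    def __init__(self) -> None:
        self.children: Dict[int, "State"] = {}

    def child(self, usr_index: int) -> "State":
        if usr_index not in self.children:
            self.children[usr_index] = State()
        return self.children[usr_index]


class UnigramLM:
    """Context-free LM exposing the start/score/finish methods."""

    def __init__(
        self, log_probs: Dict[int, float], eos_idx: int, unk_log_prob: float = -10.0
    ) -> None:
        self.log_probs = log_probs
        self.eos_idx = eos_idx
        self.unk_log_prob = unk_log_prob

    def start(self, start_with_nothing: bool) -> State:
        # A unigram model has no context, so the flag is ignored here.
        return State()

    def score(self, state: State, usr_token_idx: int) -> Tuple[State, float]:
        new_state = state.child(usr_token_idx)
        return new_state, self.log_probs.get(usr_token_idx, self.unk_log_prob)

    def finish(self, state: State) -> Tuple[State, float]:
        # Ending a sentence is scored as emitting the end-of-sentence index.
        return self.score(state, self.eos_idx)


lm = UnigramLM({0: math.log(0.5), 1: math.log(0.25)}, eos_idx=1)
state = lm.start(False)
state, word_score = lm.score(state, 0)
state, end_score = lm.finish(state)
print(round(word_score + end_score, 4))
```

Returning the same child state for a repeated index lets an implementation cache per-state results, at the cost of keeping every visited state in memory.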

CTCDecoderLMState¶

class torchaudio.models.decoder.CTCDecoderLMState[source]¶

Language model state.

Tutorials using CTCDecoderLMState: ASR Inference with CTC Decoder

property children: Dict[int, CTCDecoderLMState]¶: Map of indices to LM states

child(usr_index: int) → CTCDecoderLMState[source]¶

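Returns the child state corresponding to usr_index, or creates and returns a new state if the index is not found.

Parameters: usr_index (int) – index corresponding to the child state
Returns: child state corresponding to usr_index
Return type: CTCDecoderLMState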

compare(state: CTCDecoderLMState) → CTCDecoderLMState[source]¶

Compare two language model states.

Parameters: state (CTCDecoderLMState) – LM state to compare against
Returns: 0 if the states are the same, -1 if self is less than state, +1 if self is greater than state.
Return type: int
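
Because child() creates a state on first access and returns the existing one afterwards, the states form a trie in which sequences with a common prefix share nodes. The sketch below mimics that behaviour with a plain-Python stand-in. TrieState and walk are hypothetical names, and the real class is CTCDecoderLMState.

```python
from typing import Dict, Iterable


class TrieState:
    """Plain-Python stand-in for the child()/children behaviour."""

    def __init__(self) -> None:
        self._children: Dict[int, "TrieState"] = {}

    @property
    def children(self) -> Dict[int, "TrieState"]:
        return self._children

    def child(self, usr_index: int) -> "TrieState":
        # Create the child on first access, then keep returning the same object.
        if usr_index not in self._children:
            self._children[usr_index] = TrieState()
        return self._children[usr_index]


def walk(root: TrieState, indices: Iterable[int]) -> TrieState:
    """Follow (and create as needed) the path spelled by `indices`."""
    state = root
    for index in indices:
        state = state.child(index)
    return state


root = TrieState()
a = walk(root, [1, 2, 3])
b = walk(root, [1, 2, 4])
# Both paths share the node for the prefix [1, 2].
shared = walk(root, [1, 2])
print(sorted(shared.children))  # [3, 4]
```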
