ctc_decoder

torchaudio.models.decoder.ctc_decoder(lexicon: Optional[str], tokens: Union[str, List[str]], lm: Optional[Union[str, CTCDecoderLM]] = None, lm_dict: Optional[str] = None, nbest: int = 1, beam_size: int = 50, beam_size_token: Optional[int] = None, beam_threshold: float = 50, lm_weight: float = 2, word_score: float = 0, unk_score: float = -inf, sil_score: float = 0, log_add: bool = False, blank_token: str = '-', sil_token: str = '|', unk_word: str = '<unk>') → CTCDecoder

Builds an instance of CTCDecoder.

Parameters:

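lexicon (str or None) – Lexicon file containing the possible words and their corresponding spellings. Each line consists of a word followed by its space-separated spelling. If None, lexicon-free decoding is used.
tokens (str or List[str]) – File or list containing the valid tokens. If a file is used, the expected format is for tokens that map to the same index to be on the same line.
lm (str, CTCDecoderLM, or None, optional) – Path to a KenLM language model, a custom language model of type CTCDecoderLM, or None if no language model is used.
lm_dict (str or None, optional) – File containing the dictionary used for the LM, with one word per line, sorted by LM index. When decoding with a lexicon, every entry in lm_dict must also appear in the lexicon file. If None, the LM dictionary is built from the lexicon file. (Default: None)
nbest (int, optional) – Number of best decodings to return. (Default: 1)
beam_size (int, optional) – Maximum number of hypotheses to keep after each decoding step. (Default: 50)
beam_size_token (int, optional) – Maximum number of tokens to consider at each decoding step. If None, it is set to the total number of tokens. (Default: None)
beam_threshold (float, optional) – Threshold for pruning hypotheses. (Default: 50)
lm_weight (float, optional) – Weight of the language model. (Default: 2)
word_score (float, optional) – Word insertion score. (Default: 0)
unk_score (float, optional) – Unknown word insertion score. (Default: -inf)
sil_score (float, optional) – Silence insertion score. (Default: 0)
log_add (bool, optional) – Whether to use logadd when merging hypotheses. (Default: False)
blank_token (str, optional) – Token corresponding to blank. (Default: "-")
sil_token (str, optional) – Token corresponding to silence. (Default: "|")
unk_word (str, optional) – Word corresponding to unknown. (Default: "<unk>")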
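The lexicon and tokens file formats described above can be sketched with plain Python. This is a minimal illustration, not part of torchaudio: the helper names `format_tokens` and `format_lexicon` are hypothetical, and the letter-by-letter spelling terminated by the silence token is an assumption that suits a character-level acoustic model; other token sets need a different spelling rule.

```python
def format_tokens(tokens):
    """Return tokens-file text with one token per line.

    The position of each line serves as the token index, so the order
    must match the output dimension of the acoustic model.
    """
    return "\n".join(tokens) + "\n"


def format_lexicon(words, sil_token="|"):
    """Return lexicon-file text: each line is a word, then its spelling.

    Assumption: a word is spelled as its individual characters followed
    by the silence token, all separated by single spaces.
    """
    lines = []
    for word in sorted(set(words)):
        spelling = " ".join(list(word) + [sil_token])
        lines.append(f"{word} {spelling}")
    return "\n".join(lines) + "\n"


# Illustrative data only; real token sets come from the acoustic model.
tokens_text = format_tokens(["-", "|", "a", "h", "i"])
lexicon_text = format_lexicon(["hi", "a"])
```

Writing `tokens_text` and `lexicon_text` to files gives paths that could be passed as the `tokens` and `lexicon` arguments.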

Returns:

The decoder

Return type:

CTCDecoder

Example

>>> from torchaudio.models.decoder import ctc_decoder
>>> decoder = ctc_decoder(
...     lexicon="lexicon.txt",
...     tokens="tokens.txt",
...     lm="kenlm.bin",
... )
>>> results = decoder(emissions)  # List of shape (B, nbest) of Hypotheses
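Since the decoder returns a nested list of shape (B, nbest), the best transcript for each batch item can be picked out as sketched below. This is an illustration under stated assumptions: it relies only on each hypothesis exposing `words` and `score` attributes, and `FakeHypothesis` and `best_transcripts` are hypothetical names used so the snippet runs without a trained model.

```python
from typing import List, NamedTuple


class FakeHypothesis(NamedTuple):
    """Hypothetical stand-in exposing only the fields used below."""
    words: List[str]
    score: float


def best_transcripts(results):
    """Return the highest-scoring transcript for each batch item."""
    transcripts = []
    for hyps in results:  # one list of nbest hypotheses per batch item
        best = max(hyps, key=lambda h: h.score)
        transcripts.append(" ".join(best.words).strip())
    return transcripts


# Made-up data shaped like decoder output with B=2 and up to 2 hypotheses.
results = [
    [FakeHypothesis(["hello", "world"], -3.2),
     FakeHypothesis(["hollow", "world"], -5.9)],
    [FakeHypothesis(["good", "morning"], -1.1)],
]
transcripts = best_transcripts(results)
```

Selecting by `score` rather than by position avoids depending on the order in which hypotheses are returned.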

Tutorials using ctc_decoder

ASR Inference with CTC Decoder
