torchaudio.compliance.kaldi.mfcc

torchaudio.compliance.kaldi.mfcc(waveform: Tensor, blackman_coeff: float = 0.42, cepstral_lifter: float = 22.0, channel: int = -1, dither: float = 0.0, energy_floor: float = 1.0, frame_length: float = 25.0, frame_shift: float = 10.0, high_freq: float = 0.0, htk_compat: bool = False, low_freq: float = 20.0, num_ceps: int = 13, min_duration: float = 0.0, num_mel_bins: int = 23, preemphasis_coefficient: float = 0.97, raw_energy: bool = True, remove_dc_offset: bool = True, round_to_power_of_two: bool = True, sample_frequency: float = 16000.0, snip_edges: bool = True, subtract_mean: bool = False, use_energy: bool = False, vtln_high: float = -500.0, vtln_low: float = 100.0, vtln_warp: float = 1.0, window_type: str = 'povey') → Tensor

Create an MFCC from a raw audio signal. This matches the input/output of Kaldi's compute-mfcc-feats.

Parameters:

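waveform (Tensor) – Tensor of audio of size (c, n), where c is in the range [0, 2)
blackman_coeff (float, optional) – Constant coefficient for the generalized Blackman window. (Default: 0.42)
cepstral_lifter (float, optional) – Constant that controls the scaling of the MFCCs (Default: 22.0)
channel (int, optional) – Channel to extract (-1 -> expect mono, 0 -> left, 1 -> right) (Default: -1)
dither (float, optional) – Dithering constant (0.0 means no dither). If you turn dithering off, you should set the energy_floor option, e.g. to 1.0 or 0.1 (Default: 0.0)
energy_floor (float, optional) – Floor on energy (absolute, not relative) in the spectrogram computation. Caution: this floor is applied to the zeroth component, which represents the total signal energy. The floor on the individual spectrogram elements is fixed at std::numeric_limits<float>::epsilon(). (Default: 1.0)
frame_length (float, optional) – Frame length in milliseconds (Default: 25.0)
frame_shift (float, optional) – Frame shift in milliseconds (Default: 10.0)
high_freq (float, optional) – High cutoff frequency for mel bins (if <= 0, offset from the Nyquist frequency) (Default: 0.0)
htk_compat (bool, optional) – If True, put energy last. Warning: this alone is not sufficient to get HTK-compatible features (other parameters need to be changed as well). (Default: False)
low_freq (float, optional) – Low cutoff frequency for mel bins (Default: 20.0)
num_ceps (int, optional) – Number of cepstra in the MFCC computation (including C0) (Default: 13)
min_duration (float, optional) – Minimum duration of segments to process (in seconds). (Default: 0.0)
num_mel_bins (int, optional) – Number of triangular mel-frequency bins (Default: 23)
preemphasis_coefficient (float, optional) – Coefficient for use in signal preemphasis (Default: 0.97)
raw_energy (bool, optional) – If True, compute energy before preemphasis and windowing (Default: True)
remove_dc_offset (bool, optional) – Subtract the mean from the waveform on each frame (Default: True)
round_to_power_of_two (bool, optional) – If True, round the window size up to a power of two by zero-padding the input to the FFT. (Default: True)
sample_frequency (float, optional) – Sampling frequency of the waveform data (must match the frequency specified in the waveform file, if one is specified there) (Default: 16000.0)
snip_edges (bool, optional) – If True, end effects are handled by outputting only frames that fit completely in the file, and the number of frames depends on frame_length. If False, the number of frames depends only on frame_shift, and the data is reflected at the ends. (Default: True)
subtract_mean (bool, optional) – Subtract the mean of each feature file [CMS]; doing it this way is not recommended. (Default: False)
use_energy (bool, optional) – Add an extra dimension with energy to the FBANK output. (Default: False)
vtln_high (float, optional) – High inflection point in the piecewise linear VTLN warping function (if negative, offset from the high mel frequency) (Default: -500.0)
vtln_low (float, optional) – Low inflection point in the piecewise linear VTLN warping function (Default: 100.0)
vtln_warp (float, optional) – VTLN warp factor (only applicable if vtln_map is not specified) (Default: 1.0)
window_type (str, optional) – Type of window ('hamming'|'hanning'|'povey'|'rectangular'|'blackman') (Default: 'povey')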
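
How frame_length, frame_shift, sample_frequency and snip_edges combine to determine the number of output frames can be sketched in plain Python. This is a minimal illustration, not the library's code: the helper name kaldi_frame_count is hypothetical, and the arithmetic assumes the usual Kaldi convention of converting milliseconds to whole samples by truncation.

```python
def kaldi_frame_count(num_samples, sample_frequency=16000.0,
                      frame_length=25.0, frame_shift=10.0, snip_edges=True):
    """Estimate the number of frames m for a waveform of num_samples samples.

    Hypothetical helper for illustration; it is not part of torchaudio.
    """
    # Convert the millisecond settings into whole numbers of samples.
    window_size = int(sample_frequency * frame_length * 0.001)
    window_shift = int(sample_frequency * frame_shift * 0.001)

    if snip_edges:
        # Only frames that fit completely inside the signal are kept.
        if num_samples < window_size:
            return 0
        return 1 + (num_samples - window_size) // window_shift

    # With reflection at the ends, only the frame shift matters.
    return (num_samples + window_shift // 2) // window_shift


one_second = 16000  # samples at 16 kHz
frames_snipped = kaldi_frame_count(one_second)                    # 98
frames_padded = kaldi_frame_count(one_second, snip_edges=False)   # 100
```

With the defaults, one second of 16 kHz audio gives 98 frames when edges are snipped and 100 frames when the data is reflected at the ends.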

Returns:

An MFCC identical to what Kaldi would output. The shape is (m, num_ceps), where m is calculated in _get_strided

Return type:

Tensor
