torchaudio.sox_effects.apply_effects_file
- torchaudio.sox_effects.apply_effects_file(path: str, effects: List[List[str]], normalize: bool = True, channels_first: bool = True, format: Optional[str] = None) -> Tuple[Tensor, int]
Apply sox effects to an audio file and load the resulting data as a Tensor.
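Note
This function is used much like the sox command, with subtle differences. For example, the sox command automatically adds certain effects (such as a rate effect after speed, pitch, etc.), whereas this function applies only the effects it is given. Therefore, to actually apply a speed effect, you also need to specify the desired sample rate with a rate effect, because internally the speed effect only changes the sampling rate and leaves the samples untouched.
- Parameters: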
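path (path-like object) – Path to the source audio file.
effects (List[List[str]]) – List of effects.
normalize (bool, optional) –
When True, this function converts the native sample type to float32. Default: True.
If the input file is an integer WAV, setting this to False changes the resulting Tensor type to an integer type. This argument has no effect for formats other than integer WAV.
channels_first (bool, optional) – When True, the returned Tensor has dimension [channel, time]. Otherwise, the returned Tensor has dimension [time, channel].
format (str or None, optional) – Override the format detection with the given format. Providing this argument may help when libsox cannot infer the format from the file header or extension.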
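- Returns:
Resulting Tensor and sample rate. If normalize=True, the resulting Tensor is always float32. If normalize=False and the input audio file is an integer WAV file, the resulting Tensor has the corresponding integer type (note that the 24-bit integer type is not supported). If channels_first=True, the resulting Tensor has dimension [channel, time]; otherwise it has dimension [time, channel].
- Return type: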
(Tensor, int)
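The note about the speed effect can be sketched as a small helper that always pairs speed with an explicit rate effect. The names speed_effects and load_with_speed are hypothetical, introduced only for illustration; the second function assumes a torchaudio build in which torchaudio.sox_effects is available, and imports it lazily so the list-building part works on its own.

```python
from typing import List


def speed_effects(speed: float, sample_rate: int) -> List[List[str]]:
    """Build an effect chain that changes speed and then resamples.

    Hypothetical helper: 'speed' alone only relabels the sampling rate,
    so a 'rate' effect is appended to resample to the desired rate.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return [
        ["speed", f"{speed:.5f}"],   # changes the sampling rate only
        ["rate", str(sample_rate)],  # resamples to the requested rate
    ]


def load_with_speed(path: str, speed: float, sample_rate: int):
    """Load `path` with the chain above (assumes torchaudio with sox support)."""
    import torchaudio  # imported lazily; not needed to build the effect list

    return torchaudio.sox_effects.apply_effects_file(
        path, speed_effects(speed, sample_rate)
    )
```

For instance, speed_effects(1.2, 16000) yields a two-entry chain, and dropping the second entry would leave the samples untouched with only the reported sample rate changed.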
- Example - Basic usage
>>> from torchaudio.sox_effects import apply_effects_file
>>>
>>> # Defines the effects to apply
>>> effects = [
...     ['gain', '-n'],  # normalises to 0dB
...     ['pitch', '5'],  # 5 cent pitch shift
...     ['rate', '8000'],  # resample to 8000 Hz
... ]
>>>
>>> # Apply effects and load data with channels_first=True
>>> waveform, sample_rate = apply_effects_file("data.wav", effects, channels_first=True)
>>>
>>> # Check the result
>>> waveform.shape
torch.Size([2, 8000])
>>> waveform
tensor([[ 5.1151e-03,  1.8073e-02,  2.2188e-02,  ...,  1.0431e-07,
         -1.4761e-07,  1.8114e-07],
        [-2.6924e-03,  2.1860e-03,  1.0650e-02,  ...,  6.4122e-07,
         -5.6159e-07,  4.8103e-07]])
>>> sample_rate
8000
- Example - Apply random speed perturbation to a dataset
>>> import random
>>> from typing import List
>>>
>>> import torch
>>> import torchaudio
>>>
>>> # Load data from file, apply random speed perturbation
>>> class RandomPerturbationFile(torch.utils.data.Dataset):
...     """Given flist, apply random speed perturbation
...
...     Suppose all the input files are at least one second long.
...     """
...     def __init__(self, flist: List[str], sample_rate: int):
...         super().__init__()
...         self.flist = flist
...         self.sample_rate = sample_rate
...
...     def __getitem__(self, index):
...         # random.random() is uniform in [0, 1), so speed lies in [0.5, 2.0)
...         speed = 0.5 + 1.5 * random.random()
...         effects = [
...             ['gain', '-n', '-10'],  # apply 10 db attenuation
...             ['remix', '-'],  # merge all the channels
...             ['speed', f'{speed:.5f}'],  # a one-second input now lasts 0.5 ~ 2.0 seconds
...             ['rate', f'{self.sample_rate}'],
...             ['pad', '0', '1.5'],  # add 1.5 seconds silence at the end
...             ['trim', '0', '2'],  # get the first 2 seconds
...         ]
...         waveform, _ = torchaudio.sox_effects.apply_effects_file(
...             self.flist[index], effects)
...         return waveform
...
...     def __len__(self):
...         return len(self.flist)
...
>>> dataset = RandomPerturbationFile(file_list, sample_rate=8000)
>>> loader = torch.utils.data.DataLoader(dataset, batch_size=32)
>>> for batch in loader:
...     pass
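The pad and trim values in the example above appear to be chosen so that every item comes out at the same length, which is what lets the DataLoader stack items into a batch. The arithmetic can be checked with a small sketch; output_seconds is a hypothetical helper, not part of torchaudio, and it models only nominal durations, ignoring any sample-level rounding inside sox.

```python
def output_seconds(input_seconds: float, speed: float,
                   pad_seconds: float = 1.5, trim_seconds: float = 2.0) -> float:
    """Nominal duration after the speed -> pad -> trim chain (hypothetical model).

    speed divides the duration, pad appends silence at the end,
    and trim keeps at most the first `trim_seconds`.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    after_speed = input_seconds / speed
    after_pad = after_speed + pad_seconds
    return min(after_pad, trim_seconds)


# With inputs of at least one second and speed in [0.5, 2.0], the result
# after speed is at least 0.5 s, so padding by 1.5 s always reaches 2 s.
for seconds in (1.0, 3.0, 10.0):
    for speed in (0.5, 1.0, 2.0):
        assert output_seconds(seconds, speed) == 2.0
```

An input shorter than one second breaks the guarantee: half a second at speed 2.0 yields 0.25 + 1.5 = 1.75 seconds, which is why the docstring assumes files of at least one second.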