
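Additive Synthesis

Author: Moto Hira

This tutorial is a continuation of Oscillator and ADSR Envelope.

This tutorial shows how to perform additive synthesis and subtractive synthesis using TorchAudio's DSP functions.

Additive synthesis creates a timbre by combining multiple waveforms. Subtractive synthesis creates a timbre by applying filters.

Warning

This tutorial requires prototype DSP features, which are available in nightly builds.

Please refer to https://pytorch.com.tw/get-started/locally for instructions on installing a nightly build.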

import torch
import torchaudio

print(torch.__version__)
print(torchaudio.__version__)
2.7.0
2.7.0

Overview

try:
    from torchaudio.prototype.functional import adsr_envelope, extend_pitch, oscillator_bank
except ModuleNotFoundError:
    print(
        "Failed to import prototype DSP features. "
        "Please install torchaudio nightly builds. "
        "Please refer to https://pytorch.com.tw/get-started/locally "
        "for instructions to install a nightly build."
    )
    raise

import matplotlib.pyplot as plt
from IPython.display import Audio

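Creating multiple frequency pitches

The core of additive synthesis is the oscillator. We create a timbre by summing the multiple waveforms generated by oscillators.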
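To make the idea of summing oscillators concrete, here is a minimal plain-Python sketch. The helper name `oscillator_bank_sketch` is hypothetical and is not the TorchAudio implementation; it assumes per-frame frequencies and amplitudes, phase accumulation, and that components at or above the Nyquist frequency are silenced. Its output is not guaranteed to match `oscillator_bank()` sample for sample.

```python
import math


def oscillator_bank_sketch(freqs, amps, sample_rate):
    """Sum sinusoidal oscillators (hypothetical helper, plain Python).

    freqs, amps: lists of frames; each frame holds one value per oscillator.
    """
    num_osc = len(freqs[0])
    phases = [0.0] * num_osc
    nyquist = sample_rate / 2
    out = []
    for f_frame, a_frame in zip(freqs, amps):
        sample = 0.0
        for i in range(num_osc):
            # Accumulate phase so that time-varying frequencies stay continuous.
            phases[i] += 2 * math.pi * f_frame[i] / sample_rate
            # Assumption: skip components at or above Nyquist to avoid aliasing.
            if abs(f_frame[i]) < nyquist:
                sample += a_frame[i] * math.sin(phases[i])
        out.append(sample)
    return out


# Two oscillators: 100 Hz at full amplitude and 200 Hz at half amplitude.
num_frames = 160  # exactly one period of 100 Hz at 16 kHz
freqs = [[100.0, 200.0]] * num_frames
amps = [[1.0, 0.5]] * num_frames
waveform = oscillator_bank_sketch(freqs, amps, sample_rate=16_000)
print(len(waveform), max(waveform))
```

Accumulating phase, rather than evaluating `sin(2 * pi * f * t)` directly, is what keeps the waveform continuous when the frequency changes from frame to frame.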

In the Oscillator tutorial, we used oscillator_bank() and adsr_envelope() to generate various waveforms.

In this tutorial, we use extend_pitch() to create a timbre from a base frequency.
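As a rough illustration of what extending a pitch means, the sketch below multiplies a base frequency series by a set of multipliers. The helper `extend_pitch_sketch` is a hypothetical plain-Python stand-in, written under the assumption that an integer pattern N means multipliers 1 through N and that a list is used as given. This matches how the function is called later in this tutorial, but it is not the library code.

```python
def extend_pitch_sketch(base, pattern):
    """Plain-Python stand-in for extend_pitch (hypothetical helper).

    base: list of frames, each a one-element list, e.g. [[344.0], [344.0]].
    pattern: an int N (multipliers 1..N) or an explicit list of multipliers.
    """
    if isinstance(pattern, int):
        mults = [float(k) for k in range(1, pattern + 1)]
    else:
        mults = [float(m) for m in pattern]
    # Every frame gets one value per multiplier.
    return [[frame[0] * m for m in mults] for frame in base]


base = [[344.0], [344.0]]
print(extend_pitch_sketch(base, 3))           # harmonics 1x, 2x, 3x
print(extend_pitch_sketch(base, [1.0, 3.0]))  # explicit multipliers
```

The same operation works for amplitudes: passing a list of per-harmonic weights scales a base amplitude series into one column per oscillator.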

First, we define some constants and a helper function that we use throughout the tutorial.

PI = torch.pi
PI2 = 2 * torch.pi

F0 = 344.0  # fundamental frequency
DURATION = 1.1  # [seconds]
SAMPLE_RATE = 16_000  # [Hz]

NUM_FRAMES = int(DURATION * SAMPLE_RATE)


def plot(freq, amp, waveform, sample_rate, zoom=None, vol=0.1):
    t = (torch.arange(waveform.size(0)) / sample_rate).numpy()

    fig, axes = plt.subplots(4, 1, sharex=True)
    axes[0].plot(t, freq.numpy())
    axes[0].set(title=f"Oscillator bank (bank size: {amp.size(-1)})", ylabel="Frequency [Hz]", ylim=[-0.03, None])
    axes[1].plot(t, amp.numpy())
    axes[1].set(ylabel="Amplitude", ylim=[-0.03 if torch.all(amp >= 0.0) else None, None])
    axes[2].plot(t, waveform)
    axes[2].set(ylabel="Waveform")
    axes[3].specgram(waveform, Fs=sample_rate)
    axes[3].set(ylabel="Spectrogram", xlabel="Time [s]", xlim=[-0.01, t[-1] + 0.01])

    for i in range(4):
        axes[i].grid(True)
    pos = axes[2].get_position()
    fig.tight_layout()

    if zoom is not None:
        ax = fig.add_axes([pos.x0 + 0.02, pos.y0 + 0.03, pos.width / 2.5, pos.height / 2.0])
        ax.plot(t, waveform)
        ax.set(xlim=zoom, xticks=[], yticks=[])

    # Peak-normalize without modifying the caller's tensor in place.
    waveform = waveform / waveform.abs().max()
    return Audio(vol * waveform, rate=sample_rate, normalize=False)

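Harmonic Overtones

Harmonic overtones are frequency components whose frequencies are integer multiples of the fundamental frequency.

Let's look at how to generate the waveforms commonly used in synthesizers, namely:

  • Sawtooth wave

  • Square wave

  • Triangle wave

Sawtooth wave

A sawtooth wave can be expressed as follows. It contains all integer harmonics, so it is also commonly used in subtractive synthesis.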

\[\begin{align*} y_t &= \sum_{k=1}^{K} A_k \sin ( 2 \pi f_k t ) \\ \text{where} \\ f_k &= k f_0 \\ A_k &= -\frac{ (-1) ^k }{k \pi} \end{align*}\]
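As a quick numerical check of the formula above, the following plain-Python sketch evaluates the partial sum for a few values of K. The function name `sawtooth_partial_sum` is only illustrative. With these coefficients the series converges to a ramp that equals f_0 t within one period (between -0.5 and 0.5), so a quarter of the way through a period the sum should approach 0.25.

```python
import math


def sawtooth_partial_sum(t, f0, num_harmonics):
    """Evaluate y_t from the formula above with K = num_harmonics."""
    total = 0.0
    for k in range(1, num_harmonics + 1):
        a_k = -((-1) ** k) / (k * math.pi)
        total += a_k * math.sin(2 * math.pi * k * f0 * t)
    return total


f0 = 344.0
t = 0.25 / f0  # a quarter of the way through one period
for K in (1, 10, 200):
    # The value moves toward 0.25 as more harmonics are added.
    print(K, sawtooth_partial_sum(t, f0, K))
```

The amplitudes fall off as 1/k, which is slow. This is why a sawtooth needs many harmonics to look sharp and why it sounds bright.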

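The following function takes fundamental frequencies and amplitudes, and adds the extended pitches according to the formula above.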

def sawtooth_wave(freq0, amp0, num_pitches, sample_rate):
    freq = extend_pitch(freq0, num_pitches)

    mults = [-((-1) ** i) / (PI * i) for i in range(1, 1 + num_pitches)]
    amp = extend_pitch(amp0, mults)
    waveform = oscillator_bank(freq, amp, sample_rate=sample_rate)
    return freq, amp, waveform

Now we synthesize a waveform.

freq0 = torch.full((NUM_FRAMES, 1), F0)
amp0 = torch.ones((NUM_FRAMES, 1))
freq, amp, waveform = sawtooth_wave(freq0, amp0, int(SAMPLE_RATE / F0), SAMPLE_RATE)
plot(freq, amp, waveform, SAMPLE_RATE, zoom=(1 / F0, 3 / F0))
Oscillator bank (bank size: 46)
/pytorch/audio/src/torchaudio/prototype/functional/_dsp.py:63: UserWarning: Some frequencies are above nyquist frequency. Setting the corresponding amplitude to zero. This might cause numerically unstable gradient.
  warnings.warn(
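The warning above follows from simple arithmetic: the bank size is chosen as `int(SAMPLE_RATE / F0)`, so the highest harmonics reach almost up to the sampling rate, well past the Nyquist frequency of `SAMPLE_RATE / 2`. The short sketch below counts how many harmonics stay below Nyquist. It reuses the constants defined earlier in this tutorial. Treating a component exactly at Nyquist as out of range is an assumption here.

```python
F0 = 344.0
SAMPLE_RATE = 16_000
nyquist = SAMPLE_RATE / 2

num_pitches = int(SAMPLE_RATE / F0)  # bank size used above
harmonics = [k * F0 for k in range(1, num_pitches + 1)]
below = [f for f in harmonics if f < nyquist]

print(num_pitches)  # 46
print(len(below))   # 23
print(below[-1])    # 7912.0
```

So only the first 23 of the 46 oscillators contribute to the sound. According to the warning message, the amplitudes of the rest are set to zero.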


It is possible to oscillate the base frequency to create a time-varying tone based on the sawtooth wave.

fm = 10  # rate at which the frequency oscillates [Hz]
f_dev = 0.1 * F0  # the degree of frequency oscillation [Hz]

phase = torch.linspace(0, fm * PI2 * DURATION, NUM_FRAMES)
freq0 = F0 + f_dev * torch.sin(phase).unsqueeze(-1)

freq, amp, waveform = sawtooth_wave(freq0, amp0, int(SAMPLE_RATE / F0), SAMPLE_RATE)
plot(freq, amp, waveform, SAMPLE_RATE, zoom=(1 / F0, 3 / F0))
Oscillator bank (bank size: 46)
/pytorch/audio/src/torchaudio/prototype/functional/_dsp.py:63: UserWarning: Some frequencies are above nyquist frequency. Setting the corresponding amplitude to zero. This might cause numerically unstable gradient.
  warnings.warn(


Square wave

A square wave contains only odd-integer harmonics.

\[\begin{align*} y_t &= \sum_{k=0}^{K-1} A_k \sin ( 2 \pi f_k t ) \\ \text{where} \\ f_k &= n f_0 \\ A_k &= \frac{ 4 }{n \pi} \\ n &= 2k + 1 \end{align*}\]

def square_wave(freq0, amp0, num_pitches, sample_rate):
    mults = [2.0 * i + 1.0 for i in range(num_pitches)]
    freq = extend_pitch(freq0, mults)

    mults = [4 / (PI * (2.0 * i + 1.0)) for i in range(num_pitches)]
    amp = extend_pitch(amp0, mults)

    waveform = oscillator_bank(freq, amp, sample_rate=sample_rate)
    return freq, amp, waveform

freq0 = torch.full((NUM_FRAMES, 1), F0)
amp0 = torch.ones((NUM_FRAMES, 1))
freq, amp, waveform = square_wave(freq0, amp0, int(SAMPLE_RATE / F0 / 2), SAMPLE_RATE)
plot(freq, amp, waveform, SAMPLE_RATE, zoom=(1 / F0, 3 / F0))
Oscillator bank (bank size: 23)
/pytorch/audio/src/torchaudio/prototype/functional/_dsp.py:63: UserWarning: Some frequencies are above nyquist frequency. Setting the corresponding amplitude to zero. This might cause numerically unstable gradient.
  warnings.warn(
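As a sanity check of the square-wave coefficients, the plain-Python sketch below sums the odd harmonics at the middle of each half period, where the series should approach +1 and -1. It also counts how many of the 23 requested odd harmonics stay below the Nyquist frequency, which explains why the warning still appears. The function name is illustrative, and the strict less-than comparison at Nyquist is an assumption.

```python
import math


def square_partial_sum(x, num_terms):
    """Sum the first num_terms odd harmonics at phase x = 2*pi*f0*t."""
    total = 0.0
    for k in range(num_terms):
        n = 2 * k + 1
        total += 4 / (n * math.pi) * math.sin(n * x)
    return total


print(square_partial_sum(math.pi / 2, 500))   # close to +1
print(square_partial_sum(-math.pi / 2, 500))  # close to -1

F0 = 344.0
SAMPLE_RATE = 16_000
num_pitches = int(SAMPLE_RATE / F0 / 2)  # bank size used above
odd_harmonics = [(2 * k + 1) * F0 for k in range(num_pitches)]
num_below = sum(f < SAMPLE_RATE / 2 for f in odd_harmonics)
print(num_pitches, num_below)  # 23 12
```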


Triangle wave

A triangle wave also contains only odd-integer harmonics.

\[\begin{align*} y_t &= \sum_{k=0}^{K-1} A_k \sin ( 2 \pi f_k t ) \\ \text{where} \\ f_k &= n f_0 \\ A_k &= (-1) ^ k \frac{8}{(n\pi) ^ 2} \\ n &= 2k + 1 \end{align*}\]

def triangle_wave(freq0, amp0, num_pitches, sample_rate):
    mults = [2.0 * i + 1.0 for i in range(num_pitches)]
    freq = extend_pitch(freq0, mults)

    c = 8 / (PI**2)
    mults = [c * ((-1) ** i) / ((2.0 * i + 1.0) ** 2) for i in range(num_pitches)]
    amp = extend_pitch(amp0, mults)

    waveform = oscillator_bank(freq, amp, sample_rate=sample_rate)
    return freq, amp, waveform

freq, amp, waveform = triangle_wave(freq0, amp0, int(SAMPLE_RATE / F0 / 2), SAMPLE_RATE)
plot(freq, amp, waveform, SAMPLE_RATE, zoom=(1 / F0, 3 / F0))
Oscillator bank (bank size: 23)
/pytorch/audio/src/torchaudio/prototype/functional/_dsp.py:63: UserWarning: Some frequencies are above nyquist frequency. Setting the corresponding amplitude to zero. This might cause numerically unstable gradient.
  warnings.warn(
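The triangle-wave amplitudes fall off as 1/n², much faster than the 1/n of the square wave, so a handful of harmonics already gives a good approximation. The illustrative sketch below evaluates the partial sum at the peak of the waveform (a quarter period), where the full series equals 1.

```python
import math


def triangle_partial_sum(x, num_terms):
    """Sum the first num_terms odd harmonics at phase x = 2*pi*f0*t."""
    total = 0.0
    for k in range(num_terms):
        n = 2 * k + 1
        total += (-1) ** k * 8 / (n * math.pi) ** 2 * math.sin(n * x)
    return total


for terms in (1, 3, 23):
    # The value at the peak approaches 1 as more terms are added.
    print(terms, triangle_partial_sum(math.pi / 2, terms))
```

Even the single-term sum is already above 0.8, which matches the mellow, nearly sinusoidal character of a triangle wave.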


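Inharmonic Partials

Inharmonic partials are frequency components whose frequencies are not integer multiples of the fundamental frequency.

They are essential for re-creating realistic sounds or making the result of synthesis more interesting.

Bell sound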

https://computermusicresource.com/Simple.bell.tutorial.html

num_tones = 9
duration = 2.0
num_frames = int(SAMPLE_RATE * duration)

freq0 = torch.full((num_frames, 1), F0)
mults = [0.56, 0.92, 1.19, 1.71, 2, 2.74, 3.0, 3.76, 4.07]
freq = extend_pitch(freq0, mults)

amp = adsr_envelope(
    num_frames=num_frames,
    attack=0.002,
    decay=0.998,
    sustain=0.0,
    release=0.0,
    n_decay=2,
)
amp = torch.stack([amp * (0.5**i) for i in range(num_tones)], dim=-1)

waveform = oscillator_bank(freq, amp, sample_rate=SAMPLE_RATE)

plot(freq, amp, waveform, SAMPLE_RATE, vol=0.4)
Oscillator bank (bank size: 9)
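To make the bell recipe above easier to inspect, the following plain-Python sketch lists the frequency and relative amplitude of each partial, using the same multipliers and the same halving of amplitude per partial. It also marks which multipliers happen to be integers. The labels are only for illustration.

```python
F0 = 344.0
mults = [0.56, 0.92, 1.19, 1.71, 2, 2.74, 3.0, 3.76, 4.07]

for i, m in enumerate(mults):
    kind = "harmonic" if float(m).is_integer() else "inharmonic"
    # Each partial is half as loud as the previous one, as in the amp stack above.
    print(f"{m * F0:8.2f} Hz  relative amplitude {0.5 ** i:.4f}  {kind}")

num_inharmonic = sum(not float(m).is_integer() for m in mults)
print(num_inharmonic)  # 7
```

Seven of the nine partials are not integer multiples of `F0`, and the loudest ones sit below it. This is what gives the result its metallic, bell-like quality rather than a clear pitched tone.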


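For comparison, the following is the harmonic version of the above. Only the frequency values differ. The number of overtones and their amplitudes are the same.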

freq = extend_pitch(freq0, num_tones)
waveform = oscillator_bank(freq, amp, sample_rate=SAMPLE_RATE)

plot(freq, amp, waveform, SAMPLE_RATE)
Oscillator bank (bank size: 9)


