注意

點選這裡下載完整示例程式碼

入門 || 張量 || Autograd || 構建模型 || TensorBoard 支援 || 訓練模型 || 理解模型

PyTorch 入門¶

建立日期：2021 年 11 月 30 日 | 最後更新日期：2024 年 1 月 19 日 | 最後驗證日期：2024 年 11 月 5 日

觀看下面的影片或在 youtube 上觀看。

PyTorch 張量¶

觀看從 03:50 開始的影片。

首先，我們將匯入 pytorch。

import torch

讓我們來看一些基本的張量操作。首先，是一些建立張量的方法

z = torch.zeros(5, 3)
print(z)
print(z.dtype)

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
torch.float32

上面，我們建立了一個填充零的 5x3 矩陣，並查詢其資料型別，發現這些零是 32 位浮點數，這是 PyTorch 的預設型別。

如果您想要整數而不是浮點數怎麼辦？您總是可以覆蓋預設設定

i = torch.ones((5, 3), dtype=torch.int16)
print(i)

tensor([[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]], dtype=torch.int16)

您可以看到，當我們確實改變預設設定時，張量在列印時會友好地報告這一點。

通常會隨機初始化學習權重，為了結果的可復現性，通常會為 PRNG 設定特定的種子

torch.manual_seed(1729)
r1 = torch.rand(2, 2)
print('A random tensor:')
print(r1)

r2 = torch.rand(2, 2)
print('\nA different random tensor:')
print(r2) # new values

torch.manual_seed(1729)
r3 = torch.rand(2, 2)
print('\nShould match r1:')
print(r3) # repeats values of r1 because of re-seed

A random tensor:
tensor([[0.3126, 0.3791],
        [0.3087, 0.0736]])

A different random tensor:
tensor([[0.4216, 0.0691],
        [0.2332, 0.4047]])

Should match r1:
tensor([[0.3126, 0.3791],
        [0.3087, 0.0736]])

PyTorch 張量直觀地執行算術運算。形狀相似的張量可以相加、相乘等。與標量的操作會分佈到整個張量上

ones = torch.ones(2, 3)
print(ones)

twos = torch.ones(2, 3) * 2 # every element is multiplied by 2
print(twos)

threes = ones + twos       # addition allowed because shapes are similar
print(threes)              # tensors are added element-wise
print(threes.shape)        # this has the same dimensions as input tensors

r1 = torch.rand(2, 3)
r2 = torch.rand(3, 2)
# uncomment this line to get a runtime error
# r3 = r1 + r2

tensor([[1., 1., 1.],
        [1., 1., 1.]])
tensor([[2., 2., 2.],
        [2., 2., 2.]])
tensor([[3., 3., 3.],
        [3., 3., 3.]])
torch.Size([2, 3])

這裡列出了一小部分可用的數學運算

r = (torch.rand(2, 2) - 0.5) * 2 # values between -1 and 1
print('A random matrix, r:')
print(r)

# Common mathematical operations are supported:
print('\nAbsolute value of r:')
print(torch.abs(r))

# ...as are trigonometric functions:
print('\nInverse sine of r:')
print(torch.asin(r))

# ...and linear algebra operations like determinant and singular value decomposition
print('\nDeterminant of r:')
print(torch.det(r))
print('\nSingular value decomposition of r:')
print(torch.svd(r))

# ...and statistical and aggregate operations:
print('\nAverage and standard deviation of r:')
print(torch.std_mean(r))
print('\nMaximum value of r:')
print(torch.max(r))

A random matrix, r:
tensor([[ 0.9956, -0.2232],
        [ 0.3858, -0.6593]])

Absolute value of r:
tensor([[0.9956, 0.2232],
        [0.3858, 0.6593]])

Inverse sine of r:
tensor([[ 1.4775, -0.2251],
        [ 0.3961, -0.7199]])

Determinant of r:
tensor(-0.5703)

Singular value decomposition of r:
torch.return_types.svd(
U=tensor([[-0.8353, -0.5497],
        [-0.5497,  0.8353]]),
S=tensor([1.1793, 0.4836]),
V=tensor([[-0.8851, -0.4654],
        [ 0.4654, -0.8851]]))

Average and standard deviation of r:
(tensor(0.7217), tensor(0.1247))

Maximum value of r:
tensor(0.9956)

關於 PyTorch 張量的強大功能，還有很多需要了解的，包括如何為 GPU 上的平行計算進行設定——我們將在另一個影片中深入探討。

PyTorch 模型¶

觀看從 10:00 開始的影片。

讓我們來談談如何在 PyTorch 中表達模型

import torch                     # for all things PyTorch
import torch.nn as nn            # for torch.nn.Module, the parent object for PyTorch models
import torch.nn.functional as F  # for the activation function

圖：LeNet-5

上圖是 LeNet-5 的示意圖，它是最早的卷積神經網路之一，也是深度學習爆炸式增長的驅動力之一。它被構建用於讀取手寫數字的小影像（MNIST 資料集），並正確分類影像中表示的數字。

下面是其工作原理的簡化版本

C1 層是卷積層，這意味著它掃描輸入影像以查詢在訓練期間學習到的特徵。它輸出一個地圖，顯示其在影像中看到每個學習到的特徵的位置。這個“啟用圖”在 S2 層進行下采樣。
C3 層是另一個卷積層，這次掃描 C1 的啟用圖以查詢特徵的組合。它也輸出一個啟用圖，描述這些特徵組合的空間位置，這個啟用圖在 S4 層進行下采樣。
最後，末端的全連線層 F5、F6 和 OUTPUT 是一個分類器，它接收最終的啟用圖，並將其分類到代表 10 個數字的十個類別之一。

我們如何在程式碼中表達這個簡單的神經網路？

class LeNet(nn.Module):

    def __init__(self):
        super(LeNet, self).__init__()
        # 1 input image channel (black & white), 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5*5 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

檢視這段程式碼，您應該能夠發現與上圖的一些結構相似之處。

這展示了典型 PyTorch 模型的結構

它繼承自 torch.nn.Module - 模組可以巢狀 - 事實上，甚至 Conv2d 和 Linear 層類也繼承自 torch.nn.Module。
模型會有一個 __init__() 函式，在此函式中例項化其層，並載入可能需要的任何資料工件（例如，一個 NLP 模型可能載入詞彙表）。
模型會有一個 forward() 函式。這是實際計算發生的地方：輸入透過網路層和各種函式來生成輸出。
除此之外，您可以像構建其他 Python 類一樣構建您的模型類，新增您需要支援模型計算的任何屬性和方法。

讓我們例項化這個物件並執行一個示例輸入透過它。

net = LeNet()
print(net)                         # what does the object tell us about itself?

input = torch.rand(1, 1, 32, 32)   # stand-in for a 32x32 black & white image
print('\nImage batch shape:')
print(input.shape)

output = net(input)                # we don't call forward() directly
print('\nRaw output:')
print(output)
print(output.shape)

LeNet(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

Image batch shape:
torch.Size([1, 1, 32, 32])

Raw output:
tensor([[ 0.0898,  0.0318,  0.1485,  0.0301, -0.0085, -0.1135, -0.0296,  0.0164,
          0.0039,  0.0616]], grad_fn=<AddmmBackward0>)
torch.Size([1, 10])

上面發生了幾件重要的事情

首先，我們例項化 LeNet 類，並列印 net 物件。一個 torch.nn.Module 的子類將報告其建立的層及其形狀和引數。如果您想快速瞭解模型的處理過程，這可以提供一個方便的概述。

在其下方，我們建立一個虛擬輸入，代表一個具有 1 個顏色通道的 32x32 影像。通常，您會載入一個影像塊並將其轉換為這種形狀的張量。

您可能已經注意到我們的張量有一個額外的維度 - 批次維度。PyTorch 模型假定它們正在處理資料的批次 - 例如，16 個影像塊的批次將具有形狀 (16, 1, 32, 32)。由於我們只使用一張影像，我們建立一個形狀為 (1, 1, 32, 32) 的大小為 1 的批次。

我們透過像函式一樣呼叫模型來請求推理：net(input)。此呼叫的輸出代表模型對輸入代表特定數字的置信度。（由於此模型例項尚未學習任何內容，我們不應期望在輸出中看到任何訊號。）檢視 output 的形狀，我們可以看到它也具有批次維度，其大小應始終與輸入批次維度匹配。如果我們傳入了 16 個例項的輸入批次，output 的形狀將是 (16, 10)。

資料集和資料載入器¶

觀看從 14:00 開始的影片。

下面，我們將演示如何使用 TorchVision 中一個隨時可下載的開放訪問資料集，如何轉換影像以供模型使用，以及如何使用 DataLoader 向模型饋送批次資料。

我們需要做的第一件事是將輸入的影像轉換為 PyTorch 張量。

#%matplotlib inline

import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616))])

在這裡，我們為輸入指定兩種變換

transforms.ToTensor() 將 Pillow 載入的影像轉換為 PyTorch 張量。
transforms.Normalize() 調整張量的值，使其平均值為零，標準差為 1.0。大多數啟用函式在 x = 0 附近有最強的梯度，因此將資料居中可以加快學習速度。傳遞給變換的值是資料集中影像 rgb 值的平均值（第一個元組）和標準差（第二個元組）。您可以透過執行以下幾行程式碼自己計算這些值

```
from torch.utils.data import ConcatDataset transform = transforms.Compose([transforms.ToTensor()]) trainset = torchvision.datasets.CIFAR10(root=’./data’, train=True,

download=True, transform=transform)

#將所有訓練影像堆疊成形狀為 #(50000, 3, 32, 32) 的張量 x = torch.stack([sample[0] for sample in ConcatDataset([trainset])])

#獲取每個通道的平均值 mean = torch.mean(x, dim=(0,2,3)) #tensor([0.4914, 0.4822, 0.4465]) std = torch.std(x, dim=(0,2,3)) #tensor([0.2470, 0.2435, 0.2616])

```

還有許多其他可用的變換，包括裁剪、居中、旋轉和翻轉。

接下來，我們將建立一個 CIFAR10 資料集的例項。這是一組 32x32 彩色影像塊，代表 10 類物件：6 類動物（鳥、貓、鹿、狗、青蛙、馬）和 4 類車輛（飛機、汽車、輪船、卡車）

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)

  0%|          | 0.00/170M [00:00<?, ?B/s]
  0%|          | 459k/170M [00:00<00:37, 4.52MB/s]
  5%|4         | 7.77M/170M [00:00<00:03, 44.6MB/s]
 10%|#         | 17.9M/170M [00:00<00:02, 70.2MB/s]
 15%|#5        | 26.0M/170M [00:00<00:01, 72.6MB/s]
 20%|##        | 34.4M/170M [00:00<00:01, 76.4MB/s]
 25%|##4       | 42.2M/170M [00:00<00:01, 77.0MB/s]
 29%|##9       | 50.2M/170M [00:00<00:01, 77.7MB/s]
 34%|###4      | 58.3M/170M [00:00<00:01, 78.7MB/s]
 39%|###9      | 67.1M/170M [00:00<00:01, 81.6MB/s]
 44%|####4     | 75.3M/170M [00:01<00:01, 80.0MB/s]
 49%|####9     | 84.3M/170M [00:01<00:01, 83.1MB/s]
 54%|#####4    | 92.7M/170M [00:01<00:00, 82.0MB/s]
 59%|#####9    | 101M/170M [00:01<00:00, 80.7MB/s]
 64%|######3   | 109M/170M [00:01<00:00, 80.1MB/s]
 69%|######8   | 117M/170M [00:01<00:00, 80.3MB/s]
 73%|#######3  | 125M/170M [00:01<00:00, 75.3MB/s]
 80%|#######9  | 136M/170M [00:01<00:00, 85.5MB/s]
 85%|########5 | 145M/170M [00:01<00:00, 84.2MB/s]
 90%|######### | 153M/170M [00:01<00:00, 83.6MB/s]
 95%|#########4| 162M/170M [00:02<00:00, 83.5MB/s]
100%|#########9| 170M/170M [00:02<00:00, 83.0MB/s]
100%|##########| 170M/170M [00:02<00:00, 78.5MB/s]

注意

當您執行上面的單元格時，資料集下載可能需要一些時間。

這是在 PyTorch 中建立資料集物件的一個例子。可下載的資料集（如上面的 CIFAR-10）是 torch.utils.data.Dataset 的子類。PyTorch 中的 Dataset 類包括 TorchVision、Torchtext 和 TorchAudio 中的可下載資料集，以及像 torchvision.datasets.ImageFolder 這樣的實用資料集類，後者將讀取帶標籤影像的資料夾。您也可以建立自己的 Dataset 子類。

當我們例項化資料集時，我們需要告訴它幾件事

我們希望資料存放的檔案系統路徑。
我們是否將此集合用於訓練；大多數資料集都會被分成訓練集和測試集。
如果我們尚未下載資料集，是否希望下載它。
我們想要應用於資料的變換。

資料集準備好後，您可以將其提供給 DataLoader

trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

一個 Dataset 子類封裝了對資料的訪問，並且專門用於其提供的資料型別。DataLoader 對資料一無所知，但它根據您指定的引數，將由 Dataset 提供的輸入張量組織成批次。

在上面的示例中，我們要求一個 DataLoader 從 trainset 中提供大小為 4 的影像批次，並隨機打亂它們的順序 (shuffle=True)，並且告訴它啟動兩個工作程序從磁碟載入資料。

視覺化 DataLoader 提供的批次是一個好習慣

import matplotlib.pyplot as plt
import numpy as np

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))


# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers). Got range [-0.49473685..1.5632443].
 ship   car horse  ship

執行上面的單元格應該會顯示一排四張影像，以及每張影像的正確標籤。

訓練您的 PyTorch 模型¶

觀看從 17:10 開始的影片。

讓我們把所有部分組合起來，訓練一個模型

#%matplotlib inline

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import torchvision
import torchvision.transforms as transforms

import matplotlib
import matplotlib.pyplot as plt
import numpy as np

首先，我們需要訓練集和測試集。如果您還沒有下載資料集，請執行下面的單元格以確保資料集已下載。（可能需要一分鐘。）

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

我們將在 DataLoader 的輸出上執行檢查

import matplotlib.pyplot as plt
import numpy as np

# functions to show an image


def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))


# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

cat   cat  deer  frog

這是我們要訓練的模型。如果它看起來很熟悉，那是因為它是 LeNet 的一個變體 - 在本影片前面討論過 - 適配於 3 色影像。

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()

我們需要的最後要素是損失函式和最佳化器

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

損失函式，如本影片前面討論的，衡量模型預測與我們理想輸出的偏離程度。交叉熵損失是像我們這樣的分類模型典型的損失函式。

最佳化器是驅動學習的機制。在這裡，我們建立了一個實現隨機梯度下降的最佳化器，它是更直接的最佳化演算法之一。除了演算法的引數，如學習率 (lr) 和動量，我們還傳入了 net.parameters()，這是模型中所有學習權重的集合——這就是最佳化器調整的物件。

最後，所有這些都組合到訓練迴圈中。請繼續執行此單元格，因為它可能需要幾分鐘才能執行完畢

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

[1,  2000] loss: 2.195
[1,  4000] loss: 1.879
[1,  6000] loss: 1.656
[1,  8000] loss: 1.576
[1, 10000] loss: 1.517
[1, 12000] loss: 1.461
[2,  2000] loss: 1.415
[2,  4000] loss: 1.368
[2,  6000] loss: 1.334
[2,  8000] loss: 1.327
[2, 10000] loss: 1.318
[2, 12000] loss: 1.261
Finished Training

在這裡，我們只進行 2 個訓練 epoch（第 1 行）- 也就是說，對訓練資料集進行兩次遍歷。每次遍歷都有一個內部迴圈，用於迭代訓練資料（第 4 行），提供經過變換的輸入影像及其正確標籤的批次。

梯度清零（第 9 行）是一個重要的步驟。梯度會在一個批次中累積；如果我們不為每個批次重置它們，它們將持續累積，這將提供不正確的梯度值，使得學習不可能。

在第 12 行，我們要求模型對該批次進行預測。在下一行（第 13 行），我們計算損失 - outputs（模型預測）與 labels（正確輸出）之間的差異。

在第 14 行，我們執行 backward() 傳播，並計算將指導學習的梯度。

在第 15 行，最佳化器執行一步學習 - 它使用從 backward() 呼叫中獲得的梯度來微調學習權重，使其朝著它認為會降低損失的方向移動。

迴圈的其餘部分對 epoch 數、已完成的訓練例項數以及整個訓練迴圈中收集的損失進行了一些簡要報告。

當您執行上面的單元格時，您應該會看到如下內容

[1,  2000] loss: 2.235
[1,  4000] loss: 1.940
[1,  6000] loss: 1.713
[1,  8000] loss: 1.573
[1, 10000] loss: 1.507
[1, 12000] loss: 1.442
[2,  2000] loss: 1.378
[2,  4000] loss: 1.364
[2,  6000] loss: 1.349
[2,  8000] loss: 1.319
[2, 10000] loss: 1.284
[2, 12000] loss: 1.267
Finished Training

請注意，損失是單調下降的，這表明我們的模型在訓練資料集上的效能正在持續改進。

作為最後一步，我們應該檢查模型是否真的在進行泛化學習，而不是簡單地“記憶”資料集。這稱為過擬合，通常表明資料集太小（沒有足夠的示例進行泛化學習），或者模型具有比正確建模資料集所需的更多的學習引數。

這就是將資料集分成訓練集和測試集的原因——為了測試模型的泛化能力，我們要求它對它未訓練過的資料進行預測

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

Accuracy of the network on the 10000 test images: 54 %

如果您跟著操作，您應該會看到模型目前的大約準確率為 50%。這並非最先進水平，但這遠好於我們從隨機輸出中期望的 10% 準確率。這表明模型確實發生了一些泛化學習。

指令碼總執行時間： ( 1 分 20.320 秒)

由 Sphinx-Gallery 生成的畫廊