Deeplabv3

import torch
model = torch.hub.load('pytorch/vision:v0.10.0', 'deeplabv3_resnet50', pretrained=True)
# or any of these variants
# model = torch.hub.load('pytorch/vision:v0.10.0', 'deeplabv3_resnet101', pretrained=True)
# model = torch.hub.load('pytorch/vision:v0.10.0', 'deeplabv3_mobilenet_v3_large', pretrained=True)
model.eval()

所有預訓練模型都期望輸入影像以相同方式進行歸一化，即形狀為 (N, 3, H, W) 的 3 通道 RGB 影像的小批次，其中 N 是影像數量，H 和 W 預期至少為 224 畫素。影像必須載入到 [0, 1] 範圍內，然後使用 mean = [0.485, 0.456, 0.406] 和 std = [0.229, 0.224, 0.225] 進行歸一化。

該模型返回一個 OrderedDict，其中包含兩個張量，它們與輸入張量具有相同的高度和寬度，但有 21 個類別。output['out'] 包含語義掩碼，output['aux'] 包含每個畫素的輔助損失值。在推理模式下，output['aux'] 沒有用處。因此，output['out'] 的形狀為 (N, 21, H, W)。更多文件可在此處找到。

# Download an example image from the pytorch website
import urllib
url, filename = ("https://github.com/pytorch/hub/raw/master/images/deeplab1.png", "deeplab1.png")
try: urllib.URLopener().retrieve(url, filename)
except: urllib.request.urlretrieve(url, filename)

# sample execution (requires torchvision)
from PIL import Image
from torchvision import transforms
input_image = Image.open(filename)
input_image = input_image.convert("RGB")
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model

# move the input and model to GPU for speed if available
if torch.cuda.is_available():
    input_batch = input_batch.to('cuda')
    model.to('cuda')

with torch.no_grad():
    output = model(input_batch)['out'][0]
output_predictions = output.argmax(0)

此處的輸出形狀為 (21, H, W)，在每個位置都有未歸一化的機率，對應於每個類別的預測。要獲取每個類別的最大預測，然後將其用於下游任務，您可以執行 output_predictions = output.argmax(0)。

這是一個小片段，用於繪製預測圖，其中每個顏色分配給每個類別（請參見左側的視覺化影像）。

# create a color pallette, selecting a color for each class
palette = torch.tensor([2 ** 25 - 1, 2 ** 15 - 1, 2 ** 21 - 1])
colors = torch.as_tensor([i for i in range(21)])[:, None] * palette
colors = (colors % 255).numpy().astype("uint8")

# plot the semantic segmentation predictions of 21 classes in each color
r = Image.fromarray(output_predictions.byte().cpu().numpy()).resize(input_image.size)
r.putpalette(colors)

import matplotlib.pyplot as plt
plt.imshow(r)
# plt.show()

模型描述

Deeplabv3-ResNet 是由使用 ResNet-50 或 ResNet-101 主幹的 Deeplabv3 模型構建的。Deeplabv3-MobileNetV3-Large 是由使用 MobileNetV3 large 主幹的 Deeplabv3 模型構建的。預訓練模型已在 COCO train2017 的子集上進行訓練，涵蓋 Pascal VOC 資料集中存在的 20 個類別。

在 COCO val2017 資料集上評估的預訓練模型的準確性如下所示。

模型結構	平均 IOU	全域性畫素級準確性
deeplabv3_resnet50	66.4	92.4
deeplabv3_resnet101	67.4	92.4
deeplabv3_mobilenet_v3_large	60.3	91.2

資源

重新思考用於語義影像分割的空洞卷積

帶 ResNet-50、ResNet-101 和 MobileNet-V3 主幹的 DeepLabV3 模型

模型型別： 可指令碼化 | 視覺

提交者： PyTorch 基金會

在 GitHub 上檢視 17.2k

在Google Collab上開啟

開啟模型演示