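⚠️ Notice: Limited Maintenance

This project is no longer actively maintained. Existing releases remain available, but there are no planned updates, bug fixes, new features, or security patches. Users should be aware that vulnerabilities may not be addressed.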

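Batch Inference with TorchServe¶

Contents of this Document¶

Introduction
Prerequisites
Batch Inference with TorchServe's Default Handlers
Batch Inference with TorchServe Using the ResNet-152 Model
Demo: Configuring TorchServe for a Batch-Enabled ResNet-152 Model
Demo: Configuring TorchServe for a Batch-Enabled ResNet-152 Model Using Docker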

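Introduction¶

Batch inference is the process of aggregating inference requests and sending the aggregated requests through the ML/DL framework for inference all at once. TorchServe was designed to natively support batching of incoming inference requests. This lets you make better use of host resources, because most ML/DL frameworks are optimized for batched requests. That better utilization, in turn, reduces the operational cost of hosting an inference service with TorchServe.

This document walks through an example of using batch inference in TorchServe when serving models locally or in Docker containers.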

Prerequisites¶

Before working through this document, please read the prerequisite TorchServe documentation first.

Batch Inference with TorchServe's Default Handlers¶

TorchServe's default handlers support batch inference out of the box, with the exception of the text_classifier handler.

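Batch Inference with TorchServe Using the ResNet-152 Model¶

To support batch inference, TorchServe needs the following:

TorchServe model configuration: Configure batch_size and max_batch_delay, either through the "POST /models" management API or through settings in config.properties. TorchServe needs to know the maximum batch size the model can handle and the maximum time it should wait to fill each batch.
Model handler code: TorchServe requires the model handler to handle batched inference requests.

For a full working example of a custom model handler with batch processing, see the Hugging Face transformer generalized handler.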
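As a rough illustration of what handling batched requests means for handler code, the sketch below assumes the handler receives a list with one entry per request (payload under a "data" or "body" key) and must return a list of the same length, in the same order. The class name EchoLengthHandler and its byte-counting "model" are hypothetical stand-ins; a real handler would run a model on the whole batch.

```python
from typing import Any, Dict, List


class EchoLengthHandler:
    """Toy batch-aware handler (hypothetical): one output per request."""

    def preprocess(self, requests: List[Dict[str, Any]]) -> List[bytes]:
        # Assumption: each request carries its payload under "data" or "body".
        payloads = []
        for request in requests:
            payload = request.get("data")
            if payload is None:
                payload = request.get("body")
            if isinstance(payload, str):
                payload = payload.encode("utf-8")
            payloads.append(payload or b"")
        return payloads

    def inference(self, batch: List[bytes]) -> List[int]:
        # Stand-in for a real model call that processes the whole batch at once.
        return [len(item) for item in batch]

    def postprocess(self, outputs: List[int]) -> List[Dict[str, int]]:
        return [{"num_bytes": value} for value in outputs]

    def handle(self, data: List[Dict[str, Any]], context: Any = None) -> List[Dict[str, int]]:
        results = self.postprocess(self.inference(self.preprocess(data)))
        # The response list must line up one-to-one with the request list.
        assert len(results) == len(data)
        return results
```

Calling `EchoLengthHandler().handle([{"data": b"abc"}, {"body": "hello"}])` returns two results, one per request, which is the property a batch-capable handler has to preserve.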

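TorchServe Model Configuration¶

Starting with TorchServe 0.4.1, there are two ways to configure TorchServe to use batching:

Provide the batch configuration through the POST /models API.
Provide the batch configuration through the configuration file, config.properties.

The configuration properties of interest are:

batch_size: The maximum batch size the model is expected to handle.
max_batch_delay: The maximum time, in milliseconds, that TorchServe waits to receive batch_size requests. If TorchServe does not receive batch_size requests before this timer expires, it sends whatever requests it has received to the model handler.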
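The interaction between batch_size and max_batch_delay can be sketched as a small simulation. This is a simplified model, not TorchServe's actual frontend code: it assumes the timer starts when the first request of a batch arrives and it ignores the time workers spend busy. The function name form_batches is hypothetical.

```python
from typing import List, Tuple


def form_batches(
    arrivals_ms: List[float], batch_size: int, max_batch_delay_ms: float
) -> List[Tuple[float, List[float]]]:
    """Group arrival times into (dispatch_time_ms, batch) pairs."""
    batches: List[Tuple[float, List[float]]] = []
    current: List[float] = []
    for arrival in sorted(arrivals_ms):
        # If the open batch's timer expired before this arrival, flush it first.
        if current and arrival > current[0] + max_batch_delay_ms:
            batches.append((current[0] + max_batch_delay_ms, current))
            current = []
        current.append(arrival)
        # A full batch is dispatched immediately, without waiting for the timer.
        if len(current) == batch_size:
            batches.append((arrival, current))
            current = []
    if current:
        # Leftover requests are dispatched when their timer expires.
        batches.append((current[0] + max_batch_delay_ms, current))
    return batches
```

Under these assumptions, a request that arrives alone waits the full max_batch_delay before it is processed, so a large delay improves batch fill at the cost of latency for sparse traffic.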

Let's look at an example that applies this configuration through the management API:

# The following command will register a model "resnet-152.mar" and configure TorchServe to use a batch_size of 8 and a max batch delay of 50 milliseconds.
curl -X POST "localhost:8081/models?url=resnet-152.mar&batch_size=8&max_batch_delay=50"
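The same registration URL can be assembled programmatically. The sketch below only builds the URL from the query parameters used in this document (url, batch_size, max_batch_delay, initial_workers); the helper name build_register_url is hypothetical, and sending the request is left to the caller.

```python
from typing import Optional
from urllib.parse import urlencode


def build_register_url(
    management_address: str,
    mar_url: str,
    batch_size: int,
    max_batch_delay_ms: int,
    initial_workers: Optional[int] = None,
) -> str:
    """Build the POST /models URL for registering a model with batching."""
    params = {
        "url": mar_url,
        "batch_size": batch_size,
        "max_batch_delay": max_batch_delay_ms,
    }
    if initial_workers is not None:
        params["initial_workers"] = initial_workers
    # urlencode percent-escapes values, so a full https:// model URL is safe here.
    return f"{management_address.rstrip('/')}/models?{urlencode(params)}"


# The result can be sent with, for example:
#   urllib.request.urlopen(urllib.request.Request(url, method="POST"))
```

Percent-encoding the model URL matters once the url parameter is itself a full address, because its own slashes and colons would otherwise be ambiguous inside the query string.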

Here is an example of the same configuration in config.properties:

# The following config.properties entry registers the model "resnet-152.mar" and configures TorchServe to use a batch_size of 8 and a max batch delay of 50 milliseconds.

models={\
  "resnet-152": {\
    "1.0": {\
        "defaultVersion": true,\
        "marName": "resnet-152.mar",\
        "minWorkers": 1,\
        "maxWorkers": 1,\
        "batchSize": 8,\
        "maxBatchDelay": 50,\
        "responseTimeout": 120\
    }\
  }\
}

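These configurations are used both in TorchServe and in the model's custom service code (the handler code). TorchServe associates the batch-related configuration with each model. The frontend then tries to aggregate batch_size requests and send them to the backend.

Demo: Configuring TorchServe for a Batch-Enabled ResNet-152 Model¶

In this section, we bring up the model server and load the ResNet-152 model, which uses the default image_classifier handler for batch inference.

Install TorchServe and Torch Model Archiver¶

First, follow the instructions in the main README and install all the required packages, including torchserve.

Batch Inference of ResNet-152 Configured with the Management API¶

Start the model server. In this example, the model server runs on inference port 8080 and management port 8081.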

$ cat config.properties
...
inference_address=http://127.0.0.1:8080
management_address=http://127.0.0.1:8081
...
$ torchserve --start --model-store model_store

Verify that TorchServe is up and running:

$ curl localhost:8080/ping
{
  "status": "Healthy"
}

Now load the resnet-152 model, which supports batch inference. Because this is an example, we launch 1 worker that handles a batch size of 3 with a max_batch_delay of 10 ms.

$ curl -X POST "localhost:8081/models?url=https://torchserve.pytorch.org/mar_files/resnet-152-batch_v2.mar&batch_size=3&max_batch_delay=10&initial_workers=1"
{
  "status": "Processing worker updates..."
}

Verify that the worker started properly.

curl http://localhost:8081/models/resnet-152-batch_v2

[
  {
    "modelName": "resnet-152-batch_v2",
    "modelVersion": "2.0",
    "modelUrl": "https://torchserve.pytorch.org/mar_files/resnet-152-batch_v2.mar",
    "runtime": "python",
    "minWorkers": 1,
    "maxWorkers": 1,
    "batchSize": 3,
    "maxBatchDelay": 10,
    "loadedAtStartup": false,
    "workers": [
      {
        "id": "9000",
        "startTime": "2021-06-14T23:18:21.793Z",
        "status": "READY",
        "memoryUsage": 1726554112,
        "pid": 19946,
        "gpu": true,
        "gpuUsage": "gpuId::0 utilization.gpu [%]::0 % utilization.memory [%]::0 % memory.used [MiB]::678 MiB"
      }
    ]
  }
]
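The check on this response can also be scripted. The following sketch parses a describe-model response shaped like the one above and confirms that every listed version has the expected batchSize and that all of its workers report READY. The function name workers_ready is hypothetical, and the field names are taken from the sample output.

```python
import json


def workers_ready(describe_response_text: str, expected_batch_size: int) -> bool:
    """Return True if every model version has the expected batch size and only READY workers."""
    versions = json.loads(describe_response_text)
    if not versions:
        return False
    for entry in versions:
        if entry.get("batchSize") != expected_batch_size:
            return False
        workers = entry.get("workers", [])
        # No workers at all means the model cannot serve requests yet.
        if not workers or any(w.get("status") != "READY" for w in workers):
            return False
    return True
```

A deployment script could poll the describe endpoint and call this function until it returns True before sending traffic.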

Now let's test this service.

Get an image to test the service with:

$ curl -LJO https://github.com/pytorch/serve/raw/master/examples/image_classifier/kitten.jpg

Run inference to test the model.

  $ curl http://localhost:8080/predictions/resnet-152-batch_v2 -T kitten.jpg
  {
      "tiger_cat": 0.5798614621162415,
      "tabby": 0.38344162702560425,
      "Egyptian_cat": 0.0342114195227623,
      "lynx": 0.0005819813231937587,
      "quilt": 0.000273319921689108
  }
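A single request on its own never fills a batch of 3, so it is processed alone once max_batch_delay expires. To exercise batching, several requests need to arrive within that window. The sketch below sends payloads in parallel using only the standard library; post_bytes and send_concurrently are hypothetical helper names, and the endpoint in the usage comment is the one from this example.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def post_bytes(url: str, payload: bytes, timeout_s: float = 30.0) -> bytes:
    """POST raw bytes to an inference endpoint and return the response body."""
    request = urllib.request.Request(url, data=payload, method="POST")
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.read()


def send_concurrently(send: Callable[[T], R], payloads: Sequence[T]) -> List[R]:
    """Call send() on every payload in parallel; results keep payload order."""
    with ThreadPoolExecutor(max_workers=max(1, len(payloads))) as pool:
        return list(pool.map(send, payloads))


# Usage against a running server (not executed here):
#   with open("kitten.jpg", "rb") as f:
#       image = f.read()
#   url = "http://localhost:8080/predictions/resnet-152-batch_v2"
#   results = send_concurrently(lambda img: post_bytes(url, img), [image] * 3)
```

Passing the sender as a callable keeps the concurrency logic independent of the HTTP call, so it can be tried with a stub before pointing it at a live server.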

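Batch Inference of ResNet-152 Configured Through config.properties¶

Here, we first set batch_size and max_batch_delay in config.properties. Make sure the mar file is located in the model store and that the version in the models setting matches the version of the mar file that was created. To read more about configuration, see the TorchServe configuration documentation.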

load_models=resnet-152-batch_v2.mar
models={\
  "resnet-152-batch_v2": {\
    "2.0": {\
        "defaultVersion": true,\
        "marName": "resnet-152-batch_v2.mar",\
        "minWorkers": 1,\
        "maxWorkers": 1,\
        "batchSize": 3,\
        "maxBatchDelay": 5000,\
        "responseTimeout": 120\
    }\
  }\
}
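Because the models value is JSON spread over backslash-continued lines, a typo is easy to miss. The sketch below is a minimal reader, assuming only the simple key=value and trailing-backslash syntax shown above (it is not a full Java properties parser); it joins the continued lines, parses the JSON, and lists the batch settings per model version. The names read_models_property and batch_settings are hypothetical.

```python
import json
from typing import List, Optional, Tuple

SAMPLE_PROPERTIES = r"""
load_models=resnet-152-batch_v2.mar
models={\
  "resnet-152-batch_v2": {\
    "2.0": {\
        "defaultVersion": true,\
        "marName": "resnet-152-batch_v2.mar",\
        "minWorkers": 1,\
        "maxWorkers": 1,\
        "batchSize": 3,\
        "maxBatchDelay": 5000,\
        "responseTimeout": 120\
    }\
  }\
}
"""


def read_models_property(properties_text: str) -> dict:
    """Return the parsed JSON value of the `models` key, or {} if absent."""
    logical_lines, buffer = [], ""
    for raw in properties_text.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            buffer += line[:-1]  # drop the continuation backslash and keep joining
            continue
        logical_lines.append(buffer + line)
        buffer = ""
    if buffer:
        logical_lines.append(buffer)
    for line in logical_lines:
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "models":
            return json.loads(value)  # raises ValueError on malformed JSON
    return {}


def batch_settings(models: dict) -> List[Tuple[str, str, Optional[int], Optional[int]]]:
    """List (model, version, batchSize, maxBatchDelay) for each configured version."""
    return [
        (name, version, cfg.get("batchSize"), cfg.get("maxBatchDelay"))
        for name, versions in models.items()
        for version, cfg in versions.items()
    ]
```

Running `batch_settings(read_models_property(SAMPLE_PROPERTIES))` on the configuration above yields one row for version 2.0, which makes it easy to compare the configured version against the version the mar file was built with.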

Then start TorchServe, passing config.properties with the --ts-config flag:

torchserve --start --model-store model_store  --ts-config config.properties

Verify that TorchServe is up and running:

$ curl localhost:8080/ping
{
  "status": "Healthy"
}

Verify that the worker started properly.

curl http://localhost:8081/models/resnet-152-batch_v2

[
  {
    "modelName": "resnet-152-batch_v2",
    "modelVersion": "2.0",
    "modelUrl": "resnet-152-batch_v2.mar",
    "runtime": "python",
    "minWorkers": 1,
    "maxWorkers": 1,
    "batchSize": 3,
    "maxBatchDelay": 5000,
    "loadedAtStartup": true,
    "workers": [
      {
        "id": "9000",
        "startTime": "2021-06-14T22:44:36.742Z",
        "status": "READY",
        "memoryUsage": 0,
        "pid": 19116,
        "gpu": true,
        "gpuUsage": "gpuId::0 utilization.gpu [%]::0 % utilization.memory [%]::0 % memory.used [MiB]::678 MiB"
      }
    ]
  }
]

Now let's test this service.

Get an image to test the service with:

$ curl -LJO https://github.com/pytorch/serve/raw/master/examples/image_classifier/kitten.jpg

Run inference to test the model.

  $ curl http://localhost:8080/predictions/resnet-152-batch_v2 -T kitten.jpg
  {
      "tiger_cat": 0.5798614621162415,
      "tabby": 0.38344162702560425,
      "Egyptian_cat": 0.0342114195227623,
      "lynx": 0.0005819813231937587,
      "quilt": 0.000273319921689108
  }

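Demo: Configuring TorchServe for a Batch-Enabled ResNet-152 Model Using Docker¶

Here, we show how to register a model with batch inference support when serving the model in a Docker container. We set batch_size and max_batch_delay in config.properties, as in the previous section; the file is used by dockerd-entrypoint.sh.

Batch Inference of ResNet-152 Using a Docker Container¶

Set batch_size and max_batch_delay in config.properties, as referenced in dockerd-entrypoint.sh.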

inference_address=http://127.0.0.1:8080
management_address=http://127.0.0.1:8081
metrics_address=http://127.0.0.1:8082
number_of_netty_threads=32
job_queue_size=1000
model_store=/home/model-server/model-store
load_models=resnet-152-batch_v2.mar
models={\
  "resnet-152-batch_v2": {\
    "2.0": {\
        "defaultVersion": true,\
        "marName": "resnet-152-batch_v2.mar",\
        "minWorkers": 1,\
        "maxWorkers": 1,\
        "batchSize": 3,\
        "maxBatchDelay": 100,\
        "responseTimeout": 120\
    }\
  }\
}

Build the target Docker image by following the TorchServe Docker build instructions; here we use the GPU image.

./build_image.sh -g -cv cu102

Start serving the model with the container, passing config.properties to the container:

docker run --rm -it --gpus all -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 --name mar -v /home/ubuntu/serve/model_store:/home/model-server/model-store -v <path to config.properties>:/home/model-server/config.properties pytorch/torchserve:latest-gpu
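When this step is scripted, it can help to assemble the command as an argument list so that the two bind mounts stay readable. The sketch below reuses only the flags and container paths from the command above; the helper name docker_run_args is hypothetical, and the host paths are supplied by the caller.

```python
from typing import List


def docker_run_args(
    model_store_dir: str,
    config_path: str,
    image: str = "pytorch/torchserve:latest-gpu",
) -> List[str]:
    """Build the `docker run` argument list used in this example."""
    return [
        "docker", "run", "--rm", "-it", "--gpus", "all",
        "-p", "127.0.0.1:8080:8080",  # inference port
        "-p", "127.0.0.1:8081:8081",  # management port
        "--name", "mar",
        # Host paths on the left, container paths on the right.
        "-v", f"{model_store_dir}:/home/model-server/model-store",
        "-v", f"{config_path}:/home/model-server/config.properties",
        image,
    ]


# The list can be passed to subprocess.run(args, check=True) on a host with Docker installed.
```

Building a list rather than a single string avoids shell quoting problems when a host path contains spaces.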

Verify that the worker started properly.

curl http://localhost:8081/models/resnet-152-batch_v2

[
  {
    "modelName": "resnet-152-batch_v2",
    "modelVersion": "2.0",
    "modelUrl": "resnet-152-batch_v2.mar",
    "runtime": "python",
    "minWorkers": 1,
    "maxWorkers": 1,
    "batchSize": 3,
    "maxBatchDelay": 100,
    "loadedAtStartup": true,
    "workers": [
      {
        "id": "9000",
        "startTime": "2021-06-14T22:44:36.742Z",
        "status": "READY",
        "memoryUsage": 0,
        "pid": 19116,
        "gpu": true,
        "gpuUsage": "gpuId::0 utilization.gpu [%]::0 % utilization.memory [%]::0 % memory.used [MiB]::678 MiB"
      }
    ]
  }
]

Now let's test this service.

Get an image to test the service with:

$ curl -LJO https://github.com/pytorch/serve/raw/master/examples/image_classifier/kitten.jpg

Run inference to test the model.

  $ curl http://localhost:8080/predictions/resnet-152-batch_v2 -T kitten.jpg
  {
      "tiger_cat": 0.5798614621162415,
      "tabby": 0.38344162702560425,
      "Egyptian_cat": 0.0342114195227623,
      "lynx": 0.0005819813231937587,
      "quilt": 0.000273319921689108
  }
