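⚠️ Notice: Limited Maintenance

This project is no longer actively maintained. Existing releases remain available, but there are no planned updates, bug fixes, new features, or security patches. Users should be aware that vulnerabilities may not be addressed.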

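Custom Service

Contents of this Document

Custom handlers
Creating a model archive with an entry point
Handling model execution on multiple GPUs
Installing model specific Python dependencies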

Custom handlers

Customize the behavior of TorchServe by writing a Python script that you package with the model when you use the model archiver. TorchServe executes this code when it runs.

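Provide a custom script to:

Initialize the model instance
Pre-process input data before it is sent to the model for inference or Captum explanations
Customize how the model is invoked for inference or explanations
Post-process output from the model before sending back a response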

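The following applies to all types of custom handlers:

data - the input data from the incoming request
context - the TorchServe context. You can use the following information for customization: model_name, model_dir, manifest, batch_size, gpu, etc.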
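As a minimal sketch of how a handler might read these values, the snippet below pulls a few fields out of a context-like object. The `summarize_context` helper and the `fake_context` stand-in are hypothetical names made up for illustration. That `batch_size` is available through `system_properties` is an assumption here; the `model_dir`, `gpu_id`, and `serializedFile` lookups mirror the handler examples in this document.

```python
from types import SimpleNamespace


def summarize_context(context):
    """Collect the fields a handler typically needs from the context object."""
    properties = context.system_properties
    return {
        "model_dir": properties.get("model_dir"),
        "gpu_id": properties.get("gpu_id"),
        # Assumption: batch_size is exposed through system_properties.
        "batch_size": properties.get("batch_size"),
        "serialized_file": context.manifest["model"]["serializedFile"],
    }


# Hypothetical stand-in for the object TorchServe passes to a handler.
fake_context = SimpleNamespace(
    system_properties={"model_dir": "/tmp/models/demo", "gpu_id": None, "batch_size": 1},
    manifest={"model": {"serializedFile": "model.pt"}},
)

summary = summarize_context(fake_context)
print(summary)
```

Using `.get()` on the properties keeps the handler from failing outright when a key is absent, for example when no GPU is assigned.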

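Start with BaseHandler!

BaseHandler implements most of the functionality you need. You can derive a new class from it, as shown in the examples and default handlers. Most of the time, you only need to override preprocess or postprocess.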

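Custom handler with `module` level entry point

The custom handler file must define a module level function that acts as the entry point for execution. The function can have any name, but it must accept the following parameters and return prediction results.

The signature of an entry point function is: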

import os

import torch

# Create model object
model = None

def entry_point_function_name(data, context):
    """
    Works on data and context to create model object or process inference request.
    Following sample demonstrates how model object can be initialized for jit mode.
    Similarly you can do it for eager mode models.
    :param data: Input data for prediction
    :param context: context contains model server system properties
    :return: prediction output
    """
    global model

    if not data:
        manifest = context.manifest

        properties = context.system_properties
        model_dir = properties.get("model_dir")
        device = torch.device("cuda:" + str(properties.get("gpu_id")) if torch.cuda.is_available() else "cpu")

        # Read the serialized model (.pt) file
        serialized_file = manifest['model']['serializedFile']
        model_pt_path = os.path.join(model_dir, serialized_file)
        if not os.path.isfile(model_pt_path):
            raise RuntimeError("Missing the model.pt file")

        # Load the TorchScript model onto the selected device
        model = torch.jit.load(model_pt_path, map_location=device)
        model.eval()
    else:
        # Infer and return the result
        return model(data)

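This entry point is invoked in the following two cases:

1. TorchServe is asked to scale a model out to increase the number of backend workers. This happens through a PUT /models/{model_name} request, a POST /models request with the initial-workers option, or at TorchServe startup when you supply the models to load with the --models option (torchserve --start --models {model_name=model.mar}).
2. TorchServe receives a POST /predictions/{model_name} request.

(1) is used to scale the number of workers for a model up or down. (2) is the standard way to run inference against a model. (1) is also known as model load time. Typically, you want model initialization code to run at model load time. You can find more information about these and other TorchServe APIs in the TorchServe Management API and TorchServe Inference API.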

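Custom handler with `class` level entry point

You can create a custom handler by writing a class with any name, but the class must have an initialize method and a handle method.

Note - if you plan to have multiple classes in the same Python module/file, make sure that the handler class is the first in the list.

The signature of an entry point class and its methods is: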

import os

import torch


class ModelHandler(object):
    """
    A custom model handler implementation.
    """

    def __init__(self):
        self._context = None
        self.initialized = False
        self.model = None
        self.device = None

    def initialize(self, context):
        """
        Invoked by TorchServe to load a model
        :param context: context contains model server system properties
        :return:
        """

        #  load the model
        self.manifest = context.manifest

        properties = context.system_properties
        model_dir = properties.get("model_dir")
        self.device = torch.device("cuda:" + str(properties.get("gpu_id")) if torch.cuda.is_available() else "cpu")

        # Read the serialized model (.pt) file
        serialized_file = self.manifest['model']['serializedFile']
        model_pt_path = os.path.join(model_dir, serialized_file)
        if not os.path.isfile(model_pt_path):
            raise RuntimeError("Missing the model.pt file")

        # Load the TorchScript model onto the selected device
        self.model = torch.jit.load(model_pt_path, map_location=self.device)
        self.model.eval()

        self.initialized = True


    def handle(self, data, context):
        """
        Invoked by TorchServe for a prediction request.
        Do pre-processing of data, prediction using model and postprocessing of prediction output
        :param data: Input data for prediction
        :param context: Initial context contains model server system properties.
        :return: prediction output
        """
        pred_out = self.model.forward(data)
        return pred_out

Advanced custom handlers

Returning custom error codes

To return a custom error code to the user from a custom handler with a module level entry point:

from ts.utils.util import PredictionException
def handle(data, context):
    # Some unexpected error - returning error code 513
    raise PredictionException("Some Prediction Error", 513)

To return a custom error code to the user from a custom handler with a class level entry point:

from ts.torch_handler.base_handler import BaseHandler
from ts.utils.util import PredictionException

class ModelHandler(BaseHandler):
    """
    A custom model handler implementation.
    """

    def handle(self, data, context):
        # Some unexpected error - returning error code 513
        raise PredictionException("Some Prediction Error", 513)
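In practice the exception is usually raised only when a check fails. The following is a minimal sketch of that idea: `require_payload` is a hypothetical helper, and the fallback class exists only so the snippet runs where TorchServe is not installed (it assumes the real class takes a message followed by an error code, as in the calls above). The code 513 simply mirrors the examples above.

```python
try:
    from ts.utils.util import PredictionException
except ImportError:
    # Stand-in so the sketch runs without TorchServe installed.
    class PredictionException(Exception):
        def __init__(self, message, error_code=500):
            super().__init__(message)
            self.message = message
            self.error_code = error_code


def require_payload(row, error_code=513):
    """Return the request payload, or raise with a custom error code."""
    payload = row.get("data")
    if payload is None:
        payload = row.get("body")
    if payload is None:
        raise PredictionException("Request has no 'data' or 'body' field", error_code)
    return payload


try:
    require_payload({})
except PredictionException as error:
    print("rejected:", error)
```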

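Writing a custom handler from scratch for prediction and explanation requests

You should generally derive from BaseHandler and override ONLY the methods whose behavior needs to change! As the examples show, most of the time you only need to override preprocess or postprocess.

Nevertheless, you can write a class from scratch. Below is an example. Basically, it follows a typical Init-Pre-Infer-Post pattern to create maintainable custom handlers.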

# custom handler file

# model_handler.py

"""
ModelHandler defines a custom model handler.
"""

from ts.torch_handler.base_handler import BaseHandler

class ModelHandler(BaseHandler):
    """
    A custom model handler implementation.
    """

    def __init__(self):
        # Let BaseHandler set up its own attributes (for example self.model) first
        super().__init__()
        self._context = None
        self.initialized = False
        self.explain = False
        self.target = 0

    def initialize(self, context):
        """
        Initialize model. This will be called during model loading time
        :param context: Initial context contains model server system properties.
        :return:
        """
        self._context = context
        self.initialized = True
        # Load the model into self.model here; see the class level entry point example above for details

    def preprocess(self, data):
        """
        Transform raw input into model input data.
        :param data: list of raw requests, should match batch size
        :return: list of preprocessed model input data
        """
        # Take the input data and make it inference ready
        preprocessed_data = data[0].get("data")
        if preprocessed_data is None:
            preprocessed_data = data[0].get("body")

        return preprocessed_data


    def inference(self, model_input):
        """
        Internal inference methods
        :param model_input: transformed model input data
        :return: inference output produced by the model
        """
        # Do some inference call to engine here and return output
        model_output = self.model.forward(model_input)
        return model_output

    def postprocess(self, inference_output):
        """
        Return inference result.
        :param inference_output: list of inference output
        :return: list of predict results
        """
        # Take output from network and post-process to desired format
        postprocess_output = inference_output
        return postprocess_output

    def handle(self, data, context):
        """
        Invoked by TorchServe for a prediction request.
        Do pre-processing of data, prediction using model and postprocessing of prediction output
        :param data: Input data for prediction
        :param context: Initial context contains model server system properties.
        :return: prediction output
        """
        model_input = self.preprocess(data)
        model_output = self.inference(model_input)
        return self.postprocess(model_output)

For more details, refer to waveglow_handler.
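The preprocess method above reads only the first request in the batch. As a hedged sketch of how the same "data, then body" lookup could cover every request, the hypothetical helper below walks the whole list; the byte strings are made-up sample payloads.

```python
def extract_payloads(batch):
    """Return the payload of every request row, preferring "data" over "body"."""
    payloads = []
    for row in batch:
        payload = row.get("data")
        if payload is None:
            payload = row.get("body")
        payloads.append(payload)
    return payloads


batch = [{"data": b"first"}, {"body": b"second"}]
payloads = extract_payloads(batch)
print(payloads)
```

Returning one entry per request keeps the output aligned with the batch, which matches the "list of predict results" that postprocess is documented to return.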

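Captum explanations for custom handler

TorchServe returns Captum explanations for image classification, text classification, and BERT models. To get them, send the following request: POST /explanations/{model_name}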
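A client-side sketch of that request, using only the standard library, could look like the following. The host `http://localhost:8080`, the model name `my_model`, and the sample body are assumptions for illustration; only the `/explanations/{model_name}` path and the POST method come from the text above. The request is built but not sent.

```python
import urllib.request


def build_explanation_request(model_name, body, host="http://localhost:8080"):
    """Build (but do not send) a POST request to the explanations endpoint."""
    url = f"{host}/explanations/{model_name}"
    return urllib.request.Request(url, data=body, method="POST")


request = build_explanation_request("my_model", b"sample input")
print(request.get_method(), request.full_url)
```

To actually send it, pass the object to `urllib.request.urlopen(request)` against a running server.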

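Explanations are written as part of the explain_handle method of the base handler. The base handler invokes this explain_handle method. The arguments passed to explain_handle are the preprocessed data and the raw data. It invokes the get_insights function of the custom handler, which returns the Captum attributions. Users should write their own get_insights functionality to get the explanations.

For serving a custom handler, the Captum algorithm should be initialized in the initialize function of the handler.

Users can override the explain_handle function in the custom handler. Users should define a get_insights method for their custom handler to get Captum attributions.

The ModelHandler class above should have the following methods with Captum functionality.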

    def initialize(self, context):
        """
        Load the model and its artifacts
        """
        .....
        self.lig = LayerIntegratedGradients(
                captum_sequence_forward, self.model.bert.embeddings
            )

    def handle(self, data, context):
        """
        Invoked by TorchServe for a prediction/explanation request.
        Do pre-processing of data, prediction using model and postprocessing of prediction/explanations output
        :param data: Input data for prediction/explanation
        :param context: Initial context contains model server system properties.
        :return: prediction/explanations output
        """
        model_input = self.preprocess(data)
        if not self._is_explain():
            model_output = self.inference(model_input)
            model_output = self.postprocess(model_output)
        else:
            model_output = self.explain_handle(model_input, data)
        return model_output

    # Present in the base_handler, so override only when necessary
    def explain_handle(self, data_preprocess, raw_data):
        """Captum explanations handler

        Args:
            data_preprocess (Torch Tensor): Preprocessed data to be used for captum
            raw_data (list): The unprocessed data to get target from the request

        Returns:
            dict : A dictionary response with the explanations response.
        """
        output_explain = None
        inputs = None
        target = 0

        logger.info("Calculating Explanations")
        row = raw_data[0]
        if isinstance(row, dict):
            logger.info("Getting data and target")
            inputs = row.get("data") or row.get("body")
            target = row.get("target")
            if not target:
                target = 0

        output_explain = self.get_insights(data_preprocess, inputs, target)
        return output_explain

    def get_insights(self, data_preprocess, inputs, target=0):
        """
        Functionality to get the explanations.
        Called from the explain_handle method with the preprocessed data,
        the raw inputs, and the target.
        """
        pass

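Extend default handlers

TorchServe ships with default handlers, including image_classifier, image_segmenter, object_detector, and text_classifier.

If required, these handlers can be extended to create a custom handler. You can also extend the abstract base_handler.

To import a default handler in a Python script, use the following import statement.

from ts.torch_handler.<default_handler_name> import <DefaultHandlerClass>

The following is an example of a custom handler that extends the default image_classifier handler.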

from ts.torch_handler.image_classifier import ImageClassifier

class CustomImageClassifier(ImageClassifier):

    def preprocess(self, data):
        """
        Overriding this method for custom preprocessing.
        :param data: raw data to be transformed
        :return: preprocessed data for model input
        """
        # custom pre-process code goes here
        return data

For more details, refer to the TorchServe examples.

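Creating a model archive with an entry point

TorchServe identifies the entry point of the custom service from the manifest file. When you create the model archive, specify the location of the entry point with the --handler option.

The model-archiver tool lets you create a model archive that TorchServe can serve.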

torch-model-archiver --model-name <model-name> --version <model_version_number> --handler model_handler[:<entry_point_function_name>] [--model-file <path_to_model_architecture_file>] --serialized-file <path_to_state_dict_file> [--extra-files <comma_separated_additional_files>] [--export-path <output-dir> --model-path <model_dir>] [--runtime python3]

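NOTE -

Options in [] are optional.
You can skip entry_point_function_name if the function is named handle in your handler module, or if the handler is a Python class.

This creates the file <model-name>.mar in the directory <output-dir> for the python3 runtime. The --runtime parameter lets you use a specific Python version at runtime. By default, it uses the system's default Python distribution.

Example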

torch-model-archiver --model-name waveglow_synthesizer --version 1.0 --model-file waveglow_model.py --serialized-file nvidia_waveglowpyt_fp32_20190306.pth --handler waveglow_handler.py --extra-files tacotron.zip,nvidia_tacotron2pyt_fp32_20190306.pth
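When archives are built from a script, the same command can be assembled programmatically. The sketch below is one way to do that, using only options that appear in the usage line above; `build_archiver_command` is a hypothetical helper, and it only constructs the argument list without running anything.

```python
import shlex


def build_archiver_command(model_name, version, handler, serialized_file,
                           model_file=None, extra_files=(), export_path=None):
    """Assemble a torch-model-archiver argument list from the given options."""
    command = [
        "torch-model-archiver",
        "--model-name", model_name,
        "--version", version,
        "--handler", handler,
        "--serialized-file", serialized_file,
    ]
    # Optional flags are appended only when a value was supplied.
    if model_file:
        command += ["--model-file", model_file]
    if extra_files:
        command += ["--extra-files", ",".join(extra_files)]
    if export_path:
        command += ["--export-path", export_path]
    return command


command = build_archiver_command(
    model_name="waveglow_synthesizer",
    version="1.0",
    handler="waveglow_handler.py",
    serialized_file="nvidia_waveglowpyt_fp32_20190306.pth",
    model_file="waveglow_model.py",
    extra_files=["tacotron.zip", "nvidia_tacotron2pyt_fp32_20190306.pth"],
)
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would execute it; a list avoids shell quoting problems with file names.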

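Handling model execution on multiple GPUs

TorchServe scales backend workers on vCPUs or GPUs. With multiple GPUs, TorchServe selects the GPU device in round-robin fashion and passes this device ID to the model handler through the context object. Use this GPU ID to create the PyTorch device object so that workers are not all created on the same GPU. The following code snippet can be used in a model handler to create the PyTorch device object.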

import torch

class ModelHandler(object):
    """
    A base Model handler implementation.
    """

    def __init__(self):
        self.device = None

    def initialize(self, context):
        properties = context.system_properties
        self.device = torch.device("cuda:" + str(properties.get("gpu_id")) if torch.cuda.is_available() else "cpu")
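To make the round-robin idea concrete without requiring PyTorch, the sketch below maps workers to device names. Both helpers are hypothetical: `assign_round_robin` only illustrates what round-robin assignment means and is not TorchServe's actual scheduling code, and the `gpu_id is None` guard assumes the ID can be absent when no GPU is assigned.

```python
def device_name(gpu_id, cuda_available):
    """Return the device string a handler would pass to torch.device()."""
    if cuda_available and gpu_id is not None:
        return f"cuda:{gpu_id}"
    return "cpu"


def assign_round_robin(num_workers, num_gpus):
    """Illustrative round-robin: worker i gets GPU i modulo the GPU count."""
    return [worker % num_gpus for worker in range(num_workers)]


gpu_ids = assign_round_robin(num_workers=4, num_gpus=2)
devices = [device_name(gpu_id, cuda_available=True) for gpu_id in gpu_ids]
print(devices)
```

The guard avoids building a string such as "cuda:None" when CUDA is available but no GPU ID was passed.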

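Installing model specific Python dependencies

A custom model or handler may depend on Python packages that are not installed by default as part of the TorchServe setup.

TorchServe lets users supply a list of custom Python packages to install so that the model can be served seamlessly.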
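Independently of how the packages get installed, a handler can check at load time that its dependencies are importable and fail with a clear message if not. This is a minimal sketch of that check using the standard library; `missing_packages` is a hypothetical helper, the second module name is deliberately nonexistent, and the check is meant for top-level module names only.

```python
import importlib.util


def missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


missing = missing_packages(["json", "a_package_that_does_not_exist_12345"])
if missing:
    print("Missing dependencies:", ", ".join(missing))
```

Calling such a check at the start of initialize surfaces a missing package when the model loads, not on the first prediction request.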
