AOTInductor: Ahead-Of-Time Compilation for Torch.Export-ed Models

Warning

AOTInductor and its related features are in prototype status and are subject to backwards-compatibility-breaking changes.

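AOTInductor is a specialized version of TorchInductor. It takes an exported PyTorch model, optimizes it, and produces a shared library along with other related artifacts. These compiled artifacts are built for deployment in non-Python environments, which are common in server-side inference.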

In this tutorial, you will learn how to take a PyTorch model, export it, compile it into a shared library, and run model predictions from C++.

Model Compilation

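With AOTInductor, you still author the model in Python. The following example demonstrates how to call aot_compile to turn the model into a shared library.

This API uses torch.export to capture the model into a computational graph, and then uses TorchInductor to generate a .so that can run in a non-Python environment. For full details on the torch._export.aot_compile API, refer to its source code. For more on torch.export, refer to the torch.export documentation.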

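Note

If your machine has a CUDA-enabled device and you installed PyTorch with CUDA support, the following code compiles the model into a shared library for CUDA execution. Otherwise, the compiled artifact runs on CPU. For better performance during CPU inference, it is recommended to enable freezing by setting export TORCHINDUCTOR_FREEZING=1 before running the Python script below.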
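As an alternative to exporting the variable in the shell, the same setting can be applied from a small launcher. The following is a minimal sketch, not part of the tutorial: the helper names freezing_env and run_with_freezing are hypothetical, and the script name model.py is assumed to be the file that holds the compile script shown after this note.

```python
import os
import subprocess
import sys


def freezing_env(base_env=None):
    """Return a copy of the environment with TORCHINDUCTOR_FREEZING=1 set."""
    env = dict(os.environ if base_env is None else base_env)
    env["TORCHINDUCTOR_FREEZING"] = "1"
    return env


def run_with_freezing(script="model.py"):
    """Run the compile script in a child process with freezing enabled.

    `script` is an assumed file name for the compile script.
    """
    return subprocess.run([sys.executable, script], env=freezing_env(), check=True)
```

Calling run_with_freezing() from the directory that contains the script has the same effect as exporting the variable and then running the script by hand. The variable is set only for the child process, so the parent environment is left unchanged.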

import os
import torch

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(10, 16)
        self.relu = torch.nn.ReLU()
        self.fc2 = torch.nn.Linear(16, 1)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.sigmoid(x)
        return x

with torch.no_grad():
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = Model().to(device=device)
    example_inputs=(torch.randn(8, 10, device=device),)
    batch_dim = torch.export.Dim("batch", min=1, max=1024)
    so_path = torch._export.aot_compile(
        model,
        example_inputs,
        # Specify the first dimension of the input x as dynamic
        dynamic_shapes={"x": {0: batch_dim}},
        # Specify the generated shared library path
        options={"aot_inductor.output_path": os.path.join(os.getcwd(), "model.so")},
    )

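In this illustrative example, the Dim object marks the first dimension of the input variable "x" as dynamic, with a batch size between 1 and 1024. The output path is set explicitly through the aot_inductor.output_path option, so the shared library is written to model.so in the current working directory, and aot_compile returns that path as so_path. If the option is omitted, the shared library is stored in a temporary directory instead, and the returned path has to be saved somewhere (for example, in a file) so that the C++ code can retrieve it later.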
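To make the meaning of the dynamic dimension concrete, the sketch below checks whether an input shape fits the layout declared at compile time: a batch dimension within the declared range and a fixed feature dimension of 10. The function check_input_shape is a hypothetical caller-side helper, not an AOTInductor API, and it makes no claim about how the compiled library itself reacts to out-of-range shapes.

```python
def check_input_shape(shape, min_batch=1, max_batch=1024, features=10):
    """Return True if `shape` matches the (batch, features) layout.

    The defaults mirror the compile script: Dim("batch", min=1, max=1024)
    for dimension 0 and a static size of 10 for dimension 1.
    """
    if len(shape) != 2:
        return False
    batch, feat = shape
    return min_batch <= batch <= max_batch and feat == features


# Both batch sizes used later in the C++ example fit the declared range.
print(check_input_shape((8, 10)))
print(check_input_shape((2, 10)))
```

Only the first dimension may vary between calls; the second dimension was captured as a constant, so a shape such as (8, 12) does not match the compiled model.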

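Inference in C++

Next, we use the following C++ file, inference.cpp, to load the shared library generated in the previous step, which lets us run model predictions directly in a C++ environment.

Note

The following code snippet assumes that your system has a CUDA-enabled device and that your model was compiled to run on CUDA as shown earlier. Without a GPU, make these adjustments to run on CPU:

1. Change model_container_runner_cuda.h to model_container_runner_cpu.h
2. Change AOTIModelContainerRunnerCuda to AOTIModelContainerRunnerCpu
3. Change at::kCUDA to at::kCPU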

#include <iostream>
#include <vector>

#include <torch/torch.h>
#include <torch/csrc/inductor/aoti_runner/model_container_runner_cuda.h>

int main() {
    c10::InferenceMode mode;

    torch::inductor::AOTIModelContainerRunnerCuda runner("model.so");
    std::vector<torch::Tensor> inputs = {torch::randn({8, 10}, at::kCUDA)};
    std::vector<torch::Tensor> outputs = runner.run(inputs);
    std::cout << "Result from the first inference:"<< std::endl;
    std::cout << outputs[0] << std::endl;

    // The second inference uses a different batch size and it works because we
    // specified that dimension as dynamic when compiling model.so.
    std::cout << "Result from the second inference:"<< std::endl;
    std::vector<torch::Tensor> inputs2 = {torch::randn({2, 10}, at::kCUDA)};
    std::cout << runner.run(inputs2)[0] << std::endl;

    return 0;
}
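The three CPU adjustments listed in the note above are plain text substitutions, so they can be applied mechanically. The sketch below does this on the source text of inference.cpp; the function name to_cpu_source is hypothetical, and the replacement pairs are taken directly from the note.

```python
# Replacement pairs from the note: header, runner class, device constant.
CUDA_TO_CPU = [
    ("model_container_runner_cuda.h", "model_container_runner_cpu.h"),
    ("AOTIModelContainerRunnerCuda", "AOTIModelContainerRunnerCpu"),
    ("at::kCUDA", "at::kCPU"),
]


def to_cpu_source(cpp_source):
    """Return the C++ source with every CUDA-specific name swapped for its CPU counterpart."""
    for old, new in CUDA_TO_CPU:
        cpp_source = cpp_source.replace(old, new)
    return cpp_source


line = 'torch::inductor::AOTIModelContainerRunnerCuda runner("model.so");'
print(to_cpu_source(line))
```

Reading inference.cpp, passing its contents through to_cpu_source, and writing the result back yields the CPU variant of the program. The model itself must also have been compiled for CPU.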

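To build the C++ file, you can use the provided CMakeLists.txt file. It automates calling python model.py (the Python script above, saved as model.py) for AOT compilation of the model, and compiles inference.cpp into an executable binary named aoti_example.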

cmake_minimum_required(VERSION 3.18 FATAL_ERROR)
project(aoti_example)

find_package(Torch REQUIRED)

add_executable(aoti_example inference.cpp model.so)

add_custom_command(
    OUTPUT model.so
    COMMAND python ${CMAKE_CURRENT_SOURCE_DIR}/model.py
    DEPENDS model.py
)

target_link_libraries(aoti_example "${TORCH_LIBRARIES}")
set_property(TARGET aoti_example PROPERTY CXX_STANDARD 17)

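If your directory structure resembles the following, you can run the subsequent commands to build the binary. The CMAKE_PREFIX_PATH variable is what allows CMake to find the LibTorch library, and it should be set to an absolute path. Your path may differ from the one shown in this example.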

aoti_example/
    CMakeLists.txt
    inference.cpp
    model.py

$ mkdir build
$ cd build
$ CMAKE_PREFIX_PATH=/path/to/python/install/site-packages/torch/share/cmake cmake ..
$ cmake --build . --config Release
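The CMAKE_PREFIX_PATH value used above can be derived from the location of the installed torch package rather than typed by hand. The following is a minimal sketch under that assumption: the helper names torch_cmake_prefix and find_torch_cmake_prefix are hypothetical, and the share/cmake layout is the one shown in the command above.

```python
import importlib.util
from pathlib import Path


def torch_cmake_prefix(torch_dir):
    """Return the absolute share/cmake path under a torch install directory."""
    return str(Path(torch_dir).resolve() / "share" / "cmake")


def find_torch_cmake_prefix():
    """Locate the installed torch package without importing it.

    Returns None if torch is not installed in this Python environment.
    """
    spec = importlib.util.find_spec("torch")
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at torch/__init__.py; its parent is the package directory.
    return torch_cmake_prefix(Path(spec.origin).parent)


print(find_torch_cmake_prefix())
```

The printed path can be passed as CMAKE_PREFIX_PATH when invoking cmake. PyTorch also exposes this location as torch.utils.cmake_prefix_path, which is the simpler choice when importing torch is acceptable.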

After the aoti_example binary has been generated in the build directory, running it displays results similar to the following:

$ ./aoti_example
Result from the first inference:
0.4866
0.5184
0.4462
0.4611
0.4744
0.4811
0.4938
0.4193
[ CUDAFloatType{8,1} ]
Result from the second inference:
0.4883
0.4703
[ CUDAFloatType{2,1} ]
