Building and Running ExecuTorch with the XNNPACK Backend

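This tutorial shows you how to use the ExecuTorch XNNPACK Delegate to accelerate your ML models on CPU hardware. It walks through exporting a model and serializing it to a binary file that targets the XNNPACK Delegate backend, then running the model on a supported target platform. To get started quickly, use the script in the ExecuTorch repository, which includes instructions for exporting and generating binaries for a few sample models that demonstrate the whole flow.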

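What you will learn in this tutorial

In this tutorial, you will learn how to export a model lowered to XNNPACK and run it on a target platform.

Before you begin, we recommend first reviewing the prerequisite topics for this tutorial.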

Lowering a model to XNNPACK

import torch
import torchvision.models as models

from torch.export import export, ExportedProgram
from torchvision.models.mobilenetv2 import MobileNet_V2_Weights
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import EdgeProgramManager, ExecutorchProgramManager, to_edge_transform_and_lower
from executorch.exir.backend.backend_api import to_backend


mobilenet_v2 = models.mobilenetv2.mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT).eval()
sample_inputs = (torch.randn(1, 3, 224, 224), )

exported_program: ExportedProgram = export(mobilenet_v2, sample_inputs)
edge: EdgeProgramManager = to_edge_transform_and_lower(
    exported_program,
    partitioner=[XnnpackPartitioner()],
)

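We walk through this example with the pretrained MobileNetV2 model downloaded from the TorchVision library. Lowering starts once the model has been exported with torch.export. The call to to_edge_transform_and_lower converts the exported program to the Edge dialect and, because we pass XnnpackPartitioner, lowers it to the backend in the same step. The partitioner identifies the subgraphs that are suitable for the XNNPACK backend delegate. Each identified subgraph is then serialized with the XNNPACK Delegate flatbuffer schema and replaced in the graph with a call to the XNNPACK Delegate.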

>>> print(edge.exported_program().graph_module)
GraphModule(
  (lowered_module_0): LoweredBackendModule()
  (lowered_module_1): LoweredBackendModule()
)



def forward(self, b_features_0_1_num_batches_tracked, ..., x):
    lowered_module_0 = self.lowered_module_0
    lowered_module_1 = self.lowered_module_1
    executorch_call_delegate_1 = torch.ops.higher_order.executorch_call_delegate(lowered_module_1, x);  lowered_module_1 = x = None
    getitem_53 = executorch_call_delegate_1[0];  executorch_call_delegate_1 = None
    aten_view_copy_default = executorch_exir_dialects_edge__ops_aten_view_copy_default(getitem_53, [1, 1280]);  getitem_53 = None
    aten_clone_default = executorch_exir_dialects_edge__ops_aten_clone_default(aten_view_copy_default);  aten_view_copy_default = None
    executorch_call_delegate = torch.ops.higher_order.executorch_call_delegate(lowered_module_0, aten_clone_default);  lowered_module_0 = aten_clone_default = None
    getitem_52 = executorch_call_delegate[0];  executorch_call_delegate = None
    return (getitem_52,)

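We print the graph after lowering, above, to show the new nodes that were inserted to call the XNNPACK Delegate. The subgraph delegated to XNNPACK is the first argument at each call site. Most of the convolution-relu-add blocks and the linear blocks were delegated to XNNPACK. The graph also shows operators that could not be lowered to the XNNPACK Delegate, such as clone and view_copy.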
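The partitioning idea can be illustrated with a toy sketch. This is not how XnnpackPartitioner is implemented: the op names and the `SUPPORTED_OPS` set are hypothetical stand-ins, and the real partitioner works on a graph with much richer per-operator checks. The sketch only shows how a sequence of ops splits into delegated runs separated by ops that stay outside the delegate.

```python
from itertools import groupby

# Hypothetical set of op names a backend can run. The real XnnpackPartitioner
# applies its own per-operator constraints on an actual graph.
SUPPORTED_OPS = {"conv", "relu", "add", "linear"}


def partition(ops):
    """Group a linear op sequence into (is_delegated, [ops]) runs."""
    return [
        (is_supported, list(run))
        for is_supported, run in groupby(ops, key=lambda op: op in SUPPORTED_OPS)
    ]


# Simplified stand-in for the tail of the lowered MobileNetV2 graph.
ops = ["conv", "relu", "add", "view_copy", "clone", "linear"]
for delegated, run in partition(ops):
    target = "delegated to backend" if delegated else "not delegated"
    print(f"{target}: {run}")
```

In this toy input, view_copy and clone split the supported ops into two runs, which mirrors the two lowered modules in the printed graph.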

exec_prog = edge.to_executorch()

with open("xnnpack_mobilenetv2.pte", "wb") as file:
    exec_prog.write_to_file(file)

After lowering the model to the XNNPACK Program, we can prepare it for ExecuTorch and save the model as a .pte file. .pte is a binary format that stores the serialized ExecuTorch graph.

Lowering a quantized model to XNNPACK

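The XNNPACK delegate can also execute symmetrically quantized models. To understand the quantization flow and learn how to quantize models, see the Custom Quantization note. For this tutorial, we use the quantize() Python helper function that has been added to the executorch/executorch/examples folder for convenience.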

from torch.export import export_for_training
from executorch.exir import EdgeCompileConfig, to_edge_transform_and_lower

mobilenet_v2 = models.mobilenetv2.mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT).eval()
sample_inputs = (torch.randn(1, 3, 224, 224), )

mobilenet_v2 = export_for_training(mobilenet_v2, sample_inputs).module() # 2-stage export for quantization path

from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from executorch.backends.xnnpack.quantizer.xnnpack_quantizer import (
    get_symmetric_quantization_config,
    XNNPACKQuantizer,
)


def quantize(model, example_inputs):
    """This is the official recommended flow for quantization in pytorch 2.0 export"""
    print(f"Original model: {model}")
    quantizer = XNNPACKQuantizer()
    # if we set is_per_channel to True, we also need to add out_variant of quantize_per_channel/dequantize_per_channel
    operator_config = get_symmetric_quantization_config(is_per_channel=False)
    quantizer.set_global(operator_config)
    m = prepare_pt2e(model, quantizer)
    # calibration
    m(*example_inputs)
    m = convert_pt2e(m)
    print(f"Quantized model: {m}")
    # make sure we can export to flat buffer
    return m

quantized_mobilenetv2 = quantize(mobilenet_v2, sample_inputs)

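Quantization requires a two-stage export. First, we use the export_for_training API to capture the model before handing it to the quantize utility function. After the quantization step, we can use the XNNPACK delegate to lower the quantized exported model graph. From here, the procedure is the same as lowering a non-quantized model to XNNPACK.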

# Continued from earlier...
edge = to_edge_transform_and_lower(
    export(quantized_mobilenetv2, sample_inputs),
    compile_config=EdgeCompileConfig(_check_ir_validity=False),
    partitioner=[XnnpackPartitioner()]
)

exec_prog = edge.to_executorch()

with open("qs8_xnnpack_mobilenetv2.pte", "wb") as file:
    exec_prog.write_to_file(file)
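For intuition about what "symmetric" means here, the following is a generic sketch of symmetric int8 quantization in plain Python. It is an illustration under stated assumptions, not the arithmetic XNNPACKQuantizer necessarily uses: the helper names `symmetric_quantize` and `dequantize` are hypothetical, the scale is taken from the largest absolute value, and the zero point is fixed at 0.

```python
def symmetric_quantize(values, num_bits=8):
    """Quantize floats to signed integers with the zero point fixed at 0."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric range [-qmax, qmax].
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map integers back to approximate float values."""
    return [q * scale for q in quantized]


weights = [-1.0, -0.4, 0.0, 0.25, 1.0]
q, scale = symmetric_quantize(weights)
print(q)  # [-127, -51, 0, 32, 127]
print(dequantize(q, scale))
```

Because the range is symmetric around zero, a float value of 0.0 always maps to the integer 0, and each dequantized value differs from the original by at most half a quantization step.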

Lowering with the `aot_compiler.py` script

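We also provide a script that quickly lowers and exports a few example models. You can run it to generate lowered fp32 and quantized models. The script is provided purely for convenience and performs exactly the same steps as the previous two sections.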

python -m examples.xnnpack.aot_compiler --model_name="mv2" --quantize --delegate

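Note that in the example above,

- --model_name specifies the model to use
- --quantize controls whether the model should be quantized
- --delegate controls whether we attempt to lower parts of the graph to the XNNPACK delegate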

The generated model file is named [model_name]_xnnpack_[q8/fp32].pte according to the arguments provided.
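If you drive the script from Python, for example to export several variants in a loop, the invocation can be assembled as an argument list. The helper `aot_compiler_command` below is hypothetical and only uses the three flags described above. It prints the command rather than running it.

```python
import shlex


def aot_compiler_command(model_name, quantize=False, delegate=True):
    """Return the argument list for the aot_compiler invocation (hypothetical helper)."""
    cmd = ["python", "-m", "examples.xnnpack.aot_compiler", f"--model_name={model_name}"]
    if quantize:
        cmd.append("--quantize")
    if delegate:
        cmd.append("--delegate")
    return cmd


# Print the command instead of executing it, so the sketch has no side effects.
print(shlex.join(aot_compiler_command("mv2", quantize=True)))
```

To actually run it, the returned list could be passed to subprocess.run(cmd, check=True) from the root of the executorch repository.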

Running the XNNPACK model with CMake

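After exporting the XNNPACK-delegated model, we can try running it with example inputs using CMake. We can build and use xnn_executor_runner, a sample wrapper around the ExecuTorch runtime and the XNNPACK backend. We begin by configuring the CMake build as follows: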

# cd to the root of executorch repo
cd executorch

# Get a clean cmake-out directory
./install_executorch.sh --clean
mkdir cmake-out

# Configure cmake
cmake \
    -DCMAKE_INSTALL_PREFIX=cmake-out \
    -DCMAKE_BUILD_TYPE=Release \
    -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
    -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
    -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_ENABLE_LOGGING=ON \
    -DPYTHON_EXECUTABLE=python \
    -Bcmake-out .

You can then build the runtime components with the following command:

cmake --build cmake-out -j9 --target install --config Release

You should now find the built executable at ./cmake-out/backends/xnnpack/xnn_executor_runner. Run it with the model you generated, as follows:

./cmake-out/backends/xnnpack/xnn_executor_runner --model_path=./mv2_xnnpack_fp32.pte
# or to run the quantized variant
./cmake-out/backends/xnnpack/xnn_executor_runner --model_path=./mv2_xnnpack_q8.pte

Building and linking with the XNNPACK backend

You can build the XNNPACK backend CMake target and link it with your application binary, such as an Android or iOS application. For more information, you can check out this resource next.

Profiling

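To enable profiling in xnn_executor_runner, pass the flags -DEXECUTORCH_ENABLE_EVENT_TRACER=ON and -DEXECUTORCH_BUILD_DEVTOOLS=ON to the CMake configure command (add -DENABLE_XNNPACK_PROFILING=ON for additional detail). This enables ETDump generation during inference and adds command-line flags for profiling (see xnn_executor_runner --help for details).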
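As a rough sketch of how the profiling flags combine with the configure step from earlier in this tutorial, the hypothetical helper `cmake_configure_command` below assembles the full argument list. It assumes the flags are simply appended to the options already shown in this tutorial, and it prints the command rather than running it.

```python
# Options copied from the configure step earlier in this tutorial.
BASE_OPTIONS = [
    "-DCMAKE_INSTALL_PREFIX=cmake-out",
    "-DCMAKE_BUILD_TYPE=Release",
    "-DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON",
    "-DEXECUTORCH_BUILD_EXTENSION_MODULE=ON",
    "-DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON",
    "-DEXECUTORCH_BUILD_XNNPACK=ON",
    "-DEXECUTORCH_ENABLE_LOGGING=ON",
    "-DPYTHON_EXECUTABLE=python",
]
PROFILING_OPTIONS = [
    "-DEXECUTORCH_ENABLE_EVENT_TRACER=ON",
    "-DEXECUTORCH_BUILD_DEVTOOLS=ON",
]
DETAILED_PROFILING_OPTION = "-DENABLE_XNNPACK_PROFILING=ON"


def cmake_configure_command(profiling=False, detailed=False):
    """Return the cmake configure argument list (hypothetical helper)."""
    cmd = ["cmake", *BASE_OPTIONS]
    if profiling:
        cmd += PROFILING_OPTIONS
        if detailed:
            cmd.append(DETAILED_PROFILING_OPTION)
    cmd += ["-Bcmake-out", "."]
    return cmd


# Print one option per line, in the same style as the shell command above.
print(" \\\n    ".join(cmake_configure_command(profiling=True, detailed=True)))
```

After reconfiguring with these options, rebuild with the same cmake --build command so that the runner picks up the profiling support.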