Using the ExecuTorch Developer Tools to Profile a Model

Author: Jack Khuu

The ExecuTorch Developer Tools are a set of tools that give users the ability to profile, debug, and visualize ExecuTorch models.

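This tutorial shows a full end-to-end flow of using the Developer Tools to profile a model. Specifically, it will:

1. Generate the artifacts consumed by the Developer Tools (ETRecord, ETDump).
2. Create an Inspector class that consumes these artifacts.
3. Use the Inspector class to analyze the model's profiling results.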

Prerequisites

To run this tutorial, you first need to set up your ExecuTorch environment.

Generate ETRecord (Optional)

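The first step is to generate an ETRecord. An ETRecord contains model graphs and metadata used to link runtime results (such as profiling) back to the eager model. It is generated via executorch.devtools.generate_etrecord.

executorch.devtools.generate_etrecord takes an output file path (str), the Edge dialect model (EdgeProgramManager), the ExecuTorch dialect model (ExecutorchProgramManager), and an optional dictionary of additional models.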

In this tutorial, an example model (shown below) is used for demonstration.

import copy

import torch
import torch.nn as nn
import torch.nn.functional as F
from executorch.devtools import generate_etrecord

from executorch.exir import (
    EdgeCompileConfig,
    EdgeProgramManager,
    ExecutorchProgramManager,
    to_edge,
)
from torch.export import export, ExportedProgram


# Generate Model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5*5 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square, you can specify with a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = torch.flatten(x, 1)  # flatten all dimensions except the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


model = Net()

aten_model: ExportedProgram = export(model, (torch.randn(1, 1, 32, 32),), strict=True)

edge_program_manager: EdgeProgramManager = to_edge(
    aten_model, compile_config=EdgeCompileConfig(_check_ir_validity=True)
)
edge_program_manager_copy = copy.deepcopy(edge_program_manager)
et_program_manager: ExecutorchProgramManager = edge_program_manager.to_executorch()


# Generate ETRecord
etrecord_path = "etrecord.bin"
generate_etrecord(etrecord_path, edge_program_manager_copy, et_program_manager)

Warning

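Users should make a deep copy of the output of to_edge() and pass that deep copy to the generate_etrecord API. This is necessary because the subsequent call to to_executorch() mutates the program in place and loses debug data in the process.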
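
To see why a deep copy (rather than the same object or a shallow copy) is needed, the toy sketch below mimics a lowering step that mutates its program in place. ToyEdgeProgram and its debug_handles field are hypothetical stand-ins invented for illustration; they are not ExecuTorch classes, and the real to_executorch() does far more than clear a dictionary.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyEdgeProgram:
    """Hypothetical stand-in for an edge program (NOT an ExecuTorch class)."""

    # Illustrative debug metadata: node name -> debug handle.
    debug_handles: Dict[str, int] = field(default_factory=dict)

    def to_executorch(self) -> "ToyEdgeProgram":
        # Simulate a lowering step that mutates the program in place and
        # drops its debug metadata along the way.
        self.debug_handles.clear()
        return self


# Deep copy taken BEFORE lowering: an independent snapshot survives.
edge = ToyEdgeProgram(debug_handles={"conv1": 1, "conv2": 2})
edge_deep = copy.deepcopy(edge)
edge.to_executorch()
print(edge.debug_handles)       # {}
print(edge_deep.debug_handles)  # {'conv1': 1, 'conv2': 2}

# Shallow copy: the dict is shared, so the in-place mutation leaks into it.
other = ToyEdgeProgram(debug_handles={"conv1": 1})
other_shallow = copy.copy(other)
other.to_executorch()
print(other_shallow.debug_handles)  # {}
```

The shallow-copy case is the trap: copying only the outer object still shares any mutable state that the later step modifies.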

Generate ETDump

The next step is to generate an ETDump. An ETDump contains the runtime results from executing a Bundled Program model.

In this tutorial, a Bundled Program is created from the example model above.

import torch
from executorch.devtools import BundledProgram

from executorch.devtools.bundled_program.config import MethodTestCase, MethodTestSuite
from executorch.devtools.bundled_program.serialize import (
    serialize_from_bundled_program_to_flatbuffer,
)

from executorch.exir import to_edge
from torch.export import export

# Step 1: ExecuTorch Program Export
m_name = "forward"
method_graphs = {m_name: export(model, (torch.randn(1, 1, 32, 32),), strict=True)}

# Step 2: Construct Method Test Suites
inputs = [[torch.randn(1, 1, 32, 32)] for _ in range(2)]

method_test_suites = [
    MethodTestSuite(
        method_name=m_name,
        test_cases=[
            MethodTestCase(inputs=inp, expected_outputs=getattr(model, m_name)(*inp))
            for inp in inputs
        ],
    )
]

# Step 3: Generate BundledProgram
executorch_program = to_edge(method_graphs).to_executorch()
bundled_program = BundledProgram(executorch_program, method_test_suites)

# Step 4: Serialize BundledProgram to flatbuffer.
serialized_bundled_program = serialize_from_bundled_program_to_flatbuffer(
    bundled_program
)
save_path = "bundled_program.bp"
with open(save_path, "wb") as f:
    f.write(serialized_bundled_program)

Use CMake to execute the Bundled Program and generate the ETDump (set up CMake first by following the ExecuTorch CMake setup instructions).

cd executorch
./examples/devtools/build_example_runner.sh
cmake-out/examples/devtools/example_runner --bundled_program_path="bundled_program.bp"

Creating an Inspector

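The final step is to create the Inspector by passing in the artifact paths. The Inspector takes the runtime results from the ETDump and correlates them with the operators of the Edge dialect graph.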

Recall: an ETRecord is not required. If no ETRecord is provided, the Inspector still shows the runtime results, but without operator correlation.
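
Conceptually, this correlation is a join: each runtime event carries an identifier that the ETRecord can map back to a graph operator. The sketch below assumes such an integer "debug handle" per event and uses a plain dictionary in place of the ETRecord. The correlate helper, the event tuples, and the operator names are hypothetical illustrations, not the Inspector's real data model. Passing no mapping reproduces the ETRecord-less case: timings are still reported, but the operator column stays empty.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical runtime event: (event name, debug handle, duration in ms).
RuntimeEvent = Tuple[str, int, float]


def correlate(
    events: List[RuntimeEvent],
    handle_to_op: Optional[Dict[int, str]] = None,
) -> List[dict]:
    """Attach an operator name to each event when a handle->op map is given."""
    rows = []
    for name, handle, duration_ms in events:
        # Without a mapping (no ETRecord), keep the timing but leave op unset.
        op = handle_to_op.get(handle) if handle_to_op is not None else None
        rows.append({"event_name": name, "duration_ms": duration_ms, "op": op})
    return rows


# Illustrative inputs only; not measured data.
events = [
    ("native_call_convolution.out", 1, 0.50),
    ("native_call_addmm.out", 2, 0.10),
]
handle_to_op = {1: "aten.convolution.default", 2: "aten.addmm.default"}

for row in correlate(events, handle_to_op):  # with an "ETRecord"
    print(row)
for row in correlate(events):  # without one: op is None
    print(row)
```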

To visualize all runtime events, call the Inspector's print_data_tabular method.

from executorch.devtools import Inspector

etrecord_path = "etrecord.bin"
etdump_path = "etdump.etdp"
inspector = Inspector(etdump_path=etdump_path, etrecord=etrecord_path)
inspector.print_data_tabular()

Analyzing with an Inspector

The Inspector provides two ways of accessing the ingested information: EventBlocks and DataFrames. Both let users run custom analyses of their model's performance.

Below are example usages of both the EventBlock and DataFrame approaches.

# Set Up
import pprint as pp

import pandas as pd

pd.set_option("display.max_colwidth", None)
pd.set_option("display.max_columns", None)

If users want the raw profiling results, they can do something similar to the following, which finds the raw runtime data of the addmm.out events.

for event_block in inspector.event_blocks:
    # Via EventBlocks
    for event in event_block.events:
        if event.name == "native_call_addmm.out":
            print(event.name, event.perf_data.raw if event.perf_data else "")

    # Via Dataframe
    df = event_block.to_dataframe()
    df = df[df.event_name == "native_call_addmm.out"]
    print(df[["event_name", "raw"]])
    print()
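
Each event's raw field holds the individual timing samples, while summary columns such as p50 (used in the next example) condense them. As a rough sketch, assuming these summaries are ordinary percentiles over raw computed with linear interpolation (the same rule as NumPy's default np.percentile; this tutorial does not show the Inspector's exact computation), they can be reproduced with the standard library. The timing values are made up.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile (same rule as NumPy's default)."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    pos = (len(ordered) - 1) * q / 100.0  # fractional index into sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


raw_ms = [0.42, 0.40, 0.55, 0.41, 0.43]  # made-up timings for illustration
summary = {f"p{q}": percentile(raw_ms, q) for q in (10, 50, 90)}
print(summary)
```

The median (p50) is robust to a single slow outlier such as the 0.55 sample here, which is why it is a sensible statistic for ranking events.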

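If users want to trace an operator back to their model code, they can do something similar to the following, which finds the module hierarchy and stack trace of the slowest convolution.out call.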

for event_block in inspector.event_blocks:
    # Via EventBlocks: keep the convolution event with the highest p50,
    # skipping events that carry no performance data
    slowest = None
    for event in event_block.events:
        if event.name == "native_call_convolution.out" and event.perf_data is not None:
            if slowest is None or event.perf_data.p50 > slowest.perf_data.p50:
                slowest = event
    if slowest is not None:
        print(slowest.name)
        print()
        pp.pprint(slowest.stack_traces)
        print()
        pp.pprint(slowest.module_hierarchy)

    # Via DataFrame: select the row whose p50 value is largest
    df = event_block.to_dataframe()
    df = df[df.event_name == "native_call_convolution.out"]
    if len(df) > 0:
        # slowest is a pandas Series (one row): `assert slowest` would raise
        # ValueError, and Series.name is the row's index label, so read the
        # event_name column explicitly
        slowest = df.loc[df["p50"].idxmax()]
        print(slowest.event_name)
        print()
        pp.pprint(slowest.stack_traces if slowest.stack_traces else "")
        print()
        pp.pprint(slowest.module_hierarchy if slowest.module_hierarchy else "")
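
Stripped of the Inspector types, the EventBlock branch above is a maximum over a filtered sequence. The sketch below uses a hypothetical ToyEvent record (a name plus an optional p50, with None standing in for an event without performance data) to express the same selection with max(..., default=None), which returns None instead of raising when nothing matches.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ToyEvent:
    """Hypothetical event record; not the Inspector's Event class."""

    name: str
    p50: Optional[float]  # None models an event without perf data


def slowest_event(events: Iterable[ToyEvent], target: str) -> Optional[ToyEvent]:
    # Consider only matching events that actually carry timing data.
    candidates = (e for e in events if e.name == target and e.p50 is not None)
    return max(candidates, key=lambda e: e.p50, default=None)


# Illustrative values only.
events = [
    ToyEvent("native_call_convolution.out", 0.30),
    ToyEvent("native_call_convolution.out", 0.80),
    ToyEvent("native_call_convolution.out", None),
    ToyEvent("native_call_addmm.out", 1.20),
]
print(slowest_event(events, "native_call_convolution.out"))
print(slowest_event(events, "native_call_relu.out"))  # no match -> None
```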

If users want the total runtime of a module, they can use find_total_for_module.

print(inspector.find_total_for_module("L__self__"))
print(inspector.find_total_for_module("L__self___conv2"))

0.0
0.0

Note: find_total_for_module is a special, first-class method of the Inspector.
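
To make the idea of a per-module total concrete, the toy sketch below sums an average time over every event whose module path contains the requested module, so a parent such as L__self__ accumulates the time of all its children. The ToyModuleEvent record, its flat module_path list, and the choice of summing an average are assumptions for illustration; the real find_total_for_module works on the Inspector's own module hierarchy data, and its exact matching and aggregation rules may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyModuleEvent:
    """Hypothetical event: the modules it ran under and its average time."""

    module_path: List[str]  # outermost module first
    avg_ms: float


def total_for_module(events: List[ToyModuleEvent], module_name: str) -> float:
    # An event counts toward a module if that module appears on its path;
    # the 0.0 start value keeps the result a float even with no matches.
    return sum((e.avg_ms for e in events if module_name in e.module_path), 0.0)


# Illustrative values only; not measured data.
events = [
    ToyModuleEvent(["L__self__", "L__self___conv1"], 0.5),
    ToyModuleEvent(["L__self__", "L__self___conv2"], 0.75),
    ToyModuleEvent(["L__self__", "L__self___fc1"], 0.25),
]
print(total_for_module(events, "L__self__"))        # 1.5
print(total_for_module(events, "L__self___conv2"))  # 0.75
print(total_for_module(events, "L__self___fc3"))    # 0.0
```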

Conclusion

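In this tutorial, we learned the steps required to consume an ExecuTorch model with the ExecuTorch Developer Tools. It also showed how to use the Inspector APIs to analyze the model's run results.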

Total running time of the script: (0 minutes 1.892 seconds)

Generated by Sphinx-Gallery