autoquant

torchao.quantization.autoquant(model, example_input=None, qtensor_class_list=[<class 'torchao.quantization.autoquant.AQDefaultLinearWeight'>, <class 'torchao.quantization.autoquant.AQInt8WeightOnlyQuantizedLinearWeight'>, <class 'torchao.quantization.autoquant.AQInt8WeightOnlyQuantizedLinearWeight2'>, <class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'>], filter_fn=None, mode=['interpolate', 0.85], manual=False, set_inductor_config=True, supress_autoquant_errors=True, min_sqnr=None, **aq_kwargs)

Autoquantization is a process that identifies the fastest way to quantize each layer of a model, given a set of candidate qtensor subclasses.

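Autoquantization proceeds in three steps:

1-Prepare the model: search the model for Linear layers and replace their weights with AutoQuantizableLinearWeight.

2-Shape calibration: the user runs the model on one or more inputs. The shapes and dtypes of the activations seen by each AutoQuantizableLinearWeight are recorded, so that step 3 knows which shapes/dtypes to optimize the quantized operations for.

3-Finalize autoquantization: for each AutoQuantizableLinearWeight, benchmark every member of qtensor_class_list on each recorded shape/dtype and pick the fastest option, which yields a high-performance model.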
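The selection logic in steps 2 and 3 can be illustrated with a small, library-free sketch. It is not torchao's implementation: `ShapeLogger`, `pick_fastest`, the candidate names, and the timings in `FAKE_MS` are all hypothetical, and weighting each benchmark by how often a shape was seen is an assumption made for this sketch.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, List, Tuple


class ShapeLogger:
    """Hypothetical stand-in for step 2: counts each (shape, dtype) seen."""

    def __init__(self) -> None:
        self.seen: Counter = Counter()

    def log(self, shape: Tuple[int, ...], dtype: str) -> None:
        self.seen[(shape, dtype)] += 1


def pick_fastest(
    candidates: List[str],
    seen: Dict[Hashable, int],
    benchmark: Callable[[str, Hashable], float],
) -> Tuple[str, Dict[str, float]]:
    """Hypothetical stand-in for step 3.

    Sums the benchmark time of each candidate over every recorded
    (shape, dtype), weighted by how many times it was seen (an assumption
    of this sketch), and returns the candidate with the lowest total.
    """
    totals = {
        name: sum(benchmark(name, key) * count for key, count in seen.items())
        for name in candidates
    }
    best = min(totals, key=totals.get)
    return best, totals


# Made-up timings in milliseconds, keyed by (candidate, activation shape).
FAKE_MS = {
    ("default", (1, 64)): 1.0,
    ("int8_weight_only", (1, 64)): 0.5,
    ("default", (32, 64)): 2.0,
    ("int8_weight_only", (32, 64)): 3.0,
}

logger = ShapeLogger()
for _ in range(4):
    logger.log((1, 64), "float16")  # small-batch shape seen four times
logger.log((32, 64), "float16")     # large-batch shape seen once

best, totals = pick_fastest(
    ["default", "int8_weight_only"],
    logger.seen,
    lambda name, key: FAKE_MS[(name, key[0])],  # key is (shape, dtype)
)
print(best, totals)
```

With these made-up numbers the weight-only candidate wins overall even though it is slower on the large-batch shape, which is why the set of inputs used during calibration matters.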

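This autoquant function performs step 1. Steps 2 and 3 can be completed by simply running the model. If example_input is provided, this function also runs the model, which completes steps 2 and 3. The autoquant API also handles models that have already had torch.compile applied to them; in that case, once the model has been run and quantized, the torch.compile process proceeds as usual.

To optimize over a combination of input shapes/dtypes, the user can set manual=True, run the model with all desired shapes/dtypes, and then call model.finalize_autoquant to complete the quantization once the desired set of inputs has been logged.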

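Parameters:

model (torch.nn.Module) – The model to be autoquantized.
example_input (Any, optional) – An example input for the model. If provided, the function performs a forward pass on this input (which fully autoquantizes the model unless manual=True). Defaults to None.
qtensor_class_list (list, optional) – A list of tensor classes to be used for quantization. Defaults to DEFAULT_AUTOQUANT_CLASS_LIST.
filter_fn (callable, optional) – A filter function to apply to the model parameters. Defaults to None.
mode (list, optional) – A list containing the mode settings for quantization. The first element is the mode type (e.g. "interpolate") and the second element is the mode value (e.g. 0.85). Defaults to ["interpolate", 0.85].
manual (bool, optional) – Whether to stop shape calibration and run autoquantization after a single run (False, the default), or to wait for the user to call model.finalize_autoquant (True) so that inputs with several shapes/dtypes can be logged.
set_inductor_config (bool, optional) – Whether to automatically apply the recommended inductor config settings. Defaults to True.
supress_autoquant_errors (bool, optional) – Whether to suppress errors during the autoquantization process. Defaults to True.
min_sqnr (float, optional) – Minimum acceptable signal-to-quantization-noise ratio (https://en.wikipedia.org/wiki/Signal-to-quantization-noise_ratio) between the output of the quantized layer and the output of the non-quantized layer. It is used to filter out quantization methods that cause too large a numerical impact. Users can start with a reasonable value (e.g. 40) and adjust it based on the results.
**aq_kwargs – Additional keyword arguments for the autoquantization process.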
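
To give a feel for what a min_sqnr threshold means, the sketch below computes an SQNR in decibels for two small output vectors using the common definition 20·log10(‖reference‖ / ‖reference − quantized‖). The helper names `sqnr_db` and `passes_min_sqnr` are hypothetical, and it is an assumption that torchao computes the ratio in exactly this way.

```python
import math
from typing import Sequence


def sqnr_db(reference: Sequence[float], quantized: Sequence[float]) -> float:
    """Signal-to-quantization-noise ratio in dB (higher means closer outputs)."""
    signal = math.sqrt(sum(r * r for r in reference))
    noise = math.sqrt(sum((r - q) ** 2 for r, q in zip(reference, quantized)))
    if noise == 0.0:
        return math.inf   # outputs are identical
    if signal == 0.0:
        return -math.inf  # all-zero reference with a nonzero error
    return 20.0 * math.log10(signal / noise)


def passes_min_sqnr(reference, quantized, min_sqnr: float) -> bool:
    """Hypothetical filter: keep a candidate only if its SQNR reaches min_sqnr."""
    return sqnr_db(reference, quantized) >= min_sqnr


reference_out = [1.0, 0.0]
quantized_out = [0.9, 0.0]  # 10% error on the only nonzero element

print(round(sqnr_db(reference_out, quantized_out), 3))
print(passes_min_sqnr(reference_out, quantized_out, min_sqnr=40.0))
```

Under this definition a 10% relative error corresponds to roughly 20 dB, so a threshold of 40 would reject it; 40 dB corresponds to a relative error of about 1%.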

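Returns:

The autoquantized and wrapped model. If example_input is provided, the function performs a forward pass on it and returns the result of that forward pass.

Return type:

torch.nn.Module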

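Example usage:

    import torch
    import torchao

    # single input: running the model once completes autoquantization
    torchao.autoquant(torch.compile(model))
    model(*example_input)

    # multiple input shapes: log every shape, then finalize explicitly
    torchao.autoquant(model, manual=True)
    model(*example_input1)
    model(*example_input2)
    model.finalize_autoquant()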
