Quantization Utilities

Reference Implementation Methods

template<typename T, layout_t LAYOUT = layout_t::KCX> void QuantizeGroupwise(const float *src, int K, int C, int X, int G, const float *scales, const std::int32_t *zero_points, T *dst)

Quantize the floating-point data in src to type T.

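Template Parameters:

T – output quantized data type (int8_t, uint8_t, and int32_t are supported)
LAYOUT – layout of the input tensor in src (KCX and KXC are supported). KCX corresponds to KCRS, or to KCTRS for weight tensors with a time dimension. KXC corresponds to KRSC, or to KTRSC for weight tensors with a time dimension.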

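Parameters:

K – number of output channels of the weight tensor
C – number of channels
X – R*S or T*R*S
G – number of groups (if G == C, the function performs channelwise quantization; if 1 < G < C, it performs groupwise quantization; if G == 1, it performs per-tensor quantization)
scales – floating-point scales. The size should equal G.
zero_points – zero points (should be representable in type T). The size should equal G.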
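The relationship between C, G, scales, and zero_points can be illustrated with a minimal scalar sketch for the KCX layout. This is not the library's implementation: the names QuantizeOne and QuantizeGroupwiseKCX are hypothetical, and the rounding (round to nearest under the default rounding mode) and saturation to the range of T are assumptions about typical affine quantization.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of the documented API): affine-quantize one
// value as round(x / scale) + zero_point, saturated to the range of T.
template <typename T>
T QuantizeOne(float x, float scale, std::int32_t zero_point) {
  const double lo = static_cast<double>(std::numeric_limits<T>::min());
  const double hi = static_cast<double>(std::numeric_limits<T>::max());
  const double q = std::nearbyint(static_cast<double>(x) / scale) + zero_point;
  return static_cast<T>(std::min(hi, std::max(lo, q)));
}

// Sketch of groupwise quantization for a K x C x X tensor in KCX layout.
// Channel c belongs to group c / (C / G), so G == C is channelwise and
// G == 1 is per-tensor. Assumes C is divisible by G.
template <typename T>
void QuantizeGroupwiseKCX(const float* src, int K, int C, int X, int G,
                          const float* scales,
                          const std::int32_t* zero_points, T* dst) {
  assert(G > 0 && C % G == 0);
  const int c_per_g = C / G;
  for (int k = 0; k < K; ++k) {
    for (int c = 0; c < C; ++c) {
      const int g = c / c_per_g;  // group that owns this channel
      for (int x = 0; x < X; ++x) {
        const int idx = (k * C + c) * X + x;
        dst[idx] = QuantizeOne<T>(src[idx], scales[g], zero_points[g]);
      }
    }
  }
}
```

With K = 1, C = 2, X = 2, and G = 2, each of the two channels uses its own scale and zero point, which is the channelwise case described above.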

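template<typename T> void FusedQuantizeDequantize(const float *src, float *dst, std::int64_t len, const TensorQuantizationParams &qparams, int thread_id = 0, int num_threads = 1, float noise_ratio = 0.0f)

Fused integer quantization and dequantization kernel for accelerating quantization-aware training. Quantizes the fp32 values in src to (u)int8 using the provided qparams, then dequantizes the quantized integer values back to fp32.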
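The quantize-then-dequantize round trip can be sketched in scalar form as follows. This is an illustration under assumptions, not the library's kernel: QParamsSketch is a hypothetical stand-in for TensorQuantizationParams that carries only a scale and a zero point, and the threading arguments and noise_ratio are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the quantization parameters (assumed fields).
struct QParamsSketch {
  float scale;
  std::int32_t zero_point;
};

// Sketch: quantize each fp32 value to T (saturating), then immediately map
// the integer back to fp32. The result is the input snapped to the
// quantization grid, which is what quantization-aware training needs.
template <typename T>
void FusedQuantizeDequantizeSketch(const float* src, float* dst,
                                   std::int64_t len, const QParamsSketch& qp) {
  assert(qp.scale > 0.0f);
  const double lo = static_cast<double>(std::numeric_limits<T>::min());
  const double hi = static_cast<double>(std::numeric_limits<T>::max());
  for (std::int64_t i = 0; i < len; ++i) {
    double q = std::nearbyint(static_cast<double>(src[i]) / qp.scale) +
               qp.zero_point;
    q = std::min(hi, std::max(lo, q));  // saturate to the range of T
    dst[i] = static_cast<float>((q - qp.zero_point) * qp.scale);
  }
}
```

Values outside the representable range come back clipped: with scale 0.5 and zero point 0 for uint8_t, an input of 200.0 returns as 127.5.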

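template<typename InputType> void FloatOrHalfToFusedNBitRowwiseQuantizedSBHalf(int bit_rate, const InputType *input, size_t input_rows, int input_columns, std::uint8_t *output)

Convert floating-point (fp32 or fp16) input to rowwise-quantized output. bit_rate specifies the number of bits per value in the quantized output. Scale and Bias are in fp16. Each row's Scale and Bias are stored at the end of that row (fused).

Parameters:

bit_rate – can be 2, 4, or 8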
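The fused row layout can be made concrete with a small sketch that computes the bytes per output row and packs already-quantized codes. The helper names FusedNBitRowBytes and PackRowCodes are hypothetical, the low-bits-first packing order is an assumption, and the computation of the fp16 Scale and Bias is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes in one fused output row: the packed codes, followed by an fp16 scale
// and an fp16 bias (2 bytes each).
std::size_t FusedNBitRowBytes(int bit_rate, int input_columns) {
  assert(bit_rate == 2 || bit_rate == 4 || bit_rate == 8);
  assert(input_columns >= 0);
  const int elems_per_byte = 8 / bit_rate;
  const int packed = (input_columns + elems_per_byte - 1) / elems_per_byte;
  return static_cast<std::size_t>(packed) + 2 * sizeof(std::uint16_t);
}

// Pack codes (each assumed to be < 2^bit_rate) into bytes. Assumption: the
// first element of each byte occupies the lowest bits.
void PackRowCodes(int bit_rate, const std::uint8_t* codes, int input_columns,
                  std::uint8_t* out) {
  assert(bit_rate == 2 || bit_rate == 4 || bit_rate == 8);
  const int elems_per_byte = 8 / bit_rate;
  for (int col = 0; col < input_columns; ++col) {
    const int shift = (col % elems_per_byte) * bit_rate;
    if (shift == 0) {
      out[col / elems_per_byte] = codes[col];  // start a fresh byte
    } else {
      out[col / elems_per_byte] |=
          static_cast<std::uint8_t>(codes[col] << shift);
    }
  }
}
```

For example, five columns at 4 bits need three bytes of packed codes plus four bytes for Scale and Bias, so seven bytes per row.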

AVX-2 Implementation Methods

uint32_t Xor128(void)

Random number generator in the range [0, 9], based on this paper.
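The function name suggests the 128-bit xorshift generator. The sketch below assumes that is the scheme meant here and shows the classic four-word xorshift step with its commonly used default seeds. The struct name Xorshift128Sketch and the reduction to [0, 9] by modulo are assumptions for illustration, not the library's code.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of a 128-bit xorshift generator (four 32-bit state words).
struct Xorshift128Sketch {
  std::uint32_t x, y, z, w;

  explicit Xorshift128Sketch(std::uint32_t x0 = 123456789u,
                             std::uint32_t y0 = 362436069u,
                             std::uint32_t z0 = 521288629u,
                             std::uint32_t w0 = 88675123u)
      : x(x0), y(y0), z(z0), w(w0) {
    // An all-zero state would stay zero forever.
    assert((x | y | z | w) != 0u);
  }

  // One xorshift step with shifts (11, 8, 19); returns the new w.
  std::uint32_t Next() {
    const std::uint32_t t = x ^ (x << 11);
    x = y;
    y = z;
    z = w;
    w = w ^ (w >> 19) ^ (t ^ (t >> 8));
    return w;
  }

  // Assumed mapping to [0, 9]; modulo introduces a negligible bias.
  std::uint32_t NextDigit() { return Next() % 10u; }
};
```

Keeping the state in a struct rather than in function-local statics makes the sketch deterministic per instance and easy to test.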

void FindMinMax(const float *m, float *min, float *max, int64_t len)

Find the minimum and maximum values in a float matrix.
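A scalar equivalent of this search might look like the sketch below. The name FindMinMaxSketch is hypothetical, and returning 0 for both outputs when len is not positive is an assumption about how an empty input could be handled. An AVX2 version would process eight floats per iteration and reduce at the end.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Scalar sketch: scan len floats once and report the smallest and largest.
void FindMinMaxSketch(const float* m, float* min, float* max,
                      std::int64_t len) {
  assert(min != nullptr && max != nullptr);
  if (len <= 0) {  // assumed convention for an empty input
    *min = 0.0f;
    *max = 0.0f;
    return;
  }
  float lo = m[0];
  float hi = m[0];
  for (std::int64_t i = 1; i < len; ++i) {
    lo = std::min(lo, m[i]);
    hi = std::max(hi, m[i]);
  }
  *min = lo;
  *max = hi;
}
```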

template<bool A_SYMMETRIC, bool B_SYMMETRIC, QuantizationGranularity Q_GRAN, bool HAS_BIAS, bool FUSE_RELU, typename BIAS_TYPE = std::int32_t, bool DIRECT = false> void requantizeOutputProcessingAvx2(std::uint8_t *out, const std::int32_t *inp, const block_type_t &block, int ld_out, int ld_in, const requantizationParams_t<BIAS_TYPE> &r)

Requantize using AVX2, with the bias addition fused.
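What requantization with a fused bias computes per element can be sketched in scalar form. The sketch covers only the simplest case, which is an assumption made for brevity: both inputs symmetric (so no zero-point offset corrections), a single per-tensor multiplier, and an int32 bias. RequantizeOneSketch and its parameters are hypothetical names, not fields of requantizationParams_t.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Scalar sketch: add the bias to the int32 accumulator, scale to the output
// domain, round, add the output zero point, and saturate to uint8. With
// fuse_relu, the lower clamp is the output zero point (the quantized
// representation of 0.0) instead of 0.
std::uint8_t RequantizeOneSketch(std::int32_t acc, std::int32_t bias,
                                 float multiplier,
                                 std::int32_t out_zero_point, bool fuse_relu) {
  assert(multiplier > 0.0f);
  const std::int64_t sum = static_cast<std::int64_t>(acc) + bias;
  const double rounded =
      std::nearbyint(static_cast<double>(sum) * multiplier) + out_zero_point;
  const double lo = fuse_relu ? static_cast<double>(out_zero_point) : 0.0;
  const double clamped = std::min(255.0, std::max(lo, rounded));
  return static_cast<std::uint8_t>(clamped);
}
```

The template flags in the signature above select variants of this computation at compile time, so the vectorized loop carries no per-element branches.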

AVX-512 Implementation Methods

template<bool A_SYMMETRIC, bool B_SYMMETRIC, QuantizationGranularity Q_GRAN, bool HAS_BIAS, bool FUSE_RELU, int C_PER_G, typename BIAS_TYPE = std::int32_t> void requantizeOutputProcessingGConvAvx512(std::uint8_t *out, const std::int32_t *inp, const block_type_t &block, int ld_out, int ld_in, const requantizationParams_t<BIAS_TYPE> &r)

Requantize using AVX-512.
