TBE CPU Autovectorization

FP8/16/32 Autovectorization Implementation Methods

template<typename InType, typename IndexType, typename OffsetType, typename OutType> static bool ALWAYS_INLINE EmbeddingSpMDM_autovec (const int64_t block_size, const int64_t output_size, const int64_t index_size, const int64_t data_size, const InType *input, const IndexType *indices, const OffsetType *offsets_or_lengths, const float *weights, bool normalize_by_lengths, OutType *out, const bool is_weight_positional, const bool use_offsets, const int64_t output_stride, const int64_t input_stride, const bool no_bag, const bool is_bf16_out, const bool is_bf16_in)

Autovectorized version of the method EmbeddingSpMDM_ref for the FP32 weight type.

Template Parameters:

InType – input data type (uint8_t is used)
IndexType – index data type (int64_t is used)
OffsetType – offset data type (int32_t is used)
OutType – output data type (float is used)

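Parameters:

block_size – Number of elements in a block (int64_t)
output_size – Number of elements in the output (int64_t)
index_size – Number of elements in the indices (int64_t)
data_size – Number of elements in the data (int64_t)
input – Address of the input (InType*)
indices – Address of the indices (IndexType*)
offsets_or_lengths – Address of the offsets or of the lengths (OffsetType*)
weights – Weights of the sum; optional, may be null for a non-weighted sum (float*)
normalize_by_lengths – Whether to normalize by lengths (bool)
out – Address of the output (OutType*)
is_weight_positional – If true, the weights are positional; set to false for the FP32 autovectorized implementation (bool)
use_offsets – If true, offsets are used instead of lengths; set to true for the FP32 autovectorized implementation (bool)
output_stride – If -1, output_stride equals block_size; set to -1 for the FP32 autovectorized implementation (int64_t)
input_stride – If -1, input_stride equals block_size; set to -1 for the FP32 autovectorized implementation (int64_t)
scale_bias_last – If true, scale and bias appear at the end of each row; set to true for the FP32 autovectorized implementation (bool). This parameter does not appear in the signature shown above.
no_bag – If true, there is no embedding bag; set to false for the FP32 autovectorized implementation (bool)
is_bf16_out – If true, the output is of type BFLOAT16; set to false for the FP32 autovectorized implementation (bool)
is_bf16_in – If true, the input is of type BFLOAT16; set to false for the FP32 autovectorized implementation (bool)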
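
To make the roles of the parameters above concrete, the following is a minimal scalar sketch of a pooled embedding lookup. It is not the library's implementation: the function name pooled_lookup_sketch is hypothetical, the input rows are assumed to be plain float, offsets is assumed to hold output_size + 1 entries, the weights are assumed to be non-positional (indexed by position in indices), and the bool result is assumed to signal an out-of-range index or offset.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the library): scalar pooled lookup.
// Assumes float rows, offsets with output_size + 1 entries, and
// non-positional weights (one weight per entry of `indices`).
bool pooled_lookup_sketch(
    int64_t block_size, int64_t output_size, int64_t index_size,
    int64_t data_size, const float* input, const int64_t* indices,
    const int32_t* offsets, const float* weights, bool normalize_by_lengths,
    float* out, int64_t output_stride = -1, int64_t input_stride = -1) {
  assert(input != nullptr && indices != nullptr);
  assert(offsets != nullptr && out != nullptr);
  // A stride of -1 means "rows are densely packed".
  if (output_stride == -1) output_stride = block_size;
  if (input_stride == -1) input_stride = block_size;

  for (int64_t m = 0; m < output_size; ++m) {
    float* row_out = out + m * output_stride;
    for (int64_t j = 0; j < block_size; ++j) row_out[j] = 0.0f;

    // Bag m covers indices[begin, end).
    const int64_t begin = offsets[m];
    const int64_t end = offsets[m + 1];
    if (begin < 0 || end < begin || end > index_size) return false;

    for (int64_t i = begin; i < end; ++i) {
      const int64_t idx = indices[i];
      if (idx < 0 || idx >= data_size) return false;
      const float w = (weights != nullptr) ? weights[i] : 1.0f;
      const float* row_in = input + idx * input_stride;
      // Simple inner loop over the block; compilers can typically
      // vectorize a loop of this shape.
      for (int64_t j = 0; j < block_size; ++j) row_out[j] += w * row_in[j];
    }

    const int64_t len = end - begin;
    if (normalize_by_lengths && len > 0) {
      const float scale = 1.0f / static_cast<float>(len);
      for (int64_t j = 0; j < block_size; ++j) row_out[j] *= scale;
    }
  }
  return true;
}
```

For a 3-row table of width 2, indices {0, 2, 1}, and offsets {0, 2, 3}, bag 0 accumulates rows 0 and 2 and bag 1 copies row 1. With normalize_by_lengths set, bag 0 is additionally divided by its length of 2.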

template<typename IndexType, typename OffsetType, typename OutType> static bool ALWAYS_INLINE EmbeddingSpMDMFP8_autovec (const int64_t block_size, const int64_t output_size, const int64_t index_size, const int64_t data_size, const uint8_t *input, const IndexType *indices, const OffsetType *offsets_or_lengths, const float *weights, bool normalize_by_lengths, OutType *out, const bool is_weight_positional, const bool use_offsets, const int64_t output_stride, const int64_t input_stride, const int exponent_bits, const int exponent_bias, const bool is_bf16_out)

Autovectorized version of the method EmbeddingSpMDM_ref for the FP8 weight type.

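Template Parameters:

InType – input data type (uint8_t is used); note that the signature above declares input as const uint8_t* rather than through a template parameter
IndexType – index data type (int64_t is used)
OffsetType – offset data type (int32_t is used)
OutType – output data type (float is used)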

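Parameters:

block_size – Number of elements in a block (int64_t)
output_size – Number of elements in the output (int64_t)
index_size – Number of elements in the indices (int64_t)
data_size – Number of elements in the data (int64_t)
input – Address of the input (InType*)
indices – Address of the indices (IndexType*)
offsets_or_lengths – Address of the offsets or of the lengths (OffsetType*)
weights – Weights of the sum; optional, may be null for a non-weighted sum (float*)
normalize_by_lengths – Whether to normalize by lengths (bool)
out – Address of the output (OutType*)
is_weight_positional – If true, the weights are positional; set to false for the FP8 autovectorized implementation (bool)
use_offsets – If true, offsets are used instead of lengths; set to true for the FP8 autovectorized implementation (bool)
output_stride – If -1, output_stride equals block_size; set to -1 for the FP8 autovectorized implementation (int64_t)
input_stride – If -1, input_stride equals block_size (int64_t)
exponent_bits – Number of bits used for the exponent (int)
exponent_bias – Bias applied to the exponent (int)
is_bf16_out – If true, the output is of type BFLOAT16; set to false for the FP8 autovectorized implementation (bool)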
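
The exponent_bits and exponent_bias parameters describe how each FP8 byte is interpreted. The sketch below shows one plausible decoding under stated assumptions: 1 sign bit, exponent_bits exponent bits, and the remaining 7 - exponent_bits bits as mantissa, with subnormals when the exponent field is zero and no special encodings for infinity or NaN. The function name fp8_to_float_sketch is hypothetical, and the library's actual conversion routine may differ in its details.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not part of the library): decode one FP8 byte.
// Assumed layout: [sign:1][exponent:exponent_bits][mantissa:rest].
// Assumes no inf/NaN encodings; every bit pattern maps to a finite value.
float fp8_to_float_sketch(uint8_t byte, int exponent_bits, int exponent_bias) {
  assert(exponent_bits >= 1 && exponent_bits <= 7);
  const int mantissa_bits = 7 - exponent_bits;

  const int sign = (byte >> 7) & 0x1;
  const int exponent = (byte >> mantissa_bits) & ((1 << exponent_bits) - 1);
  const int mantissa = byte & ((1 << mantissa_bits) - 1);

  // Fractional part of the significand, in [0, 1).
  const float frac =
      static_cast<float>(mantissa) / static_cast<float>(1 << mantissa_bits);

  float magnitude;
  if (exponent == 0) {
    // Subnormal: no implicit leading 1, fixed exponent of 1 - bias.
    magnitude = std::ldexp(frac, 1 - exponent_bias);
  } else {
    // Normal: implicit leading 1, biased exponent.
    magnitude = std::ldexp(1.0f + frac, exponent - exponent_bias);
  }
  return sign ? -magnitude : magnitude;
}
```

With exponent_bits = 4 and exponent_bias = 7, the byte 0x38 (sign 0, exponent field 7, mantissa 0) decodes to 1.0, and 0xC0 (sign 1, exponent field 8, mantissa 0) decodes to -2.0.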
