LayerNorm

class torch.nn.LayerNorm(normalized_shape, eps=1e-05, elementwise_affine=True, bias=True, device=None, dtype=None)

Applies Layer Normalization over a mini-batch of inputs.

This layer implements the operation described in the paper Layer Normalization:

$y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta$

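The mean and standard deviation are computed over the last D dimensions, where D is the number of dimensions of normalized_shape. For example, if normalized_shape is (3, 5) (a 2-dimensional shape), the mean and standard deviation are computed over the last 2 dimensions of the input (i.e. input.mean((-2, -1))). $\gamma$ and $\beta$ are learnable affine transform parameters of shape normalized_shape when elementwise_affine is True. The variance is computed with the biased estimator, equivalent to torch.var(input, unbiased=False).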
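The formula can be checked by hand. The following is a minimal sketch, assuming a small random input of shape (4, 3, 5) chosen only for illustration; it recomputes the normalization with plain tensor operations and compares the result with the module's output.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(4, 3, 5)           # illustrative input; last two dims match normalized_shape
layer_norm = nn.LayerNorm((3, 5))  # weight (gamma) starts at 1, bias (beta) at 0

# Statistics over the last D = 2 dimensions, one mean/variance per leading index.
mean = x.mean(dim=(-2, -1), keepdim=True)
var = x.var(dim=(-2, -1), unbiased=False, keepdim=True)  # biased estimator

# y = (x - E[x]) / sqrt(Var[x] + eps) * gamma + beta
manual = (x - mean) / torch.sqrt(var + layer_norm.eps)
manual = manual * layer_norm.weight + layer_norm.bias

matches = torch.allclose(layer_norm(x), manual, atol=1e-5)
print(matches)
```

The comparison uses a small tolerance because the module's kernel and the manual version may round differently in float32.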

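Note

Unlike Batch Normalization and Instance Normalization, which apply a scalar scale and bias to each entire channel/plane with the affine option, Layer Normalization applies a per-element scale and bias with elementwise_affine.

This layer uses statistics computed from the input data in both training and evaluation modes.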
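Both points in the note can be observed directly. The sketch below assumes an image-like input with illustrative sizes C, H, W = 5, 10, 10; it compares the affine parameter shapes of nn.BatchNorm2d and nn.LayerNorm, then runs the same input through LayerNorm in training and evaluation mode.

```python
import torch
from torch import nn

C, H, W = 5, 10, 10                     # illustrative sizes
batch_norm = nn.BatchNorm2d(C)          # affine=True by default
layer_norm = nn.LayerNorm([C, H, W])    # elementwise_affine=True by default

bn_weight_shape = tuple(batch_norm.weight.shape)  # one scale per channel
ln_weight_shape = tuple(layer_norm.weight.shape)  # one scale per normalized element

torch.manual_seed(0)
x = torch.randn(20, C, H, W)

layer_norm.train()
out_train = layer_norm(x)
layer_norm.eval()
out_eval = layer_norm(x)

# No running statistics are involved, so the mode does not change the result.
same_in_both_modes = torch.allclose(out_train, out_eval)
print(bn_weight_shape, ln_weight_shape, same_in_both_modes)
```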

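Parameters

normalized_shape (int or list or torch.Size) –
input shape from an expected input of size

$[* \times \text{normalized\_shape}[0] \times \text{normalized\_shape}[1] \times \ldots \times \text{normalized\_shape}[-1]]$
If a single integer is used, it is treated as a single-element list, and this module normalizes over the last dimension, which is expected to have that size.
eps (float) – a value added to the denominator for numerical stability. Default: 1e-5
elementwise_affine (bool) – a boolean value that, when set to True, gives this module learnable per-element affine parameters, with weights initialized to ones and biases initialized to zeros. Default: True.
bias (bool) – if set to False, the layer does not learn an additive bias (only relevant if elementwise_affine is True). Default: True.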

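Variables

weight – the learnable weights of the module, of shape $\text{normalized\_shape}$, when elementwise_affine is set to True. The values are initialized to 1.
bias – the learnable bias of the module, of shape $\text{normalized\_shape}$, when elementwise_affine is set to True. The values are initialized to 0.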
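The shapes and initial values can be inspected on a fresh module. This is a small sketch with an illustrative normalized_shape of [3, 5]; because the weight starts at 1 and the bias at 0, a newly constructed layer returns the plain normalized values, so each sample's mean over the normalized dimensions is close to zero.

```python
import torch
from torch import nn

layer_norm = nn.LayerNorm([3, 5])   # illustrative shape

weight_shape = tuple(layer_norm.weight.shape)
bias_shape = tuple(layer_norm.bias.shape)
weight_is_ones = torch.equal(layer_norm.weight, torch.ones(3, 5))
bias_is_zeros = torch.equal(layer_norm.bias, torch.zeros(3, 5))

# Total learnable values: 3 * 5 for weight plus 3 * 5 for bias.
num_params = sum(p.numel() for p in layer_norm.parameters())

# With the identity affine transform, outputs are just the normalized inputs.
torch.manual_seed(0)
out = layer_norm(torch.randn(4, 3, 5))
per_sample_mean = out.mean(dim=(-2, -1))

print(weight_shape, bias_shape, num_params)
```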

Shape

Input: $(N, *)$
Output: $(N, *)$ (same shape as input)

Examples

>>> import torch
>>> from torch import nn
>>> # NLP Example
>>> batch, sentence_length, embedding_dim = 20, 5, 10
>>> embedding = torch.randn(batch, sentence_length, embedding_dim)
>>> layer_norm = nn.LayerNorm(embedding_dim)
>>> # Apply the module; each embedding vector is normalized independently
>>> output = layer_norm(embedding)
>>>
>>> # Image Example
>>> N, C, H, W = 20, 5, 10, 10
>>> input = torch.randn(N, C, H, W)
>>> # Normalize over the last three dimensions (i.e. the channel and spatial dimensions)
>>> layer_norm = nn.LayerNorm([C, H, W])
>>> output = layer_norm(input)
