KLDivLoss

class torch.nn.KLDivLoss(size_average=None, reduce=None, reduction='mean', log_target=False)

The Kullback-Leibler divergence loss.

For tensors of the same shape $y_{\text{pred}},\ y_{\text{true}}$, where $y_{\text{pred}}$ is the input and $y_{\text{true}}$ is the target, we define the **pointwise KL divergence** as

$$L(y_{\text{pred}},\ y_{\text{true}}) = y_{\text{true}} \cdot \log \frac{y_{\text{true}}}{y_{\text{pred}}} = y_{\text{true}} \cdot (\log y_{\text{true}} - \log y_{\text{pred}})$$

To avoid underflow issues when computing this quantity, this loss expects the input argument in log space. The target argument may also be provided in log space if log_target=True.
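The underflow concern can be illustrated with a small sketch. The logits below are made-up values chosen so that one probability underflows in float32; the comparison between `torch.log(F.softmax(...))` and `F.log_softmax(...)` is an illustration, not part of the loss itself.

```python
import torch
import torch.nn.functional as F

# Hypothetical logits with a large gap between the two classes.
logits = torch.tensor([[0.0, -200.0]])

# Naive route: softmax first, then log. exp(-200) underflows to 0 in float32,
# so the log of the second probability becomes -inf.
naive_log_probs = torch.log(F.softmax(logits, dim=1))

# Stable route: log_softmax computes the log-probabilities directly.
stable_log_probs = F.log_softmax(logits, dim=1)

print(naive_log_probs)   # second entry is -inf
print(stable_log_probs)  # both entries are finite
```

Passing `stable_log_probs` as input keeps the loss finite, whereas a `-inf` entry would propagate into the pointwise terms.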

To summarise, this function is roughly equivalent to computing

if not log_target: # default
    loss_pointwise = target * (target.log() - input)
else:
    loss_pointwise = target.exp() * (target - input)

and then reducing this result depending on the reduction argument as follows

if reduction == "mean":  # default
    loss = loss_pointwise.mean()
elif reduction == "batchmean":  # mathematically correct
    loss = loss_pointwise.sum() / input.size(0)
elif reduction == "sum":
    loss = loss_pointwise.sum()
else:  # reduction == "none"
    loss = loss_pointwise
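As a sanity check, the two snippets above can be combined into a single function and compared against `nn.KLDivLoss`. This is a minimal sketch: `manual_kl` is a hypothetical helper name, the tensors are random, and the targets are strictly positive, so edge cases such as zero-valued targets are not exercised.

```python
import warnings

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
input = F.log_softmax(torch.randn(3, 5), dim=1)  # log-probabilities
target = F.softmax(torch.rand(3, 5), dim=1)      # probabilities (all > 0)


def manual_kl(input, target, reduction="mean", log_target=False):
    """Hypothetical helper mirroring the pseudocode above."""
    if not log_target:
        loss_pointwise = target * (target.log() - input)
    else:
        loss_pointwise = target.exp() * (target - input)

    if reduction == "mean":
        return loss_pointwise.mean()
    if reduction == "batchmean":
        return loss_pointwise.sum() / input.size(0)
    if reduction == "sum":
        return loss_pointwise.sum()
    return loss_pointwise  # reduction == "none"


for reduction in ("none", "mean", "batchmean", "sum"):
    for log_target in (False, True):
        tgt = target.log() if log_target else target
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")  # reduction="mean" may warn
            actual = nn.KLDivLoss(reduction=reduction, log_target=log_target)(input, tgt)
        expected = manual_kl(input, tgt, reduction, log_target)
        assert torch.allclose(actual, expected, atol=1e-6)
```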

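Note

As with all the other loss functions in PyTorch, this function expects the first argument, input, to be the output of the model (e.g. a neural network) and the second argument, target, to be the observations in the dataset. This differs from the standard mathematical notation $KL(P\ ||\ Q)$, where $P$ denotes the distribution of the observations and $Q$ denotes the distribution of the model.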
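The argument order can be checked against `torch.distributions.kl_divergence`, which follows the $KL(P\ ||\ Q)$ convention. In this sketch `p` and `q` are illustrative random distributions standing in for the observations and the model; the per-row sum over the class dimension is an assumption about how the 2-D tensors are laid out.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical, kl_divergence

torch.manual_seed(0)
p = torch.softmax(torch.rand(3, 5), dim=1)   # observations P (the target)
q = torch.softmax(torch.randn(3, 5), dim=1)  # model Q (the input, passed as log Q)

# Reference: KL(P || Q) for each row, in standard mathematical order.
reference = kl_divergence(Categorical(probs=p), Categorical(probs=q))

# KLDivLoss order: model log-probabilities first, observations second.
per_row = nn.KLDivLoss(reduction="none")(q.log(), p).sum(dim=1)

# Swapping the arguments computes KL(Q || P) instead, a different quantity.
swapped = nn.KLDivLoss(reduction="none")(p.log(), q).sum(dim=1)

print(torch.allclose(per_row, reference, atol=1e-5))
```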

Warning

reduction="mean" does not return the true KL divergence value. Use reduction="batchmean", which matches the mathematical definition.
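The difference between the two reductions is only the divisor. The sketch below assumes a 2-D input of shape (batch_size, num_classes), with both names chosen for illustration: "mean" divides the summed loss by every element, while "batchmean" divides by the batch size alone.

```python
import warnings

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
batch_size, num_classes = 4, 6
input = F.log_softmax(torch.randn(batch_size, num_classes), dim=1)
target = F.softmax(torch.rand(batch_size, num_classes), dim=1)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # reduction="mean" may emit a warning
    loss_mean = nn.KLDivLoss(reduction="mean")(input, target)
loss_batchmean = nn.KLDivLoss(reduction="batchmean")(input, target)

# "mean" divides by batch_size * num_classes, "batchmean" by batch_size,
# so for this shape the two differ by a factor of num_classes.
assert torch.allclose(loss_batchmean, loss_mean * num_classes, atol=1e-6)
```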

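Parameters

size_average (bool, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch, depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (str, optional) – Specifies the reduction to apply to the output: "none", "batchmean", "sum", or "mean". Default: "mean"
log_target (bool, optional) – Specifies whether target is in log space. Default: False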

Shape

Input: $(*)$, where $*$ means any number of dimensions.
Target: $(*)$, same shape as the input.
Output: scalar by default. If reduction is 'none', then $(*)$, same shape as the input.
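The shape rules can be confirmed with a quick sketch. The 3-D shape (2, 3, 4) is an arbitrary choice for illustration, with the last dimension treated as the distribution axis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

input = F.log_softmax(torch.randn(2, 3, 4), dim=-1)
target = F.softmax(torch.rand(2, 3, 4), dim=-1)

scalar_loss = nn.KLDivLoss(reduction="batchmean")(input, target)
elementwise_loss = nn.KLDivLoss(reduction="none")(input, target)

print(scalar_loss.shape)       # torch.Size([])
print(elementwise_loss.shape)  # torch.Size([2, 3, 4])
```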

Examples:

>>> import torch
>>> import torch.nn as nn
>>> import torch.nn.functional as F
>>> kl_loss = nn.KLDivLoss(reduction="batchmean")
>>> # input should be a distribution in the log space
>>> input = F.log_softmax(torch.randn(3, 5, requires_grad=True), dim=1)
>>> # Sample a batch of distributions. Usually this would come from the dataset
>>> target = F.softmax(torch.rand(3, 5), dim=1)
>>> output = kl_loss(input, target)

>>> kl_loss = nn.KLDivLoss(reduction="batchmean", log_target=True)
>>> log_target = F.log_softmax(torch.rand(3, 5), dim=1)
>>> output = kl_loss(input, log_target)
