torch.nn.utils.convert_conv2d_weight_memory_format

torch.nn.utils.convert_conv2d_weight_memory_format(module, memory_format)

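Convert the memory_format of nn.Conv2d.weight to the specified memory_format.

The conversion is applied recursively to nested nn.Module instances, including module itself. Note that it changes only the memory_format, not the semantics of each dimension. The function is intended to steer computation toward NHWC kernels, which can give a considerable speedup for fp16 data on CUDA devices with compute capability >= 7.0.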
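
To make the "memory_format only, not dimension semantics" point concrete, here is a minimal sketch that compares the shape and strides of a Conv2d weight before and after conversion. It runs on CPU in fp32 purely for illustration (an assumption of this sketch); the speedup described above is specific to CUDA and is not measured here.

```python
import torch
import torch.nn as nn

# A single convolution: the weight shape is (out_channels, in_channels, kH, kW).
conv = nn.Conv2d(8, 4, 3)
shape_before = tuple(conv.weight.shape)
stride_before = conv.weight.stride()

nn.utils.convert_conv2d_weight_memory_format(conv, torch.channels_last)

shape_after = tuple(conv.weight.shape)
stride_after = conv.weight.stride()

# The logical shape is unchanged; only the strides (physical layout) differ.
print(shape_before, stride_before)
print(shape_after, stride_after)
print(conv.weight.is_contiguous(memory_format=torch.channels_last))
```

Indexing such as conv.weight[o, i, h, w] refers to the same element before and after the call; only the order in which elements are stored in memory changes.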

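Note

Calling model.to(memory_format=torch.channels_last) is more aggressive than the utility function convert_conv2d_weight_memory_format. model.to affects every layer with a 4D weight, and not all of those layers benefit from conversion to the specified memory_format. What we do know is that converting convolutions to NHWC (channels_last) in cuDNN pays off: running the convolution in NHWC is beneficial even when a permutation has to be applied to the input tensor.

The strategy here is therefore to convert only the weights of convolutions to channels_last. This ensures that:
1. Fast convolution kernels are used, whose benefit can outweigh the overhead of a permutation (needed when the input is in a different format).
2. No unnecessary permutations are applied to layers that do not benefit from memory_format conversion.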
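
The difference between the two approaches can be sketched as follows. Scale4D is a hypothetical layer invented for this illustration (it is not part of PyTorch); it simply owns a 4D parameter that is not a convolution weight. The sketch runs on CPU and only inspects parameter layouts.

```python
import torch
import torch.nn as nn


class Scale4D(nn.Module):
    """Hypothetical non-convolution layer that owns a 4D parameter."""

    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(1, 4, 6, 6))

    def forward(self, x):
        return x * self.scale


def make_model():
    return nn.Sequential(nn.Conv2d(8, 4, 3), Scale4D())


def is_channels_last(t):
    return t.is_contiguous(memory_format=torch.channels_last)


# model.to converts every 4D parameter, including the non-convolution one.
aggressive = make_model().to(memory_format=torch.channels_last)
# The utility only touches convolution weights.
targeted = nn.utils.convert_conv2d_weight_memory_format(
    make_model(), torch.channels_last
)

for name, model in (("model.to", aggressive), ("utility", targeted)):
    print(name, is_channels_last(model[0].weight), is_channels_last(model[1].scale))
```

In this sketch the convolution weight ends up in channels_last either way, while the Scale4D parameter is converted only by model.to.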

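In the best case, the layers between convolution layers are channels last compatible. The input tensor is permuted to channels last when it reaches the first convolution layer and stays in that memory format, so later convolutions do not need to permute their input tensors.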
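
A rough way to observe this is to feed a plain contiguous input through a converted conv/ReLU/conv stack and inspect the result. The sketch below is an illustration under assumptions: it runs on CPU in fp32, and the layout reported for the output depends on the backend and PyTorch version, so it is printed rather than taken for granted. What should hold regardless is that the numerical result matches the unconverted model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


def make_model():
    return nn.Sequential(nn.Conv2d(8, 16, 3), nn.ReLU(), nn.Conv2d(16, 4, 3))


reference = make_model()
converted = make_model()
converted.load_state_dict(reference.state_dict())  # same weights in both models
nn.utils.convert_conv2d_weight_memory_format(converted, torch.channels_last)

x = torch.randn(2, 8, 16, 16)  # plain contiguous (NCHW) input

with torch.no_grad():
    out_ref = reference(x)
    out_cl = converted(x)

# Same logical result; only the physical layout of the output may differ.
same_values = torch.allclose(out_ref, out_cl, rtol=1e-4, atol=1e-4)
print(out_cl.shape, same_values)
print("output is channels_last:",
      out_cl.is_contiguous(memory_format=torch.channels_last))
```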

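If a layer that is not channels last compatible sits between convolution layers, the input tensor must be permuted back to contiguous format for that layer. It then passes through the remaining layers in contiguous format and is permuted to channels last again when it reaches the next convolution layer. There is no point in propagating that permutation to an earlier layer, because most layers are fairly insensitive to memory_format.

This claim may change once PyTorch supports fusion of permutations, since there may be a better place to fuse the permutation than immediately before a convolution.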

Parameters

module (nn.Module) – nn.Conv2d & nn.ConvTranspose2d, or a container nn.Module
memory_format – the user-specified memory_format, e.g. torch.channels_last or torch.contiguous_format

Returns

The original module with updated nn.Conv2d
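
The following minimal sketch illustrates the parameters and return value together: a container with a nested container is passed in, the returned object is compared with the input, and the conversion is then reversed with torch.contiguous_format. The model layout is an arbitrary choice for illustration, and the sketch runs on CPU.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, 3),
    nn.Sequential(nn.ReLU(), nn.Conv2d(8, 8, 3)),  # nested container
)


def conv_weights(m):
    # Collect the weights of every Conv2d at any nesting depth.
    return [sub.weight for sub in m.modules() if isinstance(sub, nn.Conv2d)]


returned = nn.utils.convert_conv2d_weight_memory_format(model, torch.channels_last)
print(returned is model)  # the same module object is handed back

weights = conv_weights(model)
all_cl = all(w.is_contiguous(memory_format=torch.channels_last) for w in weights)
print(len(weights), all_cl)

# Converting back to the default layout uses the same function.
nn.utils.convert_conv2d_weight_memory_format(model, torch.contiguous_format)
all_contig = all(w.is_contiguous() for w in conv_weights(model))
print(all_contig)
```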

Example

>>> import torch
>>> import torch.nn as nn
>>> input = torch.randint(1, 10, (2, 8, 4, 4), dtype=torch.float16, device="cuda")
>>> model = nn.Sequential(
...     nn.Conv2d(8, 4, 3)).cuda().half()
>>> # The module is updated in place and also returned, so this is identical to:
>>> # nn.utils.convert_conv2d_weight_memory_format(model, torch.channels_last)
>>> model = nn.utils.convert_conv2d_weight_memory_format(model, torch.channels_last)
>>> out = model(input)
