Distributed and Parallel Training Tutorials

Created On: Oct 04, 2022 | Last Updated: Oct 31, 2024 | Last Verified: Nov 05, 2024

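Distributed training is a model training paradigm that spreads the training workload across multiple worker nodes. This can significantly improve training speed and, by making it practical to train larger models on more data, can also help improve model accuracy. While distributed training can be used for any type of machine learning model training, it is most beneficial for large models and compute-intensive tasks such as deep learning.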
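To make the idea concrete, here is a minimal plain-Python sketch of synchronous data parallelism, the scheme DDP implements. It is not PyTorch code: the workers are simulated inside one process, `all_reduce_mean` is a hypothetical stand-in for a real collective, and the model is a hypothetical one-parameter fit `y = w * x`.

```python
# Conceptual sketch of synchronous data parallelism in plain Python.
# Each simulated "worker" holds a replica of parameter w and one shard of the batch.

def local_grad(w, xs, ys):
    """Mean gradient of 0.5 * (w * x - y) ** 2 with respect to w over one shard."""
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def all_reduce_mean(values):
    """Stand-in for an all-reduce: every worker receives the average."""
    avg = sum(values) / len(values)
    return [avg] * len(values)


def data_parallel_step(w, xs, ys, num_workers, lr=0.1):
    # Split the batch into equal contiguous shards, one per worker.
    shard = len(xs) // num_workers
    grads = [
        local_grad(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(num_workers)
    ]
    synced = all_reduce_mean(grads)
    # Every replica applies the same averaged gradient, so replicas stay identical.
    return [w - lr * g for g in synced]


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
replicas = data_parallel_step(0.0, xs, ys, num_workers=2)
print(replicas)
```

Because the shards are equal-sized, the averaged per-shard gradient equals the full-batch gradient (here -15.0), so every replica takes the same step and stays in sync.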

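PyTorch offers several ways to perform distributed training, each with advantages in particular use cases:

DistributedDataParallel (DDP)
Fully Sharded Data Parallel (FSDP)
Tensor Parallel (TP)
Device Mesh
Remote Procedure Call (RPC) distributed training
Custom Extensions

Read more about these options in the Distributed Overview.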

Learn DDP

DDP Intro Video Tutorials

A step-by-step video series on how to get started with DistributedDataParallel and advance to more complex topics.

Code, Video

https://pytorch.com.tw/tutorials/beginner/ddp_series_intro.html?utm_source=distr_landing&utm_medium=ddp_series_intro

Getting Started with Distributed Data Parallel

This tutorial provides a short introduction to PyTorch DistributedDataParallel.

Code

https://pytorch.com.tw/tutorials/intermediate/ddp_tutorial.html?utm_source=distr_landing&utm_medium=intermediate_ddp_tutorial
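The core DDP workflow (initialize a process group, wrap the model, train as usual) can be sketched in a single CPU process. This is a minimal illustration, not code from the tutorial: it assumes the `gloo` backend, a toy `nn.Linear` model with random data, and the hypothetical helpers `free_port` and `train_one_step`. With `world_size=1` the gradient all-reduce is trivial; a real job would launch one process per rank, for example with `torchrun`.

```python
import os
import socket

import torch
import torch.distributed as dist
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel as DDP


def free_port():
    """Ask the OS for an unused TCP port to use for rendezvous."""
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def train_one_step(rank, world_size, port):
    # The default "env://" rendezvous reads these two variables.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = str(port)
    # "gloo" runs on CPU; "nccl" is the usual choice for CUDA GPUs.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    try:
        torch.manual_seed(0)
        ddp_model = DDP(nn.Linear(4, 1))  # CPU module, so no device_ids
        optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.1)

        inputs, targets = torch.randn(8, 4), torch.randn(8, 1)
        optimizer.zero_grad()
        loss = F.mse_loss(ddp_model(inputs), targets)
        loss.backward()  # DDP all-reduces (averages) gradients during backward
        optimizer.step()
        return loss.item()
    finally:
        dist.destroy_process_group()


if __name__ == "__main__":
    # Single-process demo; torchrun or mp.spawn would start one process per rank.
    print(train_one_step(rank=0, world_size=1, port=free_port()))
```

Because DDP averages gradients inside `backward()`, the training loop itself looks the same as single-device code; only the setup and teardown change.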

Distributed Training with Uneven Inputs Using the Join Context Manager

This tutorial describes the Join context manager and demonstrates its use with DistributedDataParallel.

Code

https://pytorch.com.tw/tutorials/advanced/generic_join.html?utm_source=distr_landing&utm_medium=generic_join
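A minimal sketch of wrapping a DDP training loop in `Join` from `torch.distributed.algorithms.join`. It runs a single CPU rank with the `gloo` backend so that it is self-contained; the helper names and the toy model are assumptions for illustration. In a real multi-process job, ranks would receive different numbers of batches, and `Join` lets ranks that finish early keep matching the collectives of ranks still training instead of hanging.

```python
import os
import socket

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.algorithms.join import Join
from torch.nn.parallel import DistributedDataParallel as DDP


def free_port():
    """Ask the OS for an unused TCP port to use for rendezvous."""
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def train(rank, world_size, num_batches, port):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = str(port)
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    try:
        model = DDP(nn.Linear(1, 1))
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
        steps = 0
        # Ranks that exhaust their inputs early "join" and shadow the
        # collectives of the ranks still training, avoiding a hang.
        with Join([model]):
            for _ in range(num_batches):
                optimizer.zero_grad()
                model(torch.ones(1, 1)).sum().backward()
                optimizer.step()
                steps += 1
        return steps
    finally:
        dist.destroy_process_group()


if __name__ == "__main__":
    # Under torchrun, something like num_batches = 5 + rank would give each
    # rank a different amount of data; here a single rank runs 3 batches.
    print(train(rank=0, world_size=1, num_batches=3, port=free_port()))
```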

Learn FSDP

Getting Started with FSDP

This tutorial demonstrates how to perform distributed training with FSDP on the MNIST dataset.

Code

https://pytorch.com.tw/tutorials/intermediate/FSDP_tutorial.html?utm_source=distr_landing&utm_medium=FSDP_getting_started
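FSDP saves memory by sharding model state across ranks and gathering full parameters only while they are needed for computation. The plain-Python sketch below simulates that bookkeeping for one flat parameter on two hypothetical ranks: shard with padding, all-gather for compute, reduce-scatter the gradients, and update each shard locally. The helper names are illustrative only; in PyTorch this is handled inside `torch.distributed.fsdp.FullyShardedDataParallel`.

```python
def shard(flat, world_size):
    """Pad a flat parameter list and split it into one equal chunk per rank."""
    chunk = -(-len(flat) // world_size)  # ceiling division
    padded = flat + [0.0] * (chunk * world_size - len(flat))
    return [padded[r * chunk:(r + 1) * chunk] for r in range(world_size)]


def all_gather(shards, numel):
    """Rebuild the full parameter (dropping padding) before forward/backward."""
    return [v for s in shards for v in s][:numel]


def reduce_scatter(full_grads, world_size):
    """Sum every rank's full gradient, then hand each rank only its own chunk."""
    summed = [sum(vals) for vals in zip(*full_grads)]
    return shard(summed, world_size)


world_size = 2
params = [1.0, 2.0, 3.0, 4.0, 5.0]
shards = shard(params, world_size)          # each rank stores 3 values, not 5
full = all_gather(shards, len(params))      # materialized only for compute
grads_per_rank = [[1.0] * 5, [3.0] * 5]     # hypothetical local gradients
grad_shards = reduce_scatter(grads_per_rank, world_size)

# Each rank updates only its own shard, using the averaged gradient.
lr = 0.5
new_shards = [
    [p - lr * g / world_size for p, g in zip(ps, gs)]
    for ps, gs in zip(shards, grad_shards)
]
print(shards, grad_shards, new_shards)
```

Each rank persistently holds only `ceil(numel / world_size)` values, which is where the memory savings come from; the full parameter exists only transiently around the computation.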

FSDP Advanced

In this tutorial, you will learn how to fine-tune a HuggingFace (HF) T5 model with FSDP for text summarization.

Code

https://pytorch.com.tw/tutorials/intermediate/FSDP_advanced_tutorial.html?utm_source=distr_landing&utm_medium=FSDP_advanced

Learn Tensor Parallel (TP)

Large Scale Transformer Model Training with Tensor Parallel (TP)

This tutorial demonstrates how to train a large Transformer-like model across hundreds to thousands of GPUs using Tensor Parallel and Fully Sharded Data Parallel.

Code

https://pytorch.com.tw/tutorials/intermediate/TP_tutorial.html
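Tensor parallelism splits individual weight matrices across devices. The plain-Python sketch below shows the common Megatron-style pairing for a two-layer MLP: the first layer is split along its output features (the pattern `ColwiseParallel` in `torch.distributed.tensor.parallel` applies) and the second along its input features (`RowwiseParallel`). Each simulated rank computes its slice independently, and a single sum across ranks, standing in for an all-reduce, recovers the dense result. The matrices and helper functions are hypothetical illustrations, not the PyTorch API.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def relu(m):
    return [[max(v, 0) for v in row] for row in m]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split_cols(w, n):
    """Column shards: rank r gets output features r*step..(r+1)*step-1."""
    step = len(w[0]) // n
    return [[row[r * step:(r + 1) * step] for row in w] for r in range(n)]


def split_rows(w, n):
    """Row shards: rank r gets the matching slice of input features."""
    step = len(w) // n
    return [w[r * step:(r + 1) * step] for r in range(n)]


def tp_mlp(x, w1, w2, n):
    """Two-layer MLP with w1 split by columns and w2 split by rows over n ranks."""
    partials = []
    for w1_r, w2_r in zip(split_cols(w1, n), split_rows(w2, n)):
        h_r = relu(matmul(x, w1_r))        # local hidden slice, no communication
        partials.append(matmul(h_r, w2_r))
    out = partials[0]
    for p in partials[1:]:                 # stands in for an all-reduce (sum)
        out = add(out, p)
    return out


x = [[1, -2]]
w1 = [[1, 0, 2, -1], [0, 1, -1, 1]]
w2 = [[1, 0], [0, 1], [1, 1], [2, -1]]
dense = matmul(relu(matmul(x, w1)), w2)
print(tp_mlp(x, w1, w2, n=2), dense)
```

Because ReLU is elementwise, it can be applied to each local slice without communication; only the final sum needs to cross devices.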

Learn DeviceMesh

Getting Started with DeviceMesh

In this tutorial you will learn about DeviceMesh and how it can help with distributed training.

Code

https://pytorch.com.tw/tutorials/recipes/distributed_device_mesh.html?highlight=devicemesh
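DeviceMesh describes how ranks are laid out in an n-dimensional grid so that each parallelism dimension gets its own process group. In PyTorch, `init_device_mesh("cuda", (2, 4), mesh_dim_names=("dp", "tp"))` from `torch.distributed.device_mesh` creates such a mesh. The plain-Python sketch below only reproduces the rank bookkeeping for that hypothetical 8-rank, 2-by-4 layout (row-major), so you can see which ranks share a data-parallel group and which share a tensor-parallel group; `mesh_groups` is an illustrative helper, not a PyTorch function.

```python
def mesh_groups(world_size, shape):
    """Lay ranks out in a 2-D (dp, tp) grid and list the groups along each dim."""
    dp_size, tp_size = shape
    if dp_size * tp_size != world_size:
        raise ValueError("mesh shape must multiply to world_size")
    # Row-major layout: rank = dp_index * tp_size + tp_index.
    mesh = [[d * tp_size + t for t in range(tp_size)] for d in range(dp_size)]
    tp_groups = [row[:] for row in mesh]                                        # same row
    dp_groups = [[mesh[d][t] for d in range(dp_size)] for t in range(tp_size)]  # same column
    return mesh, dp_groups, tp_groups


mesh, dp_groups, tp_groups = mesh_groups(8, (2, 4))
print(mesh)       # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(dp_groups)  # [[0, 4], [1, 5], [2, 6], [3, 7]]
print(tp_groups)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

With this layout, tensor-parallel groups consist of consecutive ranks; if each host has four GPUs, the bandwidth-heavy tensor-parallel traffic stays within a host while data-parallel traffic crosses hosts.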

Learn RPC

Getting Started with Distributed RPC Framework

This tutorial demonstrates how to get started with RPC-based distributed training.

Code

https://pytorch.com.tw/tutorials/intermediate/rpc_tutorial.html?utm_source=distr_landing&utm_medium=rpc_getting_started

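Implementing a Parameter Server Using Distributed RPC Framework

This tutorial walks you through a simple example of implementing a parameter server using PyTorch's Distributed RPC framework.

Code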

https://pytorch.com.tw/tutorials/intermediate/rpc_param_server_tutorial.html?utm_source=distr_landing&utm_medium=rpc_param_server_tutorial
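The parameter-server pattern keeps one authoritative copy of the model on a server, while trainers fetch the current weights, compute gradients on their own data, and send the gradients back. The sketch below runs that loop in a single Python process, with plain method calls standing in for RPCs; `ParameterServer`, `trainer_step`, and the linear model `y = w0 * x + w1` are hypothetical and not taken from the tutorial.

```python
class ParameterServer:
    """Holds the authoritative parameters; trainers pull weights and push gradients."""

    def __init__(self, params, lr):
        self.params = list(params)
        self.lr = lr

    def get_params(self):
        # With the RPC framework, a trainer would fetch this remotely.
        return list(self.params)

    def apply_gradients(self, grads):
        self.params = [p - self.lr * g for p, g in zip(self.params, grads)]


def trainer_step(server, xs, ys):
    """One trainer iteration: pull weights, compute MSE gradients, push them."""
    w0, w1 = server.get_params()
    n = len(xs)
    g0 = sum((w0 * x + w1 - y) * x for x, y in zip(xs, ys)) / n
    g1 = sum((w0 * x + w1 - y) for x, y in zip(xs, ys)) / n
    server.apply_gradients([g0, g1])


server = ParameterServer([0.0, 0.0], lr=0.05)
shards = [([1.0, 2.0], [3.0, 5.0]), ([3.0, 4.0], [7.0, 9.0])]  # both follow y = 2x + 1
for _ in range(2000):
    for xs, ys in shards:  # trainers take turns pushing updates
        trainer_step(server, xs, ys)
print(server.params)  # approximately [2.0, 1.0]
```

Because both data shards are consistent with `y = 2x + 1`, the shared parameters converge toward `[2.0, 1.0]` even though the trainers update them one after another.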

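Implementing Batch RPC Processing Using Asynchronous Executions

In this tutorial you will build batch-processing RPC applications with the @rpc.functions.async_execution decorator.

Code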

https://pytorch.com.tw/tutorials/intermediate/rpc_async_execution.html?utm_source=distr_landing&utm_medium=rpc_async_execution
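The key idea behind `@rpc.functions.async_execution` is that a server-side function may return a Future instead of a finished value, so the server thread is freed while the reply waits. The sketch below mimics a batch-updating parameter server using Python's `concurrent.futures.Future` in place of `torch.futures.Future` and direct calls in place of RPCs; `BatchUpdateServer` and `update_and_fetch` are hypothetical names, not the tutorial's code.

```python
import threading
from concurrent.futures import Future


class BatchUpdateServer:
    """Accumulates gradients and answers all callers once a batch is full."""

    def __init__(self, param, batch_size, lr):
        self.param = param
        self.batch_size = batch_size
        self.lr = lr
        self.lock = threading.Lock()  # real RPC requests arrive on many threads
        self.pending = []             # futures waiting for the batch to complete
        self.grad_sum = 0.0

    def update_and_fetch(self, grad):
        # Return a Future right away instead of blocking. With
        # @rpc.functions.async_execution, the RPC reply would be sent
        # only when this Future completes.
        fut = Future()
        with self.lock:
            self.grad_sum += grad
            self.pending.append(fut)
            if len(self.pending) == self.batch_size:
                self.param -= self.lr * self.grad_sum / self.batch_size
                waiters, self.pending, self.grad_sum = self.pending, [], 0.0
                for w in waiters:
                    w.set_result(self.param)
        return fut


server = BatchUpdateServer(param=1.0, batch_size=3, lr=0.5)
futs = [server.update_and_fetch(g) for g in (0.5, 1.0, 1.5)]
print([f.result() for f in futs])  # [0.5, 0.5, 0.5]
```

The first two futures stay pending until the third gradient arrives; then all three callers receive the same updated value. In the RPC version, each trainer would wait on its own call while the server thread stays free to accept the other requests.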

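Combining Distributed DataParallel with Distributed RPC Framework

In this tutorial you will learn how to combine distributed data parallelism with distributed model parallelism.

Code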

https://pytorch.com.tw/tutorials/advanced/rpc_ddp_tutorial.html?utm_source=distr_landing&utm_medium=rpc_plus_ddp

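Custom Extensions

Customize Process Group Backends Using Cpp Extensions

In this tutorial you will learn how to implement a custom ProcessGroup backend and plug it into the PyTorch distributed package using cpp extensions.

Code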

https://pytorch.com.tw/tutorials/intermediate/process_group_cpp_extension_tutorial.html?utm_source=distr_landing&utm_medium=custom_extensions_cpp