
MultiAgentConvNet

class torchrl.modules.MultiAgentConvNet(n_agents: int, centralized: bool | None = None, share_params: bool | None = None, *, in_features: int | None = None, device: DEVICE_TYPING | None = None, num_cells: Sequence[int] | None = None, kernel_sizes: Union[Sequence[Union[int, Sequence[int]]], int] = 5, strides: Union[Sequence, int] = 2, paddings: Union[Sequence, int] = 0, activation_class: Type[nn.Module] = <class 'torch.nn.modules.activation.ELU'>, use_td_params: bool = True, **kwargs)

Multi-agent CNN.

In multi-agent reinforcement learning (MARL) settings, agents may or may not share the same policy for their actions: we say that the parameters can be shared or not. Similarly, a network may take the entire observation space (across agents) as input, or compute its output on a per-agent basis; we refer to these as "centralized" and "non-centralized", respectively.

It expects inputs of shape (*B, n_agents, channels, x, y).

Note

To initialize MARL module parameters with the torch.nn.init module, please refer to the get_stateful_net() and from_stateful_net() methods.
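For instance, a minimal sketch (assuming a decentralized network with shared parameters and illustrative input sizes; the initialization scheme is arbitrary):

>>> import torch
>>> from torch import nn
>>> from torchrl.modules import MultiAgentConvNet
>>> cnn = MultiAgentConvNet(n_agents=2, centralized=False, share_params=True)
>>> _ = cnn(torch.randn(1, 2, 3, 32, 32))  # materialize the lazy conv layers
>>> # Pull out a stateful copy of the network, re-initialize its conv weights,
>>> # then load the new parameters back into the multi-agent module.
>>> stateful_net = cnn.get_stateful_net()
>>> for module in stateful_net.modules():
...     if isinstance(module, nn.Conv2d):
...         nn.init.kaiming_normal_(module.weight)
>>> cnn.from_stateful_net(stateful_net)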

Parameters:
  • n_agents (int) – Number of agents.

  • centralized (bool) – If True, each agent will use the inputs of all agents to compute its output, resulting in an input of shape (*B, n_agents * channels, x, y). Otherwise, each agent will only use its own data as input.

  • share_params (bool) – If True, the same ConvNet will be used to make the forward pass for all agents (homogeneous policies). Otherwise, each agent will use a different ConvNet to process its input (heterogeneous policies).

Keyword Arguments:
  • in_features (int, optional) – The input feature dimension. If left to None, a lazy module is used.

  • device (str or torch.device, optional) – Device to create the module on.

  • num_cells (int or Sequence[int], optional) – Number of cells of every layer in between the input and output. If an integer is provided, every layer will have the same number of cells. If an iterable is provided, the out_channels of the convolutional layers will match the content of num_cells.

  • kernel_sizes (int or Sequence[Union[int, Sequence[int]]]) – Kernel size(s) of the convolutional network. Defaults to 5.

  • strides (int or Sequence[int]) – Stride(s) of the convolutional network. If an iterable is provided, its length must match the depth of the network, defined by the num_cells or depth arguments. Defaults to 2.

  • activation_class (Type[nn.Module]) – Activation class to be used. Defaults to torch.nn.ELU.

  • use_td_params (bool, optional) – If True, the parameters can be found in self.params, which is a TensorDictParams object (inheriting from both TensorDict and nn.Module). If False, parameters are contained in self._empty_net. All things considered, these two approaches should be roughly identical but not interchangeable: for instance, a state_dict created with use_td_params=True cannot be used when use_td_params=False. A brief illustration follows this list.

  • **kwargs – Additional keyword arguments passed to ConvNet to customize it.
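For illustration, a brief sketch of what the use_td_params flag exposes (a minimal example under assumed sizes, not part of the reference itself):

>>> import torch
>>> from tensordict.nn import TensorDictParams
>>> from torchrl.modules import MultiAgentConvNet
>>> cnn = MultiAgentConvNet(n_agents=2, centralized=False, share_params=True)
>>> # With use_td_params=True (the default), the parameters are exposed as a
>>> # TensorDictParams object, which is both a TensorDict and an nn.Module.
>>> isinstance(cnn.params, TensorDictParams)
True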

Examples

>>> import torch
>>> from torchrl.modules import MultiAgentConvNet
>>> batch = (3,2)
>>> n_agents = 7
>>> channels, x, y = 4, 100, 100
>>> obs = torch.randn(*batch, n_agents, channels, x, y)
>>> # Let's consider a centralized network with shared parameters.
>>> cnn = MultiAgentConvNet(
...     n_agents,
...     centralized = True,
...     share_params = True
... )
>>> print(cnn)
MultiAgentConvNet(
    (agent_networks): ModuleList(
        (0): ConvNet(
        (0): LazyConv2d(0, 32, kernel_size=(5, 5), stride=(2, 2))
        (1): ELU(alpha=1.0)
        (2): Conv2d(32, 32, kernel_size=(5, 5), stride=(2, 2))
        (3): ELU(alpha=1.0)
        (4): Conv2d(32, 32, kernel_size=(5, 5), stride=(2, 2))
        (5): ELU(alpha=1.0)
        (6): SquashDims()
        )
    )
)
>>> result = cnn(obs)
>>> # The final dimension of the output is determined by the layer definition arguments and the shape of the input 'obs'.
>>> print(result.shape)
torch.Size([3, 2, 7, 2592])
>>> # Since both observations and parameters are shared, we expect all agents to have identical outputs (e.g. for a value function)
>>> print(all(result[0,0,0] == result[0,0,1]))
True
>>> # Alternatively, a local network with parameter sharing (e.g. a decentralized weight-sharing policy)
>>> cnn = MultiAgentConvNet(
...     n_agents,
...     centralized = False,
...     share_params = True
... )
>>> result = cnn(obs)
>>> print(cnn)
MultiAgentConvNet(
    (agent_networks): ModuleList(
        (0): ConvNet(
        (0): Conv2d(4, 32, kernel_size=(5, 5), stride=(2, 2))
        (1): ELU(alpha=1.0)
        (2): Conv2d(32, 32, kernel_size=(5, 5), stride=(2, 2))
        (3): ELU(alpha=1.0)
        (4): Conv2d(32, 32, kernel_size=(5, 5), stride=(2, 2))
        (5): ELU(alpha=1.0)
        (6): SquashDims()
        )
    )
)
>>> print(result.shape)
torch.Size([3, 2, 7, 2592])
>>> # Parameters are shared but not observations, hence each agent has a different output.
>>> print(all(result[0,0,0] == result[0,0,1]))
False
>>> # Or multiple local networks identical in structure but with differing weights.
>>> cnn = MultiAgentConvNet(
...     n_agents,
...     centralized = False,
...     share_params = False
... )
>>> result = cnn(obs)
>>> print(cnn)
MultiAgentConvNet(
    (agent_networks): ModuleList(
        (0-6): 7 x ConvNet(
        (0): Conv2d(4, 32, kernel_size=(5, 5), stride=(2, 2))
        (1): ELU(alpha=1.0)
        (2): Conv2d(32, 32, kernel_size=(5, 5), stride=(2, 2))
        (3): ELU(alpha=1.0)
        (4): Conv2d(32, 32, kernel_size=(5, 5), stride=(2, 2))
        (5): ELU(alpha=1.0)
        (6): SquashDims()
        )
    )
)
>>> print(result.shape)
torch.Size([3, 2, 7, 2592])
>>> print(all(result[0,0,0] == result[0,0,1]))
False
>>> # Or where inputs are shared but not parameters.
>>> cnn = MultiAgentConvNet(
...     n_agents,
...     centralized = True,
...     share_params = False
... )
>>> result = cnn(obs)
>>> print(cnn)
MultiAgentConvNet(
    (agent_networks): ModuleList(
        (0-6): 7 x ConvNet(
        (0): Conv2d(28, 32, kernel_size=(5, 5), stride=(2, 2))
        (1): ELU(alpha=1.0)
        (2): Conv2d(32, 32, kernel_size=(5, 5), stride=(2, 2))
        (3): ELU(alpha=1.0)
        (4): Conv2d(32, 32, kernel_size=(5, 5), stride=(2, 2))
        (5): ELU(alpha=1.0)
        (6): SquashDims()
        )
    )
)
>>> print(result.shape)
torch.Size([3, 2, 7, 2592])
>>> print(all(result[0,0,0] == result[0,0,1]))
False
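Finally, the extra keyword arguments can be used to customize each per-agent ConvNet. A sketch continuing the session above (the layer widths, kernel size and stride are illustrative assumptions; the printed feature size of 64 * 94 * 94 = 565504 follows from those choices):

>>> cnn = MultiAgentConvNet(
...     n_agents,
...     centralized = False,
...     share_params = True,
...     num_cells = [16, 32, 64],  # out_channels of the three conv layers
...     kernel_sizes = 3,
...     strides = 1,
... )
>>> result = cnn(obs)
>>> # Three 3x3 convolutions with stride 1 shrink each 100x100 input to 94x94.
>>> print(result.shape)
torch.Size([3, 2, 7, 565504])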
