QMixer
class torchrl.modules.QMixer(state_shape: Union[Tuple[int, ...], Size], mixing_embed_dim: int, n_agents: int, device: Union[device, str, int])
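QMix mixer.
Mixes the local Q values of the agents into a global Q value through a monotonic hypernetwork whose parameters are obtained from the global state. From the paper https://arxiv.org/abs/1803.11485.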
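For reference, the mixing function described in the QMIX paper can be written as below. The notation is ours, and the exact layer layout inside torchrl's QMixer is assumed here rather than specified on this page: $\mathbf{Q} = (Q_1, \dots, Q_n)^\top$ are the local Q values, $n$ is `n_agents`, $d$ is `mixing_embed_dim`, the $h_{\cdot}$ are hypernetworks applied to the flattened global state $s$, and $\mathbf{b}_1(s) \in \mathbb{R}^d$ and $V(s) \in \mathbb{R}$ are unconstrained state-dependent biases.

```latex
\begin{aligned}
Q_{tot}(\mathbf{Q}, s) &= \mathbf{w}_2(s)^\top \, \mathrm{ELU}\big(W_1(s)^\top \mathbf{Q} + \mathbf{b}_1(s)\big) + V(s), \\
W_1(s) &= \lvert h_{W_1}(s) \rvert \in \mathbb{R}_{\ge 0}^{n \times d}, \qquad
\mathbf{w}_2(s) = \lvert h_{w_2}(s) \rvert \in \mathbb{R}_{\ge 0}^{d}, \\
\frac{\partial Q_{tot}}{\partial Q_a} &= \sum_{j=1}^{d} w_{2,j}(s)\, \mathrm{ELU}'(z_j)\, W_{1,aj}(s) \;\ge\; 0,
\qquad \mathbf{z} = W_1(s)^\top \mathbf{Q} + \mathbf{b}_1(s).
\end{aligned}
```

Because ELU is strictly increasing and the absolute value keeps every mixing weight non-negative, each term of the sum is non-negative, which is exactly the monotonicity constraint that lets each agent act greedily on its own Q value.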
It transforms the local value of each agent's chosen action, of shape (*B, self.n_agents, 1), into a global value of shape (*B, 1). Used together with torchrl.objectives.QMixerLoss. See examples/multiagent/qmix_vdn.py for an example.
Parameters:
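state_shape (tuple or torch.Size) – the shape of the state (excluding any leading batch dimensions).
mixing_embed_dim (int) – the size of the mixing embedding dimension.
n_agents (int) – the number of agents.
device (str or torch.Device) – the PyTorch device for the network.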
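To make the shape contract and the monotonicity constraint concrete, here is a minimal from-scratch sketch in plain PyTorch. It is not torchrl's QMixer: the class name `MonotonicMixerSketch` is hypothetical, the hypernetwork layout (one linear layer per mixing weight, a two-layer ReLU network for the final bias) is an assumption modelled on the QMIX paper, and the `device` argument is omitted. It takes the same `state_shape`, `mixing_embed_dim` and `n_agents` parameters, accepts inputs of shape (*B, n_agents, 1) and (*B, *state_shape), and returns (*B, 1).

```python
import math

import torch
import torch.nn.functional as F
from torch import nn


class MonotonicMixerSketch(nn.Module):
    """Hypothetical QMIX-style mixer for illustration; NOT torchrl's QMixer."""

    def __init__(self, state_shape, mixing_embed_dim, n_agents):
        super().__init__()
        self.state_shape = tuple(state_shape)
        self.state_dim = math.prod(self.state_shape)  # the state is flattened
        self.n_agents = n_agents
        self.embed_dim = mixing_embed_dim
        # Assumed hypernetwork layout: global state -> mixing parameters.
        self.hyper_w1 = nn.Linear(self.state_dim, n_agents * mixing_embed_dim)
        self.hyper_b1 = nn.Linear(self.state_dim, mixing_embed_dim)
        self.hyper_w2 = nn.Linear(self.state_dim, mixing_embed_dim)
        self.hyper_v = nn.Sequential(
            nn.Linear(self.state_dim, mixing_embed_dim),
            nn.ReLU(),
            nn.Linear(mixing_embed_dim, 1),
        )

    def forward(self, chosen_action_value, state):
        # chosen_action_value: (*B, n_agents, 1); state: (*B, *state_shape)
        batch_dims = chosen_action_value.shape[:-2]
        q = chosen_action_value.reshape(-1, 1, self.n_agents)  # (N, 1, n_agents)
        s = state.reshape(-1, self.state_dim)  # (N, state_dim)
        # abs() keeps the mixing weights non-negative, so Q_tot is monotonic in each Q_a.
        w1 = self.hyper_w1(s).abs().view(-1, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(s).view(-1, 1, self.embed_dim)
        hidden = F.elu(torch.bmm(q, w1) + b1)  # (N, 1, embed_dim)
        w2 = self.hyper_w2(s).abs().view(-1, self.embed_dim, 1)
        v = self.hyper_v(s).view(-1, 1, 1)  # unconstrained state-dependent bias
        q_tot = torch.bmm(hidden, w2) + v  # (N, 1, 1)
        return q_tot.reshape(*batch_dims, 1)  # restore the (*B) prefix


torch.manual_seed(0)
mixer = MonotonicMixerSketch(state_shape=(64, 64, 3), mixing_embed_dim=32, n_agents=4)

# Same shapes as the TensorDict example: 32 samples, 4 agents.
q_tot = mixer(torch.zeros(32, 4, 1), torch.zeros(32, 64, 64, 3))
print(q_tot.shape)  # torch.Size([32, 1])

# Monotonicity check: every dQ_tot/dQ_a should be non-negative.
q_agents = torch.randn(32, 4, 1, requires_grad=True)
out = mixer(q_agents, torch.randn(32, 64, 64, 3))
(grad,) = torch.autograd.grad(out.sum(), q_agents)
print(bool((grad >= 0).all()))  # True
```

Flattening all leading batch dimensions into one before the batched matrix products (`torch.bmm`) and restoring them at the end is what lets the same module accept any (*B) prefix, and the autograd check is a cheap way to confirm that the non-negative weights really do make the mixer monotonic.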
Examples
>>> import torch
>>> from tensordict import TensorDict
>>> from tensordict.nn import TensorDictModule
>>> from torchrl.modules.models.multiagent import QMixer
>>> n_agents = 4
>>> qmix = TensorDictModule(
...     module=QMixer(
...         state_shape=(64, 64, 3),
...         mixing_embed_dim=32,
...         n_agents=n_agents,
...         device="cpu",
...     ),
...     in_keys=[("agents", "chosen_action_value"), "state"],
...     out_keys=["chosen_action_value"],
... )
>>> td = TensorDict(
...     {
...         "agents": TensorDict(
...             {"chosen_action_value": torch.zeros(32, n_agents, 1)},
...             [32, n_agents],
...         ),
...         "state": torch.zeros(32, 64, 64, 3),
...     },
...     [32],
... )
>>> td
TensorDict(
    fields={
        agents: TensorDict(
            fields={
                chosen_action_value: Tensor(shape=torch.Size([32, 4, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
            batch_size=torch.Size([32, 4]),
            device=None,
            is_shared=False),
        state: Tensor(shape=torch.Size([32, 64, 64, 3]), device=cpu, dtype=torch.float32, is_shared=False)},
    batch_size=torch.Size([32]),
    device=None,
    is_shared=False)
>>> qmix(td)
TensorDict(
    fields={
        agents: TensorDict(
            fields={
                chosen_action_value: Tensor(shape=torch.Size([32, 4, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
            batch_size=torch.Size([32, 4]),
            device=None,
            is_shared=False),
        chosen_action_value: Tensor(shape=torch.Size([32, 1]), device=cpu, dtype=torch.float32, is_shared=False),
        state: Tensor(shape=torch.Size([32, 64, 64, 3]), device=cpu, dtype=torch.float32, is_shared=False)},
    batch_size=torch.Size([32]),
    device=None,
    is_shared=False)