DdpgMlpQNet
- class torchrl.modules.DdpgMlpQNet(mlp_net_kwargs_net1: dict | None = None, mlp_net_kwargs_net2: dict | None = None, device: DEVICE_TYPING | None = None)
DDPG Q-value MLP class.
Presented in "CONTINUOUS CONTROL WITH DEEP REINFORCEMENT LEARNING", https://arxiv.org/pdf/1509.02971.pdf
The DDPG Q-value network takes an observation and an action as input and returns a scalar value. Because the action is integrated later than the observation, two networks are created: the first processes the observation alone, and the second processes the observation embedding concatenated with the action.
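The late fusion described above can be illustrated with plain torch.nn layers. The sketch below is a conceptual approximation, not the actual DdpgMlpQNet implementation: it assumes fixed input sizes (32 observation features, 4 action features) and the default widths (400 and 300), whereas DdpgMlpQNet uses lazy layers that infer their input size on the first call.
>>> import torch
>>> from torch import nn
>>> obs_dim, act_dim = 32, 4
>>> mlp1 = nn.Sequential(nn.Linear(obs_dim, 400), nn.ELU())  # observation-only stage
>>> mlp2 = nn.Sequential(nn.Linear(400 + act_dim, 300), nn.ELU(), nn.Linear(300, 1))  # joint stage
>>> hidden = mlp1(torch.zeros(1, obs_dim))  # embed the observation first
>>> value = mlp2(torch.cat([hidden, torch.zeros(1, act_dim)], dim=-1))  # action joins late
>>> print(value.shape)
torch.Size([1, 1])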
- Parameters:
mlp_net_kwargs_net1 (dict, optional) –
kwargs for the first MLP. Defaults to
>>> {
...     'in_features': None,
...     'out_features': 400,
...     'depth': 0,
...     'num_cells': [],
...     'activation_class': nn.ELU,
...     'bias_last_layer': True,
...     'activate_last_layer': True,
... }
mlp_net_kwargs_net2 (dict, optional) –
kwargs for the second MLP. Defaults to
>>> {
...     'in_features': None,
...     'out_features': 1,
...     'depth': 1,
...     'num_cells': [300, ],
...     'activation_class': nn.ELU,
...     'bias_last_layer': True,
... }
device (torch.device, optional) – device on which to create the module.
Examples
>>> import torch
>>> from torchrl.modules import DdpgMlpQNet
>>> net = DdpgMlpQNet()
>>> print(net)
DdpgMlpQNet(
  (mlp1): MLP(
    (0): LazyLinear(in_features=0, out_features=400, bias=True)
    (1): ELU(alpha=1.0)
  )
  (mlp2): MLP(
    (0): LazyLinear(in_features=0, out_features=300, bias=True)
    (1): ELU(alpha=1.0)
    (2): Linear(in_features=300, out_features=1, bias=True)
  )
)
>>> obs = torch.zeros(1, 32)
>>> action = torch.zeros(1, 4)
>>> value = net(obs, action)
>>> print(value.shape)
torch.Size([1, 1])
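The default kwargs listed in the parameter descriptions can be overridden by passing complete dicts to the constructor. The snippet below is a sketch under that assumption; the widths 256 and 128 are illustrative choices, not library defaults.
>>> import torch
>>> from torch import nn
>>> from torchrl.modules import DdpgMlpQNet
>>> net = DdpgMlpQNet(
...     mlp_net_kwargs_net1={
...         'in_features': None,
...         'out_features': 256,          # narrower observation embedding (illustrative)
...         'depth': 0,
...         'num_cells': [],
...         'activation_class': nn.ELU,
...         'bias_last_layer': True,
...         'activate_last_layer': True,
...     },
...     mlp_net_kwargs_net2={
...         'in_features': None,
...         'out_features': 1,
...         'depth': 1,
...         'num_cells': [128],           # narrower joint layer (illustrative)
...         'activation_class': nn.ELU,
...         'bias_last_layer': True,
...     },
... )
>>> obs = torch.zeros(1, 32)
>>> action = torch.zeros(1, 4)
>>> print(net(obs, action).shape)
torch.Size([1, 1])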