DdpgMlpQNet
- class torchrl.modules.DdpgMlpQNet(mlp_net_kwargs_net1: dict | None = None, mlp_net_kwargs_net2: dict | None = None, device: DEVICE_TYPING | None = None)
DDPG Q-value MLP class.
Presented in "CONTINUOUS CONTROL WITH DEEP REINFORCEMENT LEARNING", https://arxiv.org/pdf/1509.02971.pdf
The DDPG Q-value network takes an observation and an action as input and returns a scalar value. Because the action is integrated later than the observation, two networks are created: the first processes the observation alone, and the second processes the observation embedding concatenated with the action.
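The late fusion described above can be illustrated with plain torch.nn layers. The sketch below is a conceptual approximation, not the actual DdpgMlpQNet implementation: it assumes fixed input sizes (32 observation features, 4 action features) and the default widths (400 and 300), whereas DdpgMlpQNet uses lazy layers that infer their input size on the first call.
>>> import torch
>>> from torch import nn
>>> obs_dim, act_dim = 32, 4
>>> mlp1 = nn.Sequential(nn.Linear(obs_dim, 400), nn.ELU())  # observation-only stage
>>> mlp2 = nn.Sequential(nn.Linear(400 + act_dim, 300), nn.ELU(), nn.Linear(300, 1))  # joint stage
>>> hidden = mlp1(torch.zeros(1, obs_dim))  # embed the observation first
>>> value = mlp2(torch.cat([hidden, torch.zeros(1, act_dim)], dim=-1))  # action joins late
>>> print(value.shape)
torch.Size([1, 1])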
- Parameters:
mlp_net_kwargs_net1 (dict, optional) –
kwargs for the first MLP. Defaults to
>>> {
...     'in_features': None,
...     'out_features': 400,
...     'depth': 0,
...     'num_cells': [],
...     'activation_class': nn.ELU,
...     'bias_last_layer': True,
...     'activate_last_layer': True,
... }
mlp_net_kwargs_net2 (dict, optional) –
kwargs for the second MLP. Defaults to
>>> {
...     'in_features': None,
...     'out_features': 1,
...     'depth': 1,
...     'num_cells': [300, ],
...     'activation_class': nn.ELU,
...     'bias_last_layer': True,
... }
device (torch.device, optional) – device on which to create the module.
Examples
>>> import torch
>>> from torchrl.modules import DdpgMlpQNet
>>> net = DdpgMlpQNet()
>>> print(net)
DdpgMlpQNet(
  (mlp1): MLP(
    (0): LazyLinear(in_features=0, out_features=400, bias=True)
    (1): ELU(alpha=1.0)
  )
  (mlp2): MLP(
    (0): LazyLinear(in_features=0, out_features=300, bias=True)
    (1): ELU(alpha=1.0)
    (2): Linear(in_features=300, out_features=1, bias=True)
  )
)
>>> obs = torch.zeros(1, 32)
>>> action = torch.zeros(1, 4)
>>> value = net(obs, action)
>>> print(value.shape)
torch.Size([1, 1])
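The default kwargs listed in the parameter descriptions can be overridden by passing complete dicts to the constructor. The snippet below is a sketch under that assumption; the widths 256 and 128 are illustrative choices, not library defaults.
>>> import torch
>>> from torch import nn
>>> from torchrl.modules import DdpgMlpQNet
>>> net = DdpgMlpQNet(
...     mlp_net_kwargs_net1={
...         'in_features': None,
...         'out_features': 256,          # narrower observation embedding (illustrative)
...         'depth': 0,
...         'num_cells': [],
...         'activation_class': nn.ELU,
...         'bias_last_layer': True,
...         'activate_last_layer': True,
...     },
...     mlp_net_kwargs_net2={
...         'in_features': None,
...         'out_features': 1,
...         'depth': 1,
...         'num_cells': [128],           # narrower joint layer (illustrative)
...         'activation_class': nn.ELU,
...         'bias_last_layer': True,
...     },
... )
>>> obs = torch.zeros(1, 32)
>>> action = torch.zeros(1, 4)
>>> print(net(obs, action).shape)
torch.Size([1, 1])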