ProbabilisticActor¶
class torchrl.modules.tensordict_module.ProbabilisticActor(*args, **kwargs)[source]¶
General class for probabilistic actors in RL.
The Actor class comes with default values for out_keys (["action"]), and if a spec is provided but is not a Composite object, it will be automatically converted into spec = Composite(action=spec).

Parameters:
module (nn.Module) – a torch.nn.Module used to map the input to the output parameter space.

in_keys (str or iterable of str or dict) – keys to be read from the input TensorDict and used to build the distribution. Importantly, if in_keys is an iterable of strings or a string, those keys must match the keywords used by the distribution class of interest, e.g. "loc" and "scale" for the Normal distribution and similar. If in_keys is a dictionary, the keys are the keys of the distribution and the values are the keys in the tensordict that will get matched to the corresponding distribution keys.

out_keys (str or iterable of str) – keys where the sampled values will be written. Importantly, if these keys are found in the input TensorDict, the sampling step will be skipped.

spec (TensorSpec, optional) – keyword-only argument containing the spec of the output tensor. If the module outputs multiple tensors, spec characterizes the space of the first output tensor.

safe (bool) – keyword-only argument. If True, the value of the output is checked against the input spec. Out-of-domain sampling can occur because of exploration policies or numerical under/overflow issues. If this value is out of bounds, it is projected back onto the desired space using the TensorSpec.project method. Default is False.

default_interaction_type (tensordict.nn.InteractionType, optional) – keyword-only argument. Default method to be used to retrieve the output value. Should be one of InteractionType.MODE, InteractionType.DETERMINISTIC, InteractionType.MEDIAN, InteractionType.MEAN or InteractionType.RANDOM (in which case the value is sampled randomly from the distribution). TorchRL's ExplorationType class is a proxy to InteractionType. Defaults to InteractionType.DETERMINISTIC.

Note: When a sample is drawn, the ProbabilisticActor instance will first look for the interaction mode dictated by the interaction_type() global function. If this returns None (its default value), the default_interaction_type of the ProbabilisticTDModule instance will be used. Note that DataCollectorBase instances will use set_interaction_type to tensordict.nn.InteractionType.RANDOM by default.

distribution_class (Type, optional) – keyword-only argument. A torch.distributions.Distribution class to be used for sampling. Default is tensordict.nn.distributions.Delta.

Note: If distribution_class is of type CompositeDistribution, the keys will be inferred from the distribution_map / name_map keyword arguments of that distribution. If this distribution is used with another constructor (e.g., a partial or lambda function), the out_keys will need to be provided explicitly. Note also that actions will NOT be prefixed with an "action" key; see the examples below to see how this can be achieved with a ProbabilisticActor.

distribution_kwargs (dict, optional) – keyword-only argument. Keyword-argument pairs to be passed to the distribution.

return_log_prob (bool, optional) – keyword-only argument. If True, the log-probability of the distribution sample will be written in the tensordict with the key 'sample_log_prob'. Default is False.

cache_dist (bool, optional) – keyword-only argument. EXPERIMENTAL: if True, the parameters of the distribution (i.e. the output of the module) will be written in the tensordict along with the sample. Those parameters can be used to re-compute the original distribution later on (e.g. to compute the divergence between the distribution used to sample the action and the updated distribution in PPO). Default is False.

n_empirical_estimate (int, optional) – keyword-only argument. Number of samples used to compute the empirical mean when it is not available. Defaults to 1000.
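The interaction-type lookup described above can be sketched with plain torch.distributions, independent of TorchRL. This is a minimal, illustrative sketch: the `draw` helper and its string modes are stand-ins for the actual InteractionType dispatch, not TorchRL code.

```python
import torch
from torch import distributions as d

def draw(dist, interaction_type="deterministic"):
    # Illustrative stand-in for InteractionType dispatch:
    # "random" -> sample from the distribution,
    # "mean"/"deterministic" -> a deterministic statistic (the mean, here).
    if interaction_type == "random":
        return dist.sample()
    return dist.mean

loc, scale = torch.zeros(3), torch.ones(3)
dist = d.Normal(loc, scale)
print(torch.equal(draw(dist, "mean"), loc))  # True: deterministic draws return loc
```

A collector that sets the random mode would then get a fresh sample on each call, while evaluation code would get the same deterministic statistic every time.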
Examples
>>> import torch
>>> from torch import nn
>>> from tensordict import TensorDict
>>> from tensordict.nn import TensorDictModule
>>> from torchrl.data import Bounded
>>> from torchrl.modules import ProbabilisticActor, NormalParamExtractor, TanhNormal
>>> td = TensorDict({"observation": torch.randn(3, 4)}, [3,])
>>> action_spec = Bounded(shape=torch.Size([4]),
...     low=-1, high=1)
>>> module = nn.Sequential(torch.nn.Linear(4, 8), NormalParamExtractor())
>>> tensordict_module = TensorDictModule(module, in_keys=["observation"], out_keys=["loc", "scale"])
>>> td_module = ProbabilisticActor(
...     module=tensordict_module,
...     spec=action_spec,
...     in_keys=["loc", "scale"],
...     distribution_class=TanhNormal,
...     )
>>> td = td_module(td)
>>> td
TensorDict(
    fields={
        action: Tensor(shape=torch.Size([3, 4]), device=cpu, dtype=torch.float32, is_shared=False),
        loc: Tensor(shape=torch.Size([3, 4]), device=cpu, dtype=torch.float32, is_shared=False),
        observation: Tensor(shape=torch.Size([3, 4]), device=cpu, dtype=torch.float32, is_shared=False),
        scale: Tensor(shape=torch.Size([3, 4]), device=cpu, dtype=torch.float32, is_shared=False)},
    batch_size=torch.Size([3]),
    device=None,
    is_shared=False)
Probabilistic actors also support compound actions through the tensordict.nn.CompositeDistribution class. This distribution takes a tensordict as input (typically "params") and reads it as a whole: the content of this tensordict is the input to the distributions contained in the compound one.

Examples
>>> from tensordict import TensorDict
>>> from tensordict.nn import CompositeDistribution, TensorDictModule
>>> from torchrl.modules import ProbabilisticActor
>>> from torch import nn, distributions as d
>>> import torch
>>>
>>> class Module(nn.Module):
...     def forward(self, x):
...         return x[..., :3], x[..., 3:6], x[..., 6:]
>>> module = TensorDictModule(Module(),
...                           in_keys=["x"],
...                           out_keys=[("params", "normal", "loc"),
...                                     ("params", "normal", "scale"),
...                                     ("params", "categ", "logits")])
>>> actor = ProbabilisticActor(module,
...                            in_keys=["params"],
...                            distribution_class=CompositeDistribution,
...                            distribution_kwargs={"distribution_map": {
...                                "normal": d.Normal, "categ": d.Categorical}}
...                            )
>>> data = TensorDict({"x": torch.rand(10)}, [])
>>> actor(data)
TensorDict(
    fields={
        categ: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int64, is_shared=False),
        normal: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False),
        params: TensorDict(
            fields={
                categ: TensorDict(
                    fields={
                        logits: Tensor(shape=torch.Size([4]), device=cpu, dtype=torch.float32, is_shared=False)},
                    batch_size=torch.Size([]),
                    device=None,
                    is_shared=False),
                normal: TensorDict(
                    fields={
                        loc: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False),
                        scale: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False)},
                    batch_size=torch.Size([]),
                    device=None,
                    is_shared=False)},
            batch_size=torch.Size([]),
            device=None,
            is_shared=False),
        x: Tensor(shape=torch.Size([10]), device=cpu, dtype=torch.float32, is_shared=False)},
    batch_size=torch.Size([]),
    device=None,
    is_shared=False)
A probabilistic actor using a composite distribution can be built with the following example code:

Examples
>>> import torch
>>> from tensordict import TensorDict
>>> from tensordict.nn import CompositeDistribution
>>> from tensordict.nn import TensorDictModule
>>> from torch import distributions as d
>>> from torch import nn
>>>
>>> from torchrl.modules import ProbabilisticActor
>>>
>>>
>>> class Module(nn.Module):
...     def forward(self, x):
...         return x[..., :3], x[..., 3:6], x[..., 6:]
...
>>>
>>> module = TensorDictModule(Module(),
...                           in_keys=["x"],
...                           out_keys=[
...                               ("params", "normal", "loc"), ("params", "normal", "scale"), ("params", "categ", "logits")
...                           ])
>>> actor = ProbabilisticActor(module,
...                            in_keys=["params"],
...                            distribution_class=CompositeDistribution,
...                            distribution_kwargs={"distribution_map": {"normal": d.Normal, "categ": d.Categorical},
...                                                 "name_map": {"normal": ("action", "normal"),
...                                                              "categ": ("action", "categ")}}
...                            )
>>> print(actor.out_keys)
[('params', 'normal', 'loc'), ('params', 'normal', 'scale'), ('params', 'categ', 'logits'), ('action', 'normal'), ('action', 'categ')]
>>>
>>> data = TensorDict({"x": torch.rand(10)}, [])
>>> module(data)
>>> print(actor(data))
TensorDict(
    fields={
        action: TensorDict(
            fields={
                categ: Tensor(shape=torch.Size([]), device=cpu, dtype=torch.int64, is_shared=False),
                normal: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False)},
            batch_size=torch.Size([]),
            device=None,
            is_shared=False),
        params: TensorDict(
            fields={
                categ: TensorDict(
                    fields={
                        logits: Tensor(shape=torch.Size([4]), device=cpu, dtype=torch.float32, is_shared=False)},
                    batch_size=torch.Size([]),
                    device=None,
                    is_shared=False),
                normal: TensorDict(
                    fields={
                        loc: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False),
                        scale: Tensor(shape=torch.Size([3]), device=cpu, dtype=torch.float32, is_shared=False)},
                    batch_size=torch.Size([]),
                    device=None,
                    is_shared=False)},
            batch_size=torch.Size([]),
            device=None,
            is_shared=False),
        x: Tensor(shape=torch.Size([10]), device=cpu, dtype=torch.float32, is_shared=False)},
    batch_size=torch.Size([]),
    device=None,
    is_shared=False)
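For reference, the effect of return_log_prob=True documented above can be sketched with plain torch.distributions, outside of TorchRL. This is an illustrative sketch only: the plain dict below stands in for the TensorDict, and only the documented 'sample_log_prob' key is taken from the source.

```python
import torch
from torch import distributions as d

# Sketch of what return_log_prob=True records: sample a value and store
# its log-probability next to it under "sample_log_prob".
dist = d.Normal(torch.zeros(3), torch.ones(3))
action = dist.sample()
td = {"action": action, "sample_log_prob": dist.log_prob(action)}
print(td["sample_log_prob"].shape)  # torch.Size([3])
```

Keeping the log-probability of the sampled action is what makes importance-ratio computations (as in PPO) possible later without re-running the original policy.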