Actor Modules¶
Actor modules implement the Actor half of the Actor-Critic architecture. In NF's case, we follow a SAC base with Liquid NCP Networks, so the continuous variant uses a Gaussian policy, and the discrete variant a Categorical one.
The layout of the modules is identical, but their underlying functionality differs to handle their respective use cases. The only differences are the required init parameters and the number of items returned by the forward method.
Actor modules are a wrapper on top of PyTorch functionality and are made up of the following components:
| Attribute | Description | PyTorch Item |
|---|---|---|
| network | The Actor network. | torch.nn.Module |
| optim | The Actor's optimizer. | torch.optim.Optimizer |
Discrete¶
For discrete action spaces, we use the ActorModuleDiscrete class.
This accepts the following parameters:
| Parameter | Description | Default |
|---|---|---|
| state_dim | The dimension of the state space. | - |
| n_neurons | The number of decision/hidden neurons. | - |
| action_dim | The dimension of the action space. | - |
| optim | The PyTorch optimizer. | torch.optim.Adam |
| lr | The optimizer learning rate. | 0.0003 |
| device | The device to perform computations on. E.g., cpu or cuda:0. | None |
Continuous¶
For continuous action spaces, we use the ActorModule class.
This accepts the following parameters:
| Parameter | Description | Default |
|---|---|---|
| state_dim | The dimension of the state space. | - |
| n_neurons | The number of decision/hidden neurons. | - |
| action_dim | The dimension of the action space. | - |
| action_scale | The scale factor to map the normalized actions to the environment's action range. | - |
| action_bias | The bias/offset to center the normalized actions to the environment's action range. | - |
| log_std_min | The minimum log standard deviation of the action distribution. | -5 |
| log_std_max | The maximum log standard deviation of the action distribution. | 2 |
| optim | The PyTorch optimizer. | torch.optim.Adam |
| lr | The optimizer learning rate. | 0.0003 |
| device | The device to perform computations on. E.g., cpu or cuda:0. | None |
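As a quick illustration of these parameters, a hypothetical instantiation might look like the following. The import path and the values are assumptions for illustration, not taken from the library:

```python
# Hypothetical import path -- adjust to your installation:
# from nf.modules import ActorModule

# Example parameter values (illustrative only).
params = dict(
    state_dim=8,        # e.g., an 8-dimensional observation
    n_neurons=64,       # decision/hidden neurons
    action_dim=2,       # two continuous actions
    action_scale=1.0,   # computed from the env's action bounds
    action_bias=0.0,
)
# actor = ActorModule(**params)  # log_std bounds, optim, lr, device keep defaults
```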
Computing Scale Factors
The scale factors map the policy's normalized actions back to the environment's action range. This is fundamental for SAC's training stability.
To calculate them, we take half the action range as the scale and the midpoint of the range as the bias.
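A minimal sketch of this calculation, assuming the environment exposes high/low action bounds (the example values are illustrative):

```python
import numpy as np

# Action bounds are assumed for illustration; in practice they come
# from the environment, e.g., env.action_space.high and .low.
high = np.array([2.0, 1.0])
low = np.array([-2.0, -1.0])

action_scale = (high - low) / 2.0  # half the action range
action_bias = (high + low) / 2.0   # midpoint of the action range

# A normalized action in [-1, 1] maps back to the environment's range:
normalized_action = np.array([0.5, -1.0])
env_action = normalized_action * action_scale + action_bias
```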
Updating Gradients¶
To perform a gradient update, we use the gradient_step method.
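The method's body isn't reproduced here, but a gradient step in PyTorch conventionally wraps the zero_grad/backward/step sequence. A minimal sketch, assuming gradient_step takes a loss tensor:

```python
import torch

def gradient_step(optim: torch.optim.Optimizer, loss: torch.Tensor) -> None:
    """Sketch of a gradient update step (signature assumed)."""
    optim.zero_grad()  # clear gradients from the previous step
    loss.backward()    # backpropagate the loss
    optim.step()       # apply the parameter update

# Usage with a toy parameter and loss.
param = torch.nn.Parameter(torch.tensor(1.0))
opt = torch.optim.Adam([param], lr=0.1)
loss = (param - 2.0) ** 2
gradient_step(opt, loss)
```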
Prediction¶
To make a prediction, we use the predict method.
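A minimal sketch of what such a method typically does for the discrete variant, assuming gradient-free, greedy action selection (the stand-in net and the signature are illustrative, not the module's actual internals):

```python
import torch

# Stand-in for the module's actor network (illustrative only).
net = torch.nn.Linear(4, 2)

def predict(state: torch.Tensor) -> torch.Tensor:
    """Sketch of inference: no gradients, greedy (argmax) action selection."""
    with torch.no_grad():
        logits = net(state)              # forward pass through the actor network
        return torch.argmax(logits, -1)  # pick the highest-scoring action

action = predict(torch.zeros(4))
```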
Forward Pass¶
For a complete network forward pass used during training, we use the forward method.
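For the continuous (Gaussian) variant, a SAC-style forward pass conventionally samples a tanh-squashed action and returns it with its log-probability. A hedged sketch with assumed layer names (the module's actual internals may differ):

```python
import torch

# Illustrative stand-ins for the module's internals (names assumed).
mean_layer = torch.nn.Linear(4, 2)
log_std_layer = torch.nn.Linear(4, 2)
action_scale, action_bias = 1.0, 0.0

def forward(state: torch.Tensor):
    """Sketch of a SAC-style Gaussian forward pass with tanh squashing."""
    mean = mean_layer(state)
    log_std = log_std_layer(state).clamp(-5, 2)  # the log_std_min/max defaults
    dist = torch.distributions.Normal(mean, log_std.exp())
    x = dist.rsample()  # reparameterized sample, keeps gradients flowing
    action = torch.tanh(x) * action_scale + action_bias
    # Log-prob with the tanh change-of-variables correction.
    log_prob = dist.log_prob(x) - torch.log(
        action_scale * (1 - torch.tanh(x) ** 2) + 1e-6
    )
    return action, log_prob.sum(-1)

a, lp = forward(torch.zeros(4))
```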
Training vs. Evaluation Mode¶
To quickly swap between the network's training and evaluation modes, we use the train_mode and eval_mode methods.
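A minimal sketch of what these methods typically delegate to, using PyTorch's built-in mode switches (the stand-in net is illustrative):

```python
import torch

# Stand-in network with a layer that behaves differently per mode.
net = torch.nn.Sequential(torch.nn.Linear(4, 2), torch.nn.Dropout(0.5))

def train_mode() -> None:
    """Sketch: enable stochastic layers (e.g., Dropout) for training."""
    net.train()

def eval_mode() -> None:
    """Sketch: disable stochastic layers for deterministic evaluation."""
    net.eval()
```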
Next, we'll look at the critic modules! 🚀