NeuroFlow - Continuous¶
This algorithm focuses on the continuous action space implementation of NeuroFlow (NF).
Discrete vs. Continuous
The API and docs are identical to the discrete variant, with some slight differences to the init parameters and the class name. We now use `NeuroFlowCT` instead of `NeuroFlow`.
Feel free to jump to the part you need to save some time! 😀
It builds on top of a Soft Actor-Critic (continuous) [] (SAC) base and combines a variety of well-known RL techniques with some of our own custom ones.
These include the following features:
- Small Actor and large Critic networks - from the paper: Honey, I Shrunk The Actor: A Case Study on Preserving Performance with Smaller Actors in Actor-Critic RL [].
- Differing Actor-Critic architectures - the Actor uses a `LiquidNCPNetwork` and the Critics use `NCPNetwork`s.
- Automatic Entropy Adjustment (Learned) - from the paper: Soft Actor-Critic Algorithms and Applications [].
Plus more, coming soon.
Agent's Future
Cyber environments typically use discrete action spaces, so this agent won't be used as heavily as the discrete variant.
As such, it may be discontinued and removed in a future release. Alternatively, we may keep it but limit it to its current features, without any of the additional ones planned for the full algorithm.
To build one, we use the `NeuroFlowCT` class.
Building the Model¶
In its simplest form, we can create one with just one line of code using three parameters:
| Parameter | Description | Example |
| --- | --- | --- |
| `env_id` | The Gymnasium environment ID. | `InvertedPendulum-v5` |
| `actor_neurons` | The number of decision/hidden nodes for the Actor network. | `20` or `40` |
| `critic_neurons` | The number of decision/hidden nodes for the Critic networks. We recommend this to be higher than the Actor's. | `128` or `256` |
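As a minimal sketch of that call (the import path below is an assumption for illustration; adjust it to wherever `NeuroFlowCT` is exposed in your installation):

```python
from velora.models import NeuroFlowCT  # assumed import path

# Build the agent: small Actor, larger Critics
model = NeuroFlowCT("InvertedPendulum-v5", actor_neurons=20, critic_neurons=128)
```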
This code should work 'as is'.
Optional Parameters¶
This will create an instance of the model with the following default parameters:
| Parameter | Description | Default |
| --- | --- | --- |
| `optim` | The PyTorch optimizer. | `torch.optim.Adam` |
| `buffer_size` | The `ReplayBuffer` size. | `1M` |
| `actor_lr` | The Actor optimizer learning rate. | `0.0003` |
| `critic_lr` | The Critic optimizer learning rate. | `0.0003` |
| `alpha_lr` | The entropy optimizer learning rate. | `0.0003` |
| `initial_alpha` | The starting entropy coefficient. | `1.0` |
| `log_std` | The `(low, high)` bounds for the log standard deviation of the action distribution. Used to control the variance of actions. | `(-5, 2)` |
| `tau` | The soft update factor for slowly updating the target network weights. | `0.005` |
| `gamma` | The reward discount factor. | `0.99` |
| `device` | The device to perform computations on. E.g., `cpu` or `cuda:0`. | `None` |
| `seed` | The random generation seed for `Python`, `PyTorch`, `NumPy` and `Gymnasium`. When `None`, a seed is automatically generated. | `None` |
You can customize any of them freely using the relevant parameter name.
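For example, a customized build might look like the sketch below (same assumed import path as before; the parameter names come from the table above):

```python
import torch

from velora.models import NeuroFlowCT  # assumed import path

model = NeuroFlowCT(
    "InvertedPendulum-v5",
    actor_neurons=40,
    critic_neurons=256,
    optim=torch.optim.AdamW,  # swap out the PyTorch optimizer
    buffer_size=500_000,      # smaller ReplayBuffer
    gamma=0.95,
    tau=0.01,
    seed=64,
)
```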
We strongly recommend that you use the `set_device` utility method before initializing the model to help with faster training times:
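Here's a sketch of that pattern. The `set_device` import path, and the assumption that it returns a device you can pass straight into the model, are illustrative only:

```python
from velora.models import NeuroFlowCT  # assumed import paths
from velora.utils import set_device

# Pick the best available device (e.g., CUDA if present) before building the model
device = set_device()

model = NeuroFlowCT(
    "InvertedPendulum-v5",
    actor_neurons=20,
    critic_neurons=128,
    device=device,
)
```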
This code should work 'as is'.
`NeuroFlowCT` uses the `set_seed` utility method automatically when the model's `seed=None`. This saves you having to manually create it first! 😉
Training the Model¶
Training the model is equally as simple! 😊
We just use the `train()` method given a `batch_size`:
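Continuing from the build example, a minimal training call looks like this sketch:

```python
from velora.models import NeuroFlowCT  # assumed import path

model = NeuroFlowCT("InvertedPendulum-v5", actor_neurons=20, critic_neurons=128)

# Train using a replay buffer batch size of 128 and the default settings
model.train(128)
```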
This code should work 'as is'.
Optional Parameters¶
This will train the agent with the following default parameters:
| Parameter | Description | Default |
| --- | --- | --- |
| `n_episodes` | The number of episodes to train for. | `10k` |
| `callbacks` | A list of training callbacks applied during the training process. | `None` |
| `log_freq` | The metric logging frequency for offline and online analytics (in episodes). | `10` |
| `display_count` | The console training progress frequency (in episodes). | `100` |
| `window_size` | The reward moving average size (in episodes). | `100` |
| `max_steps` | The total number of steps per episode. | `1000` |
| `warmup_steps` | The number of samples to generate in the buffer before starting training. If `None`, uses `batch_size * 2`. | `None` |
Like before, you can customize these freely using the relevant parameter name.
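For instance (a sketch, reusing the `model` from above):

```python
# A shorter run with more frequent console updates
model.train(
    128,
    n_episodes=1_000,
    max_steps=500,
    display_count=50,
    warmup_steps=1_000,
)
```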
Making a Prediction¶
To make a new prediction, we need to pass in an environment `state` and a `hidden` state.
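As a sketch, assuming the prediction method is called `predict`, that it accepts the hidden state as `hidden`, and that the pre-wrapped `model.eval_env` follows the standard Gymnasium reset/step API:

```python
# Get an initial observation from the pre-wrapped evaluation environment
state, _ = model.eval_env.reset()

# First prediction: no hidden state yet, so we pass None
action, hidden = model.predict(state, hidden=None)
```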
Optional Parameters¶
This will make a prediction with the following default parameters:
| Parameter | Description | Default |
| --- | --- | --- |
| `train_mode` | A flag for swapping between deterministic and stochastic action predictions. | `False` |
Every prediction returns the `action` prediction and the `hidden` state.
If it's a one-time prediction, `hidden=None` is perfect, but you'll likely be using this in a real-time setting, so you'll need to pass the `hidden` state back into the next prediction and use a pre-wrapped environment (`model.eval_env`).
Example¶
Here's a code example:
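Something along these lines. It's a sketch under the same assumptions as before (the import paths, the `predict` signature, and a Gymnasium-style `eval_env`), so treat it as a starting point rather than a definitive script:

```python
from velora.models import NeuroFlowCT  # assumed import paths
from velora.utils import set_device

device = set_device()

# Build and train the agent
model = NeuroFlowCT(
    "InvertedPendulum-v5",
    actor_neurons=20,
    critic_neurons=128,
    device=device,
    seed=64,
)
model.train(128, n_episodes=100)

# Real-time rollout using the pre-wrapped evaluation environment,
# feeding the hidden state back into every new prediction
env = model.eval_env
state, _ = env.reset()
hidden = None

for _ in range(1000):
    action, hidden = model.predict(state, hidden)
    state, reward, terminated, truncated, _ = env.step(action)

    if terminated or truncated:
        break

env.close()
```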
This code should work 'as is'.
Now, let's see how we can use our offline training metrics! 🚀