# NeuroFlow - Continuous
This algorithm focuses on the continuous action space implementation of NeuroFlow (NF).
**Discrete vs. Continuous**

The API and docs are identical to the discrete variant, with slight differences to the init parameters and the class name: we now use `NeuroFlowCT` instead of `NeuroFlow`.

Feel free to jump to the part you need to save some time! 😀
It builds on top of a Soft Actor-Critic (SAC) base (continuous) and combines a variety of well-known RL techniques with some of our own custom ones.
These include the following features:
- **Small Actor and large Critic networks** - from the paper: *Honey, I Shrunk The Actor: A Case Study on Preserving Performance with Smaller Actors in Actor-Critic RL*.
- **Differing Actor-Critic architectures** - the Actor uses a `LiquidNCPNetwork` and the Critics use `NCPNetwork`s.
- **Automatic Entropy Adjustment (Learned)** - from the paper: *Soft Actor-Critic Algorithms and Applications*.
Plus more, coming soon.
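To give a feel for the learned entropy adjustment, here's a minimal, framework-free sketch of a single temperature update step. The real agent does this with PyTorch tensors, autograd, and its `alpha_lr` optimizer; `update_alpha` is a hypothetical helper shown only to illustrate the update direction.

```python
import math

def update_alpha(log_alpha, mean_log_prob, target_entropy, lr=3e-4):
    """One manual gradient step on the SAC temperature loss
    L(log_alpha) = -log_alpha * (mean_log_prob + target_entropy)."""
    grad = -(mean_log_prob + target_entropy)
    log_alpha -= lr * grad
    return log_alpha, math.exp(log_alpha)

# Policy entropy (2.0) is above the target (1.0), so the entropy
# coefficient shrinks slightly: less exploration bonus is needed.
log_alpha, alpha = update_alpha(0.0, mean_log_prob=-2.0, target_entropy=-1.0)
print(log_alpha < 0.0, alpha < 1.0)  # True True
```

When the policy's entropy drops below the target, the gradient flips sign and `alpha` grows again, automatically re-encouraging exploration.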
**Agent's Future**

Cyber environments typically use discrete action spaces, so this agent won't be used as heavily as the discrete variant.

As such, it may be discontinued and removed in a future release. Alternatively, we may keep it but limit it to its current features, without the additional ones planned for the full algorithm.
To build one, we use the `NeuroFlowCT` class.
## Building the Model
In its simplest form, we can create one in a single line using three parameters:
| Parameter | Description | Example |
|---|---|---|
| `env_id` | The Gymnasium environment ID. | `InvertedPendulum-v5` |
| `actor_neurons` | The number of decision/hidden nodes for the Actor network. | `20` or `40` |
| `critic_neurons` | The number of decision/hidden nodes for the Critic networks. We recommend setting this higher than the Actor's. | `128` or `256` |
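The original snippet is not preserved here, so below is a minimal sketch of what it looks like. The `velora.models` import path is an assumption; the class name and positional parameters come from the table above.

```python
# Sketch only: the import path is an assumption
from velora.models import NeuroFlowCT

# env_id, actor_neurons, critic_neurons
model = NeuroFlowCT("InvertedPendulum-v5", 20, 128)
```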
### Optional Parameters
This will create an instance of the model with the following default parameters:
| Parameter | Description | Default |
|---|---|---|
| `optim` | The PyTorch optimizer. | `torch.optim.Adam` |
| `buffer_size` | The `ReplayBuffer` size. | `1M` |
| `actor_lr` | The Actor optimizer learning rate. | `0.0003` |
| `critic_lr` | The Critic optimizer learning rate. | `0.0003` |
| `alpha_lr` | The entropy optimizer learning rate. | `0.0003` |
| `initial_alpha` | The starting entropy coefficient. | `1.0` |
| `log_std` | The `(low, high)` bounds for the log standard deviation of the action distribution. Used to control the variance of actions. | `(-5, 2)` |
| `tau` | The soft update factor for slowly updating the target network weights. | `0.005` |
| `gamma` | The reward discount factor. | `0.99` |
| `device` | The device to perform computations on. E.g., `cpu` or `cuda:0`. | `None` |
| `seed` | The random generation seed for Python, PyTorch, NumPy and Gymnasium. When `None`, a seed is generated automatically. | `None` |
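To illustrate what `tau` does, here's a tiny, framework-free sketch of the soft (Polyak) target update. The real networks are PyTorch modules; plain lists of floats stand in for their weights here.

```python
def soft_update(target_weights, online_weights, tau=0.005):
    """Move each target weight a small step (tau) towards the online weight."""
    return [
        (1.0 - tau) * t + tau * o
        for t, o in zip(target_weights, online_weights)
    ]

target = [1.0, 2.0]
online = [2.0, 4.0]
print(soft_update(target, online))  # ≈ [1.005, 2.01]
```

With `tau=0.005`, the target network trails the online network slowly, which keeps the Critic's bootstrap targets stable during training.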
You can customize them freely using the required parameter name.
We strongly recommend that you use the `set_device` utility method before initializing the model to help with faster training times:
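The original snippet is not preserved here; a sketch of the intended usage, assuming the `velora.utils` and `velora.models` import paths:

```python
# Sketch only: import paths are assumptions
from velora.models import NeuroFlowCT
from velora.utils import set_device

device = set_device()  # selects an available accelerator, else CPU

model = NeuroFlowCT("InvertedPendulum-v5", 20, 128, device=device)
```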
`NeuroFlowCT` uses the `set_seed` utility method automatically when the model's `seed=None`. This saves you from having to create one manually first! 😉
## Training the Model
Training the model is just as simple! 😊

We use the `train()` method with a `batch_size`:
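The original snippet is not preserved here; a sketch of the intended usage (the import path is an assumption, and `batch_size=64` is just an illustrative value):

```python
# Sketch only: the import path is an assumption
from velora.models import NeuroFlowCT

model = NeuroFlowCT("InvertedPendulum-v5", 20, 128)

# batch_size is required; everything else uses the defaults below
model.train(batch_size=64)
```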
### Optional Parameters
This will train the agent with the following default parameters:
| Parameter | Description | Default |
|---|---|---|
| `n_episodes` | The number of episodes to train for. | `10k` |
| `callbacks` | A list of training callbacks applied during the training process. | `None` |
| `log_freq` | The metric logging frequency for offline and online analytics (in episodes). | `10` |
| `display_count` | The console training progress frequency (in episodes). | `100` |
| `window_size` | The reward moving average size (in episodes). | `100` |
| `max_steps` | The total number of steps per episode. | `1000` |
| `warmup_steps` | The number of samples to generate in the buffer before training starts. If `None`, uses `batch_size * 2`. | `None` |
Like before, you can customize these freely using the relevant parameter names.
## Making a Prediction
To make a new prediction, we need to pass in an environment state and a hidden state.
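The original snippet is not preserved here; a hedged sketch of a single prediction call, assuming a `predict(state, hidden)` method and the `velora.models` import path:

```python
# Sketch only: the import path and `predict` signature are assumptions
from velora.models import NeuroFlowCT

model = NeuroFlowCT("InvertedPendulum-v5", 20, 128)

state, _ = model.eval_env.reset()
action, hidden = model.predict(state, hidden=None)  # first call: no hidden state yet
```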
### Optional Parameters
This will make a prediction with the following default parameters:
| Parameter | Description | Default |
|---|---|---|
| `train_mode` | A flag for swapping between deterministic and stochastic action predictions. | `False` |
Every prediction returns the predicted action and the updated hidden state.

If it's a one-time prediction, `hidden=None` is perfect. However, you'll likely use this in a real-time setting, so you'll need to pass the hidden state back into the next prediction and use a pre-wrapped environment (`model.eval_env`).
### Example
Here's a code example:
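The original example is not preserved here; below is a hedged reconstruction from the pieces described above. Import paths, the `predict` signature, and the episode-loop details are assumptions; `model.eval_env` follows the standard Gymnasium `reset`/`step` API.

```python
# Sketch only: import paths and method signatures are assumptions
from velora.models import NeuroFlowCT
from velora.utils import set_device

device = set_device()
model = NeuroFlowCT("InvertedPendulum-v5", 20, 128, device=device)
model.train(batch_size=64, n_episodes=100)

# Evaluate with the pre-wrapped environment,
# threading the hidden state through each step
env = model.eval_env
state, _ = env.reset()
hidden = None
done = False
total_reward = 0.0

while not done:
    action, hidden = model.predict(state, hidden)
    state, reward, terminated, truncated, _ = env.step(action)
    total_reward += float(reward)
    done = terminated or truncated

env.close()
print(f"Episode reward: {total_reward}")
```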
Now, let's see how we can use our offline training metrics! 🚀