# NeuroFlow - Discrete
Our first algorithm focuses on the discrete action space implementation of NeuroFlow (NF). It builds on top of a Soft Actor-Critic (discrete) (SAC) base and combines a variety of well-known RL techniques with some of our own custom ones.
These include the following features:
- Small Actor and large Critic networks - from the paper: Honey, I Shrunk The Actor: A Case Study on Preserving Performance with Smaller Actors in Actor-Critic RL.
- Differing Actor-Critic architectures - the Actor uses a `LiquidNCPNetwork` and the Critics use `NCPNetwork`s.
- Automatic Entropy Adjustment (Learned) - from the paper: Soft Actor-Critic Algorithms and Applications.
Plus more, coming soon.
To build one, we use the `NeuroFlow` class.
## Building the Model
In its simplest form, we can create one in a single line using three parameters:
| Parameter | Description | Example |
|---|---|---|
| `env_id` | The Gymnasium environment ID. | `CartPole-v1` |
| `actor_neurons` | The number of decision/hidden nodes for the Actor network. | `20` or `40` |
| `critic_neurons` | The number of decision/hidden nodes for the Critic networks. We recommend setting this higher than `actor_neurons`. | `128` or `256` |
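A minimal sketch of this step is shown below; the top-level `neuroflow` import path is an assumption and may differ in your install:

```python
from neuroflow import NeuroFlow  # import path is an assumption

model = NeuroFlow("CartPole-v1", actor_neurons=20, critic_neurons=128)
```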
### Optional Parameters
This will create an instance of the model with the following default parameters:
| Parameter | Description | Default |
|---|---|---|
| `optim` | The PyTorch optimizer. | `torch.optim.Adam` |
| `buffer_size` | The `ReplayBuffer` size. | `1M` |
| `actor_lr` | The Actor optimizer learning rate. | `0.0003` |
| `critic_lr` | The Critic optimizer learning rate. | `0.0003` |
| `alpha_lr` | The entropy optimizer learning rate. | `0.0003` |
| `initial_alpha` | The starting entropy coefficient. | `1.0` |
| `tau` | The soft update factor for slowly updating the target network weights. | `0.005` |
| `gamma` | The reward discount factor. | `0.99` |
| `device` | The device to perform computations on, e.g., `cpu` or `cuda:0`. | `None` |
| `seed` | The random generation seed for `Python`, `PyTorch`, `NumPy` and `Gymnasium`. When `None`, a seed is automatically generated. | `None` |
You can customize any of them freely using the relevant parameter name.
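For instance, here's a hedged sketch of overriding a couple of defaults by keyword (the values are illustrative only):

```python
from neuroflow import NeuroFlow  # import path is an assumption

# Override the buffer size and discount factor; all other defaults remain
model = NeuroFlow(
    "CartPole-v1",
    actor_neurons=20,
    critic_neurons=128,
    buffer_size=500_000,
    gamma=0.95,
)
```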
We strongly recommend using the `set_device` utility method before initializing the model to help with faster training times:
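A sketch of that pattern is below; the `neuroflow.utils` import path for `set_device` is an assumption:

```python
from neuroflow import NeuroFlow  # import paths are assumptions
from neuroflow.utils import set_device

# Selects an available accelerator (e.g., 'cuda:0'), falling back to 'cpu'
device = set_device()

model = NeuroFlow("CartPole-v1", actor_neurons=20, critic_neurons=128, device=device)
```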
`NeuroFlow` uses the `set_seed` utility method automatically when the model's `seed=None`. This saves you having to manually create it first! 😉
## Training the Model
Training the model is just as simple! 😊 We call the `train()` method with a `batch_size`:
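A hedged end-to-end sketch, reusing the assumed import paths from above:

```python
from neuroflow import NeuroFlow  # import paths are assumptions
from neuroflow.utils import set_device

device = set_device()

model = NeuroFlow("CartPole-v1", actor_neurons=20, critic_neurons=128, device=device)
model.train(batch_size=128)
```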
### Optional Parameters
This will train the agent with the following default parameters:
| Parameter | Description | Default |
|---|---|---|
| `n_episodes` | The number of episodes to train for. | `10k` |
| `callbacks` | A list of training callbacks applied during the training process. | `None` |
| `log_freq` | The metric logging frequency for offline and online analytics (in episodes). | `10` |
| `display_count` | The console training progress frequency (in episodes). | `100` |
| `window_size` | The reward moving average size (in episodes). | `100` |
| `max_steps` | The total number of steps per episode. | `1000` |
| `warmup_steps` | The number of samples to generate in the buffer before training starts. If `None`, uses `batch_size * 2`. | `None` |
Like before, you can customize these freely using the relevant parameter name.
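For example, a sketch of a shorter, quieter training run (the values are illustrative only):

```python
# Fewer episodes, shorter rollouts, and less frequent console updates
model.train(batch_size=128, n_episodes=500, max_steps=500, display_count=50)
```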
## Making a Prediction
To make a new prediction, we need to pass in an environment `state` and a `hidden` state.
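A one-line sketch; the `predict()` method name is an assumption based on the behaviour described below:

```python
# Returns the chosen action and the updated recurrent hidden state
action, hidden = model.predict(state, hidden)
```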
### Optional Parameters
This will make a prediction with the following default parameters:
| Parameter | Description | Default |
|---|---|---|
| `train_mode` | A flag for swapping between deterministic and stochastic action predictions. | `False` |
Every prediction returns the `action` prediction and the `hidden` state. If it's a one-time prediction, `hidden=None` is perfect, but you'll likely be using this in a real-time setting, so you'll need to pass the `hidden` state back into the next prediction and use a pre-wrapped environment (`model.eval_env`).
## Example
Here's a code example:
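This one is a hedged sketch tying everything together; the import paths and `predict()` method name are assumptions, while `set_device` and `model.eval_env` come from the sections above:

```python
from neuroflow import NeuroFlow  # import paths are assumptions
from neuroflow.utils import set_device

device = set_device()

# Build the model with a fixed seed for reproducibility
model = NeuroFlow(
    "CartPole-v1",
    actor_neurons=20,
    critic_neurons=128,
    device=device,
    seed=64,
)

# Train it
model.train(batch_size=128)

# Real-time rollout using the pre-wrapped evaluation environment
env = model.eval_env
state, _ = env.reset()
hidden = None

for _ in range(1000):
    # Feed the previous hidden state back into each prediction
    action, hidden = model.predict(state, hidden)
    # The pre-wrapped env is assumed to accept the returned action directly
    state, reward, terminated, truncated, _ = env.step(action)

    if terminated or truncated:
        break
```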
That covers the `discrete` variant! Next, we'll look at the `continuous` one. See you there! 👋