# NeuroFlow - Discrete
Our first algorithm focuses on the discrete action space implementation of NeuroFlow (NF).
It builds on top of a discrete Soft Actor-Critic (SAC) base and combines a variety of well-known RL techniques with some of our own custom ones.
These include the following features:
- Small Actor and large Critic networks - from the paper: Honey, I Shrunk The Actor: A Case Study on Preserving Performance with Smaller Actors in Actor-Critic RL.
- Differing Actor-Critic architectures - the Actor uses a `LiquidNCPNetwork` and the Critics use `NCPNetwork`s.
- Automatic Entropy Adjustment (Learned) - from the paper: Soft Actor-Critic Algorithms and Applications.
Plus more, coming soon.
To build one, we use the `NeuroFlow` class.
## Building the Model
In its simplest form, we can create one with just one line using three parameters:
| Parameter | Description | Example |
|---|---|---|
| `env_id` | The Gymnasium environment ID. | `CartPole-v1` |
| `actor_neurons` | The number of decision/hidden nodes for the Actor network. | `20` or `40` |
| `critic_neurons` | The number of decision/hidden nodes for the Critic networks. We recommend this be higher than the Actor's. | `128` or `256` |
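Here's a minimal sketch of what that looks like (the `velora.models` import path is an assumption, so adjust it to match your install):

```python
from velora.models import NeuroFlow  # assumed import path

model = NeuroFlow("CartPole-v1", actor_neurons=20, critic_neurons=128)
```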
This code should work 'as is'.
### Optional Parameters
This will create an instance of the model with the following default parameters:
| Parameter | Description | Default |
|---|---|---|
| `optim` | The PyTorch optimizer. | `torch.optim.Adam` |
| `buffer_size` | The `ReplayBuffer` size. | `1M` |
| `actor_lr` | The Actor optimizer learning rate. | `0.0003` |
| `critic_lr` | The Critic optimizer learning rate. | `0.0003` |
| `alpha_lr` | The entropy optimizer learning rate. | `0.0003` |
| `initial_alpha` | The starting entropy coefficient. | `1.0` |
| `tau` | The soft update factor for slowly updating the target network weights. | `0.005` |
| `gamma` | The reward discount factor. | `0.99` |
| `device` | The device to perform computations on. E.g., `cpu` or `cuda:0`. | `None` |
| `seed` | The random generation seed for Python, PyTorch, NumPy and Gymnasium. When `None`, a seed is automatically generated. | `None` |
You can customize any of them freely by passing the relevant parameter name as a keyword argument.
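For example, here's a sketch that overrides a couple of the defaults (same assumed import path as before):

```python
from velora.models import NeuroFlow  # assumed import path

# Same model, but with a smaller buffer and a lower discount factor
model = NeuroFlow(
    "CartPole-v1",
    actor_neurons=20,
    critic_neurons=128,
    buffer_size=500_000,
    gamma=0.95,
)
```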
We strongly recommend that you use the `set_device` utility method before initializing the model to help with faster training times:
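Here's a sketch, assuming `set_device` lives in a `velora.utils` module (adjust the import paths to your install):

```python
from velora.models import NeuroFlow  # assumed import paths
from velora.utils import set_device

device = set_device()  # assumed to prefer CUDA when available

model = NeuroFlow(
    "CartPole-v1",
    actor_neurons=20,
    critic_neurons=128,
    device=device,
)
```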
This code should work 'as is'.
NeuroFlow uses the `set_seed` utility method automatically when the model's `seed=None`. This saves you having to manually create one first! 😉
## Training the Model
Training the model is just as simple! 😊
We use the `train()` method with a given `batch_size`:
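A minimal sketch, under the same import assumption:

```python
from velora.models import NeuroFlow  # assumed import path

model = NeuroFlow("CartPole-v1", actor_neurons=20, critic_neurons=128)

# batch_size sets how many buffer samples each gradient update draws
model.train(batch_size=128)
```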
This code should work 'as is'.
### Optional Parameters
This will train the agent with the following default parameters:
| Parameter | Description | Default |
|---|---|---|
| `n_episodes` | The number of episodes to train for. | `10k` |
| `callbacks` | A list of training callbacks applied during the training process. | `None` |
| `log_freq` | The metric logging frequency for offline and online analytics (in episodes). | `10` |
| `display_count` | The console training progress frequency (in episodes). | `100` |
| `window_size` | The reward moving average size (in episodes). | `100` |
| `max_steps` | The total number of steps per episode. | `1000` |
| `warmup_steps` | The number of samples to generate in the buffer before training starts. If `None`, uses `batch_size * 2`. | `None` |
Like before, you can customize these freely by passing the relevant parameter name as a keyword argument.
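For instance, here's a sketch of a shorter run with more frequent console updates:

```python
# Train for fewer, shorter episodes and report progress more often
model.train(
    batch_size=128,
    n_episodes=500,
    max_steps=500,
    display_count=50,
)
```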
## Making a Prediction
To make a new prediction, we need to pass in an environment state and a hidden state.
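Here's a sketch (the `predict` method name is an assumption; swap in your install's actual prediction method):

```python
state, _ = model.eval_env.reset()  # pre-wrapped Gymnasium-style env
action, hidden = model.predict(state, hidden=None)  # assumed method name
```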
### Optional Parameters
This will make a prediction with the following default parameters:
| Parameter | Description | Default |
|---|---|---|
| `train_mode` | A flag for swapping between deterministic and stochastic action predictions. | `False` |
Every prediction returns the predicted action and the new hidden state.
If it's a one-time prediction, `hidden=None` is perfect, but you'll likely be using this in a real-time setting, so you'll need to pass the hidden state back into the next prediction and use a pre-wrapped environment (`model.eval_env`).
### Example
Here's a code example:
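It's a sketch of a real-time loop under the same assumptions as above (the `velora.models` import path, the `predict` method name, and a `model.eval_env` that follows the standard Gymnasium API):

```python
from velora.models import NeuroFlow  # assumed import path

model = NeuroFlow("CartPole-v1", actor_neurons=20, critic_neurons=128)

# Train the agent first so its predictions are meaningful
model.train(batch_size=128)

# Use the pre-wrapped evaluation environment
env = model.eval_env

state, _ = env.reset()
hidden = None  # no hidden state for the first prediction
episode_over = False
total_reward = 0.0

while not episode_over:
    # Pass the previous hidden state back into each new prediction
    action, hidden = model.predict(state, hidden)  # assumed method name

    state, reward, terminated, truncated, _ = env.step(action)
    total_reward += reward
    episode_over = terminated or truncated

print(f"Total reward: {total_reward}")
env.close()
```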
This code should work 'as is'.
That covers the discrete variant! Next, we'll look at the continuous one. See you there! 👋