NeuroFlow - Discrete

API Docs

velora.models.NeuroFlow(env_id, actor_neurons, critic_neurons)

Our first algorithm focuses on the discrete action space implementation of NeuroFlow (NF).

It builds on a discrete Soft Actor-Critic (SAC) base and combines a variety of well-known RL techniques with some of our own custom ones.

These include the following features:

Plus more, coming soon.

To build one, we use the NeuroFlow class.

Building the Model

In its simplest form, we can create one in a single line using three parameters:

| Parameter | Description | Example |
| --- | --- | --- |
| env_id | The Gymnasium environment ID. | CartPole-v1 |
| actor_neurons | The number of decision/hidden nodes for the Actor network. | 20 or 40 |
| critic_neurons | The number of decision/hidden nodes for the Critic networks. We recommend setting this higher than actor_neurons. | 128 or 256 |

Python
from velora.models import NeuroFlow

# CartPole-v1 has a discrete action space, which this variant requires
model = NeuroFlow("CartPole-v1", 20, 128)

This code should work 'as is'.

Optional Parameters

This will create an instance of the model with the following default parameters:

| Parameter | Description | Default |
| --- | --- | --- |
| optim | The PyTorch optimizer. | torch.optim.Adam |
| buffer_size | The ReplayBuffer size. | 1M |
| actor_lr | The actor optimizer learning rate. | 0.0003 |
| critic_lr | The critic optimizer learning rate. | 0.0003 |
| alpha_lr | The entropy optimizer learning rate. | 0.0003 |
| initial_alpha | The starting entropy coefficient. | 1.0 |
| tau | The soft update factor for slowly updating the target network weights. | 0.005 |
| gamma | The reward discount factor. | 0.99 |
| device | The device to perform computations on. E.g., cpu or cuda:0. | None |
| seed | The random generation seed for Python, PyTorch, NumPy and Gymnasium. When None, a seed is generated automatically. | None |
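
To give a feel for what tau and gamma control, here is a minimal, library-free sketch. The helper names `soft_update` and `discounted_return` are hypothetical and are not part of Velora; they only illustrate the standard soft (Polyak) target update and the discounted sum of rewards, using the defaults from the table.

```python
def soft_update(target, source, tau=0.005):
    """Move each target weight a fraction `tau` toward the matching source weight."""
    return [(1.0 - tau) * t + tau * s for t, s in zip(target, source)]


def discounted_return(rewards, gamma=0.99):
    """Sum the rewards, scaling the reward at step k by gamma**k."""
    total = 0.0
    # Work backwards so each step only needs one multiply and one add
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical weights: the target network lags behind the online network
target_weights = [0.0, 1.0]
online_weights = [1.0, 1.0]
target_weights = soft_update(target_weights, online_weights)
print(target_weights)

# Three steps of reward 1.0, discounted with the default gamma
print(discounted_return([1.0, 1.0, 1.0]))
```

A small tau keeps the target weights changing slowly, while a gamma close to 1 makes later rewards count almost as much as immediate ones.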

You can override any of them by passing the parameter as a keyword argument.

We strongly recommend using the set_device utility method before initializing the model to help with faster training times:

Python
from velora.models import NeuroFlow
from velora.utils import set_device

device = set_device()

# CartPole-v1 has a discrete action space, which this variant requires
model = NeuroFlow("CartPole-v1", 20, 128, device=device)

This code should work 'as is'.

NeuroFlow uses the set_seed utility method automatically when the model's seed=None. This saves you having to manually create it first! 😉

Training the Model

API Docs

velora.models.NeuroFlow.train(batch_size)

Training the model is just as simple! 😊

We call the train() method with a batch_size:

Python
from velora.models import NeuroFlow
from velora.utils import set_device

device = set_device()

# CartPole-v1 has a discrete action space, which this variant requires
model = NeuroFlow("CartPole-v1", 20, 128, device=device)
model.train(256)

This code should work 'as is'.

Optional Parameters

This will train the agent with the following default parameters:

| Parameter | Description | Default |
| --- | --- | --- |
| n_episodes | The number of episodes to train for. | 10k |
| callbacks | A list of training callbacks applied during the training process. | None |
| log_freq | The metric logging frequency for offline and online analytics (in episodes). | 10 |
| display_count | The console training progress frequency (in episodes). | 100 |
| window_size | The reward moving average size (in episodes). | 100 |
| max_steps | The maximum number of steps per episode. | 1000 |
| warmup_steps | The number of samples to generate in the buffer before starting training. If None, uses batch_size * 2. | None |
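
Two of these defaults are easier to see in code. The sketch below is library-free and uses hypothetical names (`resolve_warmup_steps`, `MovingAverage`); it shows the `batch_size * 2` fallback described for warmup_steps and one plausible way a reward moving average over window_size episodes can be tracked. How Velora implements either internally is an assumption here.

```python
from collections import deque


def resolve_warmup_steps(batch_size, warmup_steps=None):
    """Fall back to batch_size * 2 when no warmup value is given."""
    if warmup_steps is None:
        return batch_size * 2
    return warmup_steps


class MovingAverage:
    """Average of the most recent `window_size` episode rewards."""

    def __init__(self, window_size=100):
        # deque drops the oldest value automatically once it is full
        self.values = deque(maxlen=window_size)

    def update(self, value):
        self.values.append(value)
        return sum(self.values) / len(self.values)


# With batch_size=256 and no explicit warmup, the buffer is filled first
warmup = resolve_warmup_steps(256)
print(warmup)

# A tiny window so the effect of old rewards dropping out is visible
tracker = MovingAverage(window_size=3)
averages = [tracker.update(r) for r in [10.0, 20.0, 30.0, 40.0]]
print(averages)
```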

As before, you can override any of these by passing the parameter as a keyword argument.

Making a Prediction

API Docs

velora.models.NeuroFlow.predict(state, hidden)

To make a new prediction, we need to pass in an environment state and a hidden state.

Python
action, hidden = model.predict(state, hidden)

Optional Parameters

This will make a prediction with the following default parameters:

| Parameter | Description | Default |
| --- | --- | --- |
| train_mode | A flag for switching between deterministic and stochastic action predictions. When False, predictions are deterministic (recommended for evaluating the model). When True, predictions are stochastic (required for training the model). | False |
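
The difference between the two modes can be sketched without the library. The function below is hypothetical and assumes the policy outputs a probability per discrete action; picking the most likely action when train_mode is False and sampling when it is True is a common convention, not a confirmed description of NeuroFlow's internals.

```python
import random


def select_action(action_probs, train_mode=False, rng=None):
    """Pick an action index from a list of action probabilities."""
    indices = range(len(action_probs))
    if train_mode:
        # Stochastic: sample an action in proportion to its probability
        rng = rng or random.Random()
        return rng.choices(indices, weights=action_probs, k=1)[0]
    # Deterministic: always take the most likely action
    return max(indices, key=lambda i: action_probs[i])


probs = [0.1, 0.7, 0.2]

greedy = select_action(probs)
sampled = select_action(probs, train_mode=True, rng=random.Random(0))

print(greedy, sampled)
```

The deterministic branch returns the same action for the same probabilities every time, which is why it suits evaluation; the stochastic branch keeps exploring less likely actions.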

Every prediction returns the action prediction and the hidden state.

For a one-off prediction, hidden=None is fine. In a real-time setting, you'll need to pass the returned hidden state into the next prediction and use the pre-wrapped environment (model.eval_env).
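
To see why the hidden state needs to be passed back in, consider this toy stand-in. `ToyRecurrentPolicy` is hypothetical and unrelated to NeuroFlow's actual network; it only mimics the `predict(state, hidden)` call shape so that the effect of carrying memory between steps is visible.

```python
class ToyRecurrentPolicy:
    """Hypothetical policy whose action depends on a decaying memory of past states."""

    def predict(self, state, hidden=None):
        # Treat a missing hidden state as an empty memory
        if hidden is None:
            hidden = 0.0
        # Blend the previous memory with the new observation
        hidden = 0.5 * hidden + state
        action = 1 if hidden > 0 else 0
        return action, hidden


policy = ToyRecurrentPolicy()
states = [1.0, -0.4, -0.4]

# Threaded: each call receives the hidden state from the previous call
hidden = None
threaded = []
for state in states:
    action, hidden = policy.predict(state, hidden)
    threaded.append(action)

# Stateless: hidden is reset on every call, so earlier states are forgotten
stateless = [policy.predict(state, None)[0] for state in states]

print(threaded, stateless)
```

The two runs pick different actions on the second step because only the threaded run remembers the first state.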

Example

Here's a code example:

Python
from velora.models import NeuroFlow
from velora.utils import set_device

device = set_device()

# CartPole-v1 has a discrete action space, which this variant requires
model = NeuroFlow("CartPole-v1", 20, 128, device=device, seed=64)
model.train(128, n_episodes=100)

# Set prediction env
env = model.eval_env

# Run trained agent for 5 episodes
ep_total = 5
for i_ep in range(1, ep_total + 1):
    state, _ = env.reset()
    hidden = None

    while True:
        action, hidden = model.predict(state, hidden)
        state, reward, terminated, truncated, info = env.step(action)

        episode_over = terminated or truncated

        if episode_over:
            ep_return = info["episode"]["r"].item()
            print(f"Episode: {i_ep}/{ep_total}, Reward: {ep_return:.2f}")
            break

This code should work 'as is'.


That covers the discrete variant! Next, we'll look at the continuous one. See you there! 👋