Using Callbacks¶
The normal process for training an agent is extremely limited. There is no ability to stop at a reward threshold or save a model's state.
Let's be honest, do you really want to sit through 100k episodes and then manually have to save your model, even though it solved the environment at say 10k episodes? I know I don't! 😅
Callbacks are a flexible and powerful way to change that.
When calling the train() method, you can use the callbacks parameter to extend the functionality of the training process.
Basic Usage¶
For example, let's say we want to stop our agent when it achieves an average reward of 15. To do this, we use the EarlyStopping callback.
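Here's a minimal sketch of the idea. The agent variable stands in for a model you've already created (e.g., a NeuroFlow agent), any other train() arguments are omitted, and the exact keyword names are assumptions, so double-check them against the API docs:

```python
from velora.callbacks import EarlyStopping

callbacks = [
    # Stop training once the average reward target of 15 is reached
    # ('target' keyword assumed; patience defaults to 3)
    EarlyStopping(target=15.0),
]

# Pass the list through the 'callbacks' parameter of train()
agent.train(callbacks=callbacks)  # other train() arguments omitted
```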
Now, the model will automatically terminate when it reaches the reward target! It's as simple as that! 😊
Callback List¶
Combining Callbacks
Callbacks alone are extremely powerful for enhancing your training process, but things get even better when you stack them together! For example:
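Here's a sketch that stacks EarlyStopping with SaveCheckpoints (covered below). As before, agent, the omitted train() arguments, and the specific values are assumptions for illustration:

```python
from velora.callbacks import EarlyStopping, SaveCheckpoints

callbacks = [
    # Stop early once the reward target is hit repeatedly
    EarlyStopping(target=15.0, patience=5),
    # Save checkpoints into 'checkpoints/nf/saves' (every 100 episodes by default)
    SaveCheckpoints("nf"),
]

agent.train(callbacks=callbacks)  # other train() arguments omitted
```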
We highly recommend you experiment with different callbacks yourself and find which ones work best for you. The possibilities are truly endless! 😎
There are a number of callbacks available. Here's an exhaustive list of them:

- EarlyStopping - stops the training process when the average reward target is reached multiple times in a row, based on the patience value.
- SaveCheckpoints - saves the model state during the training process based on a frequency.
- RecordVideos - adds video recording to the agent's training process.
- CometAnalytics - adds Comet experiment cloud-based tracking.
Early Stopping¶
EarlyStopping terminates the training process when the episodic reward target is reached multiple times in a row, based on the patience value.
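As a sketch, creating the callback on its own might look like this (the target value is arbitrary and the keyword names are assumed from the description above):

```python
from velora.callbacks import EarlyStopping

# 'target' is the episodic reward target; 'patience' is how many times
# in a row it must be reached before training stops
early_stop = EarlyStopping(target=15.0, patience=3)
```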
By default, patience=3, and it is the only optional parameter available.
Model Checkpoints¶
Sometimes it can be really useful to save the model state intermittently during the training process, especially when you are also using EarlyStopping.
We can do this with the SaveCheckpoints callback. It requires one parameter:

- dirname - the directory name used to store the model checkpoints inside the checkpoints directory.
Checkpoints are automatically added to a checkpoints directory inside a <dirname>/saves folder. This design choice complements the RecordVideos callback to help keep the experiments tidy.
For example, if we want to train a NeuroFlow model and store its checkpoints in a model directory called nf, we'd use the following code.
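Here's a minimal sketch of that setup; agent stands in for the NeuroFlow model and other train() arguments are omitted:

```python
from velora.callbacks import SaveCheckpoints

callbacks = [
    # Checkpoints end up in the 'checkpoints/nf/saves' folder
    SaveCheckpoints("nf"),
]

agent.train(callbacks=callbacks)  # other train() arguments omitted
```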
Notice how we don't allow you to set a prefixed name for checkpoints. It's set automatically using the environment name and episode count, such as:

- InvertedPendulum_100/
- InvertedPendulum_final/

We limit your control to the directory name to simplify the checkpoint process and to keep them organised.
Why only the dirname?
Under the hood, we use the agent.save() method for storing checkpoints (more on this later). It stores a variety of state files, plus two additional ones in the saves folder: model_config.json, a file containing config details about the agent, and completed.json, a file created once training terminates that holds the final stats and duration (episodes, steps and time taken).
That way, you don't need any complex dirnames! 😉
Only setting the required parameters will create an instance of the callback with the following default optional parameters:

- frequency=100 - the episode save frequency.
- buffer=False - whether to save the buffer state.

You can customize these freely using the respective parameter name.
When buffer=True, the checkpoint state's buffer is saved at that episode. We'll discuss more about this in the Saving and Loading Models section.
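For instance, a checkpoint callback that saves every 50 episodes and includes the buffer state might look like this (a sketch using the optional parameter names listed above; the values are arbitrary):

```python
from velora.callbacks import SaveCheckpoints

# Save every 50 episodes and include the buffer state in each checkpoint
SaveCheckpoints("nf", frequency=50, buffer=True)
```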
Recording Videos¶
Sometimes it's useful to see how the agent is performing while it is training. The best way to do this is visually, by watching the agent interact with its environment.
To do this, we use the RecordVideos callback. Under the hood, we apply Gymnasium's RecordVideo wrapper to the environment with a minor expansion - it always records the final training episode.
It has one required parameter:

- dirname - the model directory name to store the videos. E.g., nf.
Videos are automatically added to a checkpoints directory inside a <dirname>/videos folder. This design choice complements the SaveCheckpoints callback to help keep the experiments tidy.
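Here's a sketch pairing RecordVideos with SaveCheckpoints so both write into the same model directory; agent and the omitted train() arguments are assumptions:

```python
from velora.callbacks import RecordVideos, SaveCheckpoints

callbacks = [
    SaveCheckpoints("nf"),  # -> 'checkpoints/nf/saves'
    RecordVideos("nf"),     # -> 'checkpoints/nf/videos'
]

agent.train(callbacks=callbacks)  # other train() arguments omitted
```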
For more control, you can also set any of the optional parameters:
- method=episode - the recording method. Either: episode or step.
- frequency=100 - the record frequency for method.
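For example, recording every 50 episodes might look like this (a sketch; the value is arbitrary, and the final episode is always recorded regardless):

```python
from velora.callbacks import RecordVideos

# Record a video every 50 episodes (the final episode is always recorded)
RecordVideos("nf", method="episode", frequency=50)
```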
Analytics¶
Experiment tracking is extremely important for understanding how an agent is performing. We offer two variants of this: offline and online (cloud-based).
Our offline approach works out of the box with every Velora agent. We'll talk about this more in the Training Metrics section.
However, online (cloud-based) tracking is optional. To integrate it we use callbacks! 😊
Comet Analytics¶
We've found Comet to be one of the best tools for RL experiments. It has a clean interface, an elegant category system for experiment details, and integrates well with video recordings.
To use it, we need three things:

1. The required dependencies:

    ```
    pip install velora[comet]
    ```

2. A COMET_API_KEY (found in your Comet account settings - see the API Key Docs). You can either configure this using an .env file or set it manually in the terminal:

    .env file:

    ```
    COMET_API_KEY=
    ```

    Linux/macOS:

    ```
    export COMET_API_KEY=
    ```

    Windows (Command Prompt):

    ```
    set COMET_API_KEY=
    ```

    Windows (PowerShell):

    ```
    $env:COMET_API_KEY="<insert_here>"
    ```

3. The dedicated callback - CometAnalytics:

    ```python
    from velora.callbacks import CometAnalytics

    callbacks = [
        # other callbacks
        CometAnalytics("nf"),
    ]
    ```
The callback has one required parameter:

- project_name - the name of the Comet ML project to add the experiment to.
And three optional parameters:

- experiment_name - the name of the experiment. If None, automatically creates the name using the format: <agent_classname>_<env_name>_<n_episodes>ep. E.g., NeuroFlow_InvertedPendulum_1000ep.
- tags - a list of tags associated with the experiment. If None, sets tags automatically as: [agent_classname, env_name].
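As a sketch, overriding both documented optional parameters might look like this (the values simply mirror the automatic defaults described above):

```python
from velora.callbacks import CometAnalytics

CometAnalytics(
    "nf",                                                  # Comet project name
    experiment_name="NeuroFlow_InvertedPendulum_1000ep",   # otherwise auto-generated
    tags=["NeuroFlow", "InvertedPendulum"],                # otherwise set automatically
)
```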
Note
We've limited the customization to keep things simple. By default, the experiment will be tied to the account associated with your COMET_API_KEY.
You shouldn't have to tweak a million settings just to start tracking experiments! 🚀
We primarily focus on sending episodic metrics to Comet that provide a detailed overview of the training process. These include:
- episode/return - the raw episodic reward (return).
- episode/length - the number of steps completed in the episode.
- reward/moving_avg - the episodic reward moving average based on the training window_size.
- reward/moving_upper - the episodic reward moving upper bound (moving_avg + moving_std) based on the training window_size.
- reward/moving_lower - the episodic reward moving lower bound (moving_avg - moving_std) based on the training window_size.
- losses/actor_loss - the average Actor loss for each episode.
- losses/critic_loss - the average Critic loss for each episode.
- losses/entropy_loss - the average Entropy loss for each episode.
Next, we'll look at how to save and load models. See you there! 👋