# Working with Buffers
Buffers are a central piece of RL algorithms and are used heavily in our own implementations. Off-Policy agents use a `ReplayBuffer`, and On-Policy agents use a `RolloutBuffer`.
**Rollout Buffers**

We've recently discontinued the `RolloutBuffer` and removed it from the framework due to instability issues with LNNs and On-Policy agents. So, you'll only see the docs for the `ReplayBuffer` here!
We have our own implementations of these that are easy to work with 😊.
## Replay Buffer
To create a `ReplayBuffer`, simply give it a `capacity`, `state_dim`, `action_dim`, `hidden_dim` and, optionally, a `torch.device`:
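Something like this (a minimal sketch; the `velora.buffer` import path and the example dimensions are assumptions):

```python
import torch

from velora.buffer import ReplayBuffer  # assumed import path

buffer = ReplayBuffer(
    capacity=100_000,
    state_dim=4,
    action_dim=1,
    hidden_dim=64,
    device=torch.device("cpu"),  # optional
)
```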
### Add One Item
To add an item, we use the `add()` method, passing either a full set of experience as a `Tuple` or the individual items:
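A rough sketch of both styles, assuming a single experience is `(state, action, reward, next_state, done)` (the exact tuple layout may differ in the real API):

```python
import torch

state = torch.zeros(4)
action = torch.tensor([0.5])
reward = 1.0
next_state = torch.zeros(4)
done = False

# Pass the experience as a single tuple...
buffer.add((state, action, reward, next_state, done))

# ...or as individual items.
buffer.add(state, action, reward, next_state, done)
```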
### Add Multiple Items
Or, we can add multiple values at once using the `add_multi()` method. Like before, we can use a set of experience from a `Tuple` or the individual items. The only difference is that everything must be a `torch.Tensor`:
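A sketch of batched adds, again assuming the same experience layout, with the batch dimension first on every tensor:

```python
import torch

n = 32  # number of experiences added at once

states = torch.zeros(n, 4)
actions = torch.zeros(n, 1)
rewards = torch.zeros(n, 1)
next_states = torch.zeros(n, 4)
dones = torch.zeros(n, 1)

# As a single tuple of batched tensors...
buffer.add_multi((states, actions, rewards, next_states, dones))

# ...or as individual batched tensors.
buffer.add_multi(states, actions, rewards, next_states, dones)
```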
### Get Samples
We can then `sample()` a batch of experience:
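For example (assuming `sample()` takes the batch size as its only argument):

```python
batch = buffer.sample(batch_size=64)
```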
This gives us a `BatchExperience` object. We'll talk about this later.
**Warning**

We can only sample from the buffer after we have enough experience. This is dictated by your `batch_size`.
### Warming the Buffer
Since the `ReplayBuffer` needs to have samples in it before we can `sample()` from it, we can use a warming process to pre-populate the buffer.

We have a dedicated method for this called `warm()` that automatically gathers experience up to `n_samples` without affecting your `episode` count during training.
It requires two parameters:

| Parameter | Description | Example |
|---|---|---|
| `agent` | The Velora agent instance to generate samples with. | `NeuroFlow` |
| `n_samples` | The number of samples to generate. | `batch_size * 2` |
And has one optional parameter:

| Parameter | Description | Default |
|---|---|---|
| `num_envs` | The number of vectorized environments to use for warming. | `8` |
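A minimal sketch, assuming `warm()` is called on the buffer itself and `agent` is an already-built `NeuroFlow` instance:

```python
batch_size = 64

# Pre-populate the buffer without affecting the training episode count
buffer.warm(agent, n_samples=batch_size * 2, num_envs=8)
```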
### Check Size
Lastly, we can check the current `size` of the buffer:
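For example (assuming `size` is exposed as a method; it could equally be a property):

```python
print(buffer.size())
```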
### Full Replay Example
Here's a complete example of the code we've just seen:
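A rough reconstruction under the same assumptions as the snippets above (import path, experience layout, and `size()` being a method are all guesses), so treat it as a template rather than a drop-in script:

```python
import torch

from velora.buffer import ReplayBuffer  # assumed import path

STATE_DIM, ACTION_DIM, HIDDEN_DIM = 4, 1, 64
BATCH_SIZE = 64

# 1. Create the buffer
buffer = ReplayBuffer(
    capacity=100_000,
    state_dim=STATE_DIM,
    action_dim=ACTION_DIM,
    hidden_dim=HIDDEN_DIM,
    device=torch.device("cpu"),  # optional
)

# 2. Add a single experience (tuple layout assumed)
state = torch.zeros(STATE_DIM)
action = torch.zeros(ACTION_DIM)
reward = 1.0
next_state = torch.zeros(STATE_DIM)
done = False
buffer.add((state, action, reward, next_state, done))

# 3. Add a batch of experience as tensors
n = BATCH_SIZE * 2
buffer.add_multi(
    torch.zeros(n, STATE_DIM),   # states
    torch.zeros(n, ACTION_DIM),  # actions
    torch.zeros(n, 1),           # rewards
    torch.zeros(n, STATE_DIM),   # next states
    torch.zeros(n, 1),           # dones
)

# 4. Sample a batch once there's enough experience
batch = buffer.sample(batch_size=BATCH_SIZE)

# 5. Check the current size (`size` assumed to be a method)
print(buffer.size())
```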
## Saving and Loading Buffers
Sometimes you might want to reuse a buffer's state in a different project. Well, now you can!
We provide both a `save()` and `load()` feature for all buffers 😎.
### Saving
Once you've created a buffer and used it, simply pass a `dirpath` to the `save()` method. The final folder in the `dirpath` will be used to store the buffer's state. This includes:

- `buffer_metadata.json` - the buffer's metadata
- `buffer_state.safetensors` - the buffer's tensor state
You can change the filename prefix `buffer_` with the optional `prefix` parameter:
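Something like this (the directory path is just an example):

```python
# Save the buffer's state into the final folder of `dirpath`
buffer.save("checkpoints/buffer")

# Optionally change the filename prefix from "buffer_" to "replay_"
buffer.save("checkpoints/buffer", prefix="replay_")
```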
### Loading
Then, to restore it into a new buffer instance, we use the `load()` method with the path to the `safetensors` file and the preloaded `metadata`:
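A sketch of the restore step, assuming `load()` is a classmethod and the metadata is read back from the saved JSON file first (both details are assumptions based on the description above):

```python
import json

from velora.buffer import ReplayBuffer  # assumed import path

# Preload the metadata saved alongside the tensor state
with open("checkpoints/buffer/buffer_metadata.json") as f:
    metadata = json.load(f)

# Restore into a new buffer instance
buffer = ReplayBuffer.load(
    "checkpoints/buffer/buffer_state.safetensors",
    metadata,
)
```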
## BatchExperience
**API Docs**
Earlier, we mentioned the `BatchExperience` object. This is a dataclass that stores our experience as separate tensors and allows you to easily extract them using their attributes.

As mentioned, `BatchExperience` is the one you get out of the buffer:
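A sketch of attribute access (only `states` is named in these docs; the other attribute names below are assumptions that mirror the experience layout):

```python
batch = buffer.sample(batch_size=64)

states = batch.states            # (batch_size, state_dim)
actions = batch.actions          # assumed attribute name
rewards = batch.rewards          # assumed attribute name
next_states = batch.next_states  # assumed attribute name
dones = batch.dones              # assumed attribute name
```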
All items in this class have the same shape `(batch_size, features)` and are easily accessible through their attributes, such as `batch.states`.
It's super convenient for doing calculations like this 😉:
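For instance, normalizing the sampled states in a single line (a hypothetical calculation, not part of the buffer API; `batch` comes from the `sample()` call above):

```python
# Standardize states across the batch dimension
norm_states = (batch.states - batch.states.mean(dim=0)) / (batch.states.std(dim=0) + 1e-8)
```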
Next, we're going to talk about accessing existing models' `Actor` and `Critic` classes 👋.