# Working with Buffers

Buffers are a central component of RL algorithms and are used heavily in our own implementations.

In Off-Policy agents we use a `ReplayBuffer` and in On-Policy agents, a `RolloutBuffer`.
> **Rollout Buffers**
>
> We've recently discontinued the `RolloutBuffer` and removed it from the framework due to instability issues with LNNs and On-Policy agents, so you'll only see the docs for the `ReplayBuffer` here!

We have our own implementation that is easy to work with 😊.
## Replay Buffer

To create a `ReplayBuffer`, simply give it a `capacity`, `state_dim`, `action_dim`, `hidden_dim`, and (optionally) a `torch.device`:
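A minimal sketch of what this might look like (the `velora.buffers` import path and the example dimensions are assumptions):

```python
import torch

from velora.buffers import ReplayBuffer  # assumed import path

buffer = ReplayBuffer(
    capacity=100_000,  # max number of experiences to store
    state_dim=4,       # size of the state/observation space
    action_dim=1,      # size of the action space
    hidden_dim=64,     # size of the network's hidden state
    device=torch.device("cpu"),  # optional
)
```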
### Add One Item

To add an item, we use the `add()` method, either with a Tuple of experience or with the individual items:
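A sketch of the two calling styles (assuming `buffer` is the `ReplayBuffer` from above, and that the experience fields are `(state, action, reward, next_state, done, hidden)` — the field order and shapes are assumptions):

```python
import torch

# Example single-step experience (field order and shapes are assumptions)
state = torch.zeros(4)
action = torch.zeros(1)
reward = 1.0
next_state = torch.zeros(4)
done = False
hidden = torch.zeros(64)

# 1. From a Tuple of experience
exp = (state, action, reward, next_state, done, hidden)
buffer.add(exp)

# 2. Or, from the individual items
buffer.add(state, action, reward, next_state, done, hidden)
```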
### Add Multiple Items

Or, we can add multiple values at once using the `add_multi()` method. Like before, we can use either a Tuple of experience or the individual items.

The only difference is that everything must be a `torch.Tensor`:
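Sketched below with batched tensors (same assumptions about field order and shapes as before):

```python
import torch

n = 10  # number of transitions to add in one call (example value)

states = torch.zeros((n, 4))
actions = torch.zeros((n, 1))
rewards = torch.zeros((n, 1))
next_states = torch.zeros((n, 4))
dones = torch.zeros((n, 1))
hiddens = torch.zeros((n, 64))

# 1. From a Tuple of batched experience
buffer.add_multi((states, actions, rewards, next_states, dones, hiddens))

# 2. Or, from the individual items
buffer.add_multi(states, actions, rewards, next_states, dones, hiddens)
```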
### Get Samples

We can then `sample()` a batch of experience:
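For example (the `batch_size` keyword and its value here are assumptions):

```python
batch = buffer.sample(batch_size=128)
```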
This gives us a `BatchExperience` object. We'll talk about this later.
> **Warning**
>
> We can only sample from the buffer after we have enough experience. This is dictated by your `batch_size`.
### Warming the Buffer

Since the `ReplayBuffer` needs to have samples in it before we can sample from it, we can use a warming process to pre-populate the buffer.

We have a dedicated method for this called `warm()` that automatically gathers experience up to `n_samples` without affecting your episode count during training.

It requires two parameters:
| Parameter | Description | Example |
|---|---|---|
| `agent` | The Velora agent instance to generate samples with. | `NeuroFlow` |
| `n_samples` | The number of samples to generate. | `batch_size * 2` |
And has one optional parameter:
| Parameter | Description | Default |
|---|---|---|
| `num_envs` | The number of vectorized environments to use for warming. | `8` |
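A sketch of a warming call, assuming `agent` is a NeuroFlow instance created elsewhere, `batch_size` is already defined, and the keyword names match the parameters above:

```python
# Pre-populate the buffer with twice the batch size worth of samples
buffer.warm(agent, n_samples=batch_size * 2, num_envs=8)
```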
### Check Size
Lastly, we can check the current size of the buffer:
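Assuming the buffer supports Python's built-in `len()` (an assumption — it may expose a size attribute or method instead):

```python
len(buffer)
```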
### Full Replay Example
Here's a complete example of the code we've just seen:
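A sketch combining the previous steps, under the same assumptions about the import path, shapes, and method signatures as above:

```python
import torch

from velora.buffers import ReplayBuffer  # assumed import path

# 1. Create the buffer
buffer = ReplayBuffer(
    capacity=100_000,
    state_dim=4,
    action_dim=1,
    hidden_dim=64,
    device=torch.device("cpu"),
)

# 2. Add a single item of experience
state = torch.zeros(4)
action = torch.zeros(1)
reward = 1.0
next_state = torch.zeros(4)
done = False
hidden = torch.zeros(64)

buffer.add(state, action, reward, next_state, done, hidden)

# 3. Add multiple items at once - everything as a torch.Tensor
n = 16
buffer.add_multi(
    torch.zeros((n, 4)),   # states
    torch.zeros((n, 1)),   # actions
    torch.zeros((n, 1)),   # rewards
    torch.zeros((n, 4)),   # next_states
    torch.zeros((n, 1)),   # dones
    torch.zeros((n, 64)),  # hidden states
)

# 4. Sample a batch of experience
batch = buffer.sample(batch_size=8)

# 5. Check the buffer's current size
print(len(buffer))
```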
## Saving and Loading Buffers

Sometimes you might want to reuse a buffer's state in a different project. Well, now you can!

We provide both a `save()` and `load()` feature for all buffers 😎.
### Saving

Once you've created a buffer and used it, simply pass a `dirpath` into the `save()` method. The final folder in the `dirpath` will be used to store the buffer's state. This includes:

- `buffer_metadata.json` - the buffer's metadata
- `buffer_state.safetensors` - the buffer's tensor state

You can change the filename prefix `buffer_` with the optional `prefix` parameter:
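For example (the directory path and prefix value here are just illustrations):

```python
# Saves the buffer state into the 'buffer' folder:
#   checkpoints/buffer/buffer_metadata.json
#   checkpoints/buffer/buffer_state.safetensors
buffer.save("checkpoints/buffer")

# Optional: change the filename prefix from 'buffer_' to 'rb_'
buffer.save("checkpoints/buffer", prefix="rb_")
```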
### Loading

Then, to restore it into a new buffer instance, we use the `load()` method with the path to the `safetensors` file and the preloaded metadata:
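A sketch of what that might look like (whether `load()` is a classmethod and exactly how the metadata is passed are assumptions):

```python
import json

# Read the metadata saved alongside the tensor state
with open("checkpoints/buffer/buffer_metadata.json") as f:
    metadata = json.load(f)

# Restore the state into a new buffer instance
buffer = ReplayBuffer.load(
    "checkpoints/buffer/buffer_state.safetensors",
    metadata,
)
```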
## BatchExperience
Earlier, we mentioned the `BatchExperience` object. This is a dataclass that stores our experience as separate tensors and allows you to easily extract them using their attributes.

As mentioned, `BatchExperience` is what you get out of the buffer:
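As a rough sketch, the dataclass likely holds one tensor per experience field (the exact field names below are assumptions):

```python
from dataclasses import dataclass

import torch

@dataclass
class BatchExperience:
    """A batch of sampled experience, one tensor per field."""

    states: torch.Tensor
    actions: torch.Tensor
    rewards: torch.Tensor
    next_states: torch.Tensor
    dones: torch.Tensor
    hiddens: torch.Tensor
```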
All items in this class have the same shape `(batch_size, features)` and are easily accessible through their attributes, such as `batch.states`.

It's super convenient for doing calculations like this 😉:
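For example, a TD-target style calculation (where `gamma` and `target_critic` are hypothetical names defined elsewhere):

```python
# Bootstrapped target: r + gamma * (1 - done) * Q'(s')
targets = batch.rewards + gamma * (1 - batch.dones) * target_critic(batch.next_states)
```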
Next, we're going to talk about accessing existing models' `Actor` and `Critic` classes 👋.