Get the current max priority in a table. #91
Comments
Hi Yicheng, Could you give us more details on how you are planning to sample? From what you describe, if the most recently inserted item is the one with the highest priority, you may want to take a look at the Lifo selectors. Sabela.
Hi Sabela, Thanks for the reply! I am trying to implement the exact PER scheme used in https://arxiv.org/abs/1511.05952. The dqn_zoo prioritized DQN agent does what I want: https://github.com/deepmind/dqn_zoo/blob/master/dqn_zoo/prioritized/agent.py However, the prioritized DQN agent in Acme adds each new transition with a default priority of 1.0. See e.g. I believe that this is different from the DQN Zoo implementation. I am not sure whether it makes a practical difference in performance, but I would like to implement PER as close to the original paper as possible.
I see, so in some sense the priority values are normalized such that the maximum priority would be 1.0, is that correct?
Actually, not quite. If we use the td_error to update the priorities, the resulting values are not normalized between 0.0 and 1.0. For example, if the td_error of a sampled item is 5.0, updating the priorities sets that item's new priority to 5.0. Inserting new items with a priority of 1.0 therefore does not ensure that a newly inserted item has a priority at least as high as old samples with a large td_error.
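To make the example above concrete, here is a minimal pure-Python sketch. It does not use Reverb or Acme; `sampling_probs` and `RunningMaxPriority` are hypothetical helpers written only for illustration, and sampling is assumed to be proportional to priority raised to an exponent. It compares inserting a new item at a fixed priority of 1.0 with inserting it at the largest priority written so far.

```python
def sampling_probs(priorities, exponent=1.0):
    """Proportional prioritization: P(i) = p_i^a / sum_k p_k^a."""
    scaled = [p ** exponent for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


class RunningMaxPriority:
    """Hypothetical helper: remembers the largest priority written so far."""

    def __init__(self, initial=1.0):
        self.value = initial

    def observe(self, priorities):
        # Keep the maximum over the stored value and the new priorities.
        self.value = max([self.value, *priorities])
        return self.value


# Two old items whose priorities were updated from |td_error|.
old = [5.0, 0.5]
tracker = RunningMaxPriority()
tracker.observe(old)

# New item inserted with the fixed default priority of 1.0.
probs_default = sampling_probs(old + [1.0])
# New item inserted with the running maximum priority instead.
probs_max = sampling_probs(old + [tracker.value])

print(probs_default[-1] < probs_default[0])  # new item is less likely than the 5.0 item
print(probs_max[-1] == probs_max[0])         # new item is as likely as the 5.0 item
```

Note that a running maximum kept on the client side is only an approximation of the current maximum in the table: it never decreases, even after the item that set it has been removed or had its priority lowered.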
That would not be a problem if the td_error is clipped to [-1, 1]. Do you have an example where it is not? I'm trying to verify this in the code, but it may make sense to ask in the Acme repo as well (to make sure there are no issues with the DQN implementation).
I don't think the DQN in Acme clips the td_error. I know some Atari agents clip the reward to be between -1 and 1, but that doesn't mean the td_error is in any way bounded. https://github.com/deepmind/acme/blob/master/acme/agents/jax/dqn/losses.py#L74 I have cross-posted to dm-acme.
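As a small worked illustration of why reward clipping alone does not bound the td_error, consider the one-step Q-learning error r + discount * max_a' Q(s', a') - Q(s, a). The numbers below are made up for illustration and are not taken from any agent.

```python
# Hypothetical values, chosen only to illustrate the arithmetic.
reward = 1.0          # already clipped to [-1, 1]
discount = 0.99
max_next_q = 50.0     # max over actions of Q(s', a')
q_value = 10.0        # Q(s, a) for the action actually taken

# One-step Q-learning TD error.
td_error = reward + discount * max_next_q - q_value
print(td_error)  # roughly 40.5, far outside [-1, 1]
```

With rewards bounded by 1 in magnitude, Q-values can still grow to the order of 1 / (1 - discount), so the difference between the bootstrapped target and the current estimate can be much larger than 1.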
Hi Reverb team,
I am interested in using prioritized experience replay on top of an Acme agent, where each new experience is inserted with its priority set to the current maximum priority in the buffer. I have looked around but haven't found a good way to do this. Is there a recommended approach for this in Reverb?
Many thanks in advance!