This repository has been archived by the owner on May 6, 2024. It is now read-only.

[NO MERGING] code release for NeurIPS 2020 #128

Open · wants to merge 6 commits into base: main
Conversation

alexholdenmiller
Member

This PR will be closed and is only here to highlight the code available within this branch.

Here we release updated code to get results competitive with our NeurIPS 2020 paper.

To be clear, this is not the exact code used for the paper: we made a number of performance improvements to NLE since the original results, dramatically increasing the speed of the environment (which was already one of the fastest-performing environments when the paper was published!).

We also introduced some additional modelling options, including conditioning the model on the in-game messages (i.e. msg.model=lt_cnn) and new ways of observing the environment through different glyph types (i.e. glyph_type=all_cat). These features are now enabled by default, and the resulting model outperforms the models in the paper.

@facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Nov 13, 2020
@aleSuglia

aleSuglia commented Nov 15, 2020

@alexholdenmiller Thanks a lot for releasing the code. This is very useful indeed. Quick question: what would be the best way to extend the evaluation with a different model? I currently have a new model for NetHack, and I'm wondering which codebase I should use to get a fair comparison with your NeurIPS results. In a nutshell: if I want to claim that I'm SOTA on NetHack, would it be enough to run nle/agent/neurips_sweep.sh with my custom model and compare the results I get with the ones in the paper?

I believe at this stage it would be extremely useful to have this documented so that it is crystal clear. Thanks a lot!

@alexholdenmiller
Member Author

alexholdenmiller commented Nov 16, 2020

@aleSuglia thanks for reaching out!

That's reasonable, yes, though if you have the resources you may want to rerun the baseline as well, since a few params have changed vs. the paper (or you can change them back). I'm rerunning experiments using these params here and will publish the results in the README in this folder.

Here are the changed params, which I will also be adding to the README here:

  • disabled reward clipping but enabled reward normalization (this gives more consistent performance across tasks; by normalization I mean ONLY dividing by the running stdev, NOT subtracting the mean, so we preserve the meaning of positive vs. negative reward in this environment)
  • increased the hidden size from 128 to 256 and the embedding size from 32 to 64
  • added a "message model" which feeds the in-game messages through a convolutional model (several other choices are available)
  • instead of embedding each glyph in the observation with its unique ID (there are around 6000), we create an embedding based on several properties: the unique ID, the alphanumeric character used, the color, a special indicator, a group ID, and a sub-ID within the group. This provides a more compositional representation that the model can exploit (e.g. different monsters of the same type may have a similar character but different colors). We set it to "all_cat" in this config, which concatenates sub-embeddings for each of these properties so that they add up to 64 dims (8 for group, 24 for ID, 8 for color, 16 for character, 8 for special)
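To make the last bullet concrete, here is a minimal numpy sketch of concatenating per-property sub-embeddings into a single 64-dim glyph embedding. The vocabulary sizes and names below are illustrative assumptions, not the exact NLE constants; only the dimension split (8 + 24 + 8 + 16 + 8 = 64) comes from the comment above.

```python
import numpy as np

# Dimension split from the comment above: 8 group + 24 id + 8 color
# + 16 char + 8 special = 64 total dims.
DIMS = {"group": 8, "id": 24, "color": 8, "char": 16, "special": 8}
# Vocabulary sizes are illustrative guesses, not the exact NLE constants.
VOCAB = {"group": 32, "id": 6000, "color": 16, "char": 256, "special": 4}

rng = np.random.default_rng(0)
# One lookup table per glyph property (randomly initialized, as in training).
tables = {k: rng.normal(size=(VOCAB[k], DIMS[k])) for k in DIMS}

def embed_glyph(features):
    """features: dict mapping each property name to an integer index.
    Returns the concatenation of the per-property sub-embeddings."""
    return np.concatenate([tables[k][features[k]] for k in DIMS])

# Example: a hypothetical glyph (a 'd'-character monster).
vec = embed_glyph({"group": 3, "id": 1234, "color": 7,
                   "char": ord("d"), "special": 0})
assert vec.shape == (64,)  # sub-embeddings add up to 64 dims
```

The compositional payoff is that glyphs sharing a property (e.g. the same character or color) share part of their embedding, instead of each of the ~6000 IDs getting a fully independent vector.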
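The reward normalization in the first bullet (divide by a running stdev, but do not subtract the mean) can be sketched as below. This is a hand-rolled illustration using Welford's running-variance update; the class and method names are hypothetical, not the actual NLE/torchbeast implementation.

```python
class RewardNormalizer:
    """Divide rewards by a running stdev estimate WITHOUT subtracting
    the mean, so positive rewards stay positive and negative stay negative."""

    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0   # tracked only to compute the variance
        self.m2 = 0.0     # running sum of squared deviations (Welford)
        self.eps = eps

    def update(self, r):
        self.count += 1
        delta = r - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (r - self.mean)

    def normalize(self, r):
        self.update(r)
        if self.count < 2:
            return r  # not enough data for a stdev estimate yet
        std = (self.m2 / self.count) ** 0.5
        return r / (std + self.eps)

norm = RewardNormalizer()
rewards = [1.0, -0.5, 2.0, 0.0, -1.0]
scaled = [norm.normalize(r) for r in rewards]
# Scaling by a positive stdev preserves each reward's sign.
assert all((a > 0) == (b > 0) for a, b in zip(rewards, scaled))
```

Unlike full standardization (subtracting the mean then dividing), this keeps zero reward at zero, which matters in NetHack where the sign of the reward is itself meaningful.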
