-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Version of Frozen Lake based on the implementation #186
base: master
Are you sure you want to change the base?
Version of Frozen Lake based on the implementation #186
Conversation
from pythons gymnasium
Thank you for the pull request @LooseTerrifyingSpaceMonkey . Will review it by this weekend. |
Hi @LooseTerrifyingSpaceMonkey . Thanks for the PR. I cloned your branch and tried testing the environment. But couldn't get it to work. Had you tested the environment? I am getting the following error (This probably has something to do with line 33 where default value of
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need to mention this environment in the README.
I used some tool (maybe asciinema) to generate the GIFs in the README. But we can figure that out later.
Also, please make sure to test the code thoroughly this time 😅
Thanks.
changed sizing to include walls, updated README.md with description (still need images...), renamed OBSTACLE to HOLE, changed the hole character.
|
||
agent_position = CartesianIndex(2, 2) | ||
tile_map[AGENT, agent_position] = true | ||
if randomize_start_end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This randomization should also be done when resetting the environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a significant change from the gymnasium model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't used the gymnasium one. Do you mean for every episode, the map remains the same?
I think it is valuable to have an option to be able to randomize the map on every episode to facilitate generalization. Most other environments in this package are like that. We could add a boolean to toggle it off if a user wants to keep the same map and adhere to the gym specification. Does that sound reasonable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the map remains the same. From what I have seen it is used for q-learning and dynamic programming examples. If the map changed after every episode the agent wouldn't learn. I will set a boolean so the user has the option to turn on the resetting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you provide some guidance on unit testing with all these different flags? I don't understand how the settings are passed in runtests.jl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a boolean works.
Unfortunately, automatic unit testing all flag combinations for these games is a bit hard, and isn't implemented in runtests.jl
yet. So for now, you can just test the default settings for an environment in runtests.jl:
GridWorlds.jl/test/runtests.jl
Line 64 in 6c62515
env = GW.RLBaseEnv(Env(R = R)) |
GW.play!(env)
.
obstacle_position = GW.sample_empty_position(rng, tile_map) | ||
obstacle_positions[i] = obstacle_position | ||
if map_name == "" | ||
function get_neighbors(state::CartesianIndex) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reasons of having these functions defined inside the constructor? If not, we can move them outside. Also, we will need to run A* when resetting the environment too. So we will be needing them there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, because they are never used elsewhere. They won't be used when resetting the environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reopening it as per above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, just saying, I haven't seen a lot of code where functions are put inside other functions, even if they are not used elsewhere. In closures it could be useful when you want to capture other variable and such. But other than that, I haven't seen it happening much unless there is some specific reason. I may be wrong, but I think it is clearer to have them outside, unless there is a specific need, in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually see this quite a bit within Python code written by academics. I don't see it by people in industry. I don't know why. I can change it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Please go ahead with the change. Thanks.
|
||
The objective of the agent is to navigate its way to the goal while avoiding falling into the holes in the lake. When the agent reaches the goal, it receives a reward of 1 and the environment terminates. If the agent collides falls into a hole, the agent receives a reward of -1 and the environment terminates. The probablility of moving in the direction given by an agent is 1/3 while there is 1/3 chance to move in either perpendicular direction (for example: 1/3 chance to move up, 1/3 chance to move left and 1/3 chance to move right if the agent chose up). The scenario is based on the [Frozen Lake environment](https://gymnasium.farama.org/environments/toy_text/frozen_lake/) in Python's gymnasium. In the Python version there are two preset maps: "4x4" and "8x8". The GridWorlds implementation includes the walls as part of the dimensions, so the equivalent maps in GridWorlds is "6x6" and "10x10" respectively. The start, goal, and holes are located in the same positions in the lake as the Python version. If specifying custom height and widths keep in mind it is going to add walls all around the map so the actual surface of the lake is (height - 2, width - 2). | ||
|
||
<img src="https://user-images.githubusercontent.com/32610387/126910030-d93a714d-10b7-4117-887c-773afe78c625.png"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The screenshot and gif are from DynamicObstaclesUndirected
. We need to create new ones for FrozenLakeUndirected
.
We can do that in the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are correct.
|
||
agent_position = CartesianIndex(2, 2) | ||
tile_map[AGENT, agent_position] = true | ||
if randomize_start_end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't used the gymnasium one. Do you mean for every episode, the map remains the same?
I think it is valuable to have an option to be able to randomize the map on every episode to facilitate generalization. Most other environments in this package are like that. We could add a boolean to toggle it off if a user wants to keep the same map and adhere to the gym specification. Does that sound reasonable?
obstacle_position = GW.sample_empty_position(rng, tile_map) | ||
obstacle_positions[i] = obstacle_position | ||
if map_name == "" | ||
function get_neighbors(state::CartesianIndex) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reopening it as per above
obstacle_position = GW.sample_empty_position(rng, tile_map) | ||
obstacle_positions[i] = obstacle_position | ||
if map_name == "" | ||
function get_neighbors(state::CartesianIndex) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, just saying, I haven't seen a lot of code where functions are put inside other functions, even if they are not used elsewhere. In closures it could be useful when you want to capture other variable and such. But other than that, I haven't seen it happening much unless there is some specific reason. I may be wrong, but I think it is clearer to have them outside, unless there is a specific need, in my opinion.
from python's gymnasium.