Lucasborboleta #14
base: master
Conversation
"The default value for the exploration constant has been changed according to Wikipedia." Yes, I am confused by the exploration constant in the codes. Why default to 1/sqrt(2) then multiply by sqrt(2) in |
Yes, that was the purpose of my change : get it simple and standard, excepted if there is some added value ; especially, when the subject (MCTS) is somewhat new for newcomer like me. |
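For readers new to MCTS, here is a minimal standalone sketch of the standard UCT score under discussion (function and argument names are illustrative, not the package's API):

```python
import math

def uctScore(childTotalReward, childVisits, parentVisits, c=math.sqrt(2)):
    # Standard UCT: w_i/n_i + c * sqrt(ln(N_i) / n_i),
    # with the theoretical default c = sqrt(2) from the Wikipedia article.
    exploitation = childTotalReward / childVisits
    exploration = c * math.sqrt(math.log(parentVisits) / childVisits)
    return exploitation + exploration
```

With this formulation the constant appears exactly once, which is the simplification the change aims for.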
.gitignore
Outdated
*.pyc
__pycache__
I prefer not to include generic .gitignores like this, in favour of each developer having their own global config, though I appreciate this view isn't shared by everyone. As such, I'd rather not include a .gitignore unless it's for generated files specific to this repository.
I understand your point.
nodeValue = (node.state.getCurrentPlayer() * child.totalReward / child.numVisits +
             explorationValue * math.sqrt(math.log(node.numVisits) / child.numVisits))
Good change 👍
Thanks!
mcts.py
Outdated
def getStatistics(self, action=None):
    statistics = {}
    statistics['rootNumVisits'] = self.root.numVisits
    statistics['rootTotalReward'] = self.root.totalReward
    if action is not None:
        statistics['actionNumVisits'] = self.root.children[action].numVisits
        statistics['actionTotalReward'] = self.root.children[action].totalReward
    return statistics
I'm not sure of the purpose of this? While I can see that the ability to extract extra statistics from the search tree could be useful, this is a relatively limited set of statistics being generated, so it seems very specific to one use case, and therefore maybe doesn't belong here, but instead in the calling code? I agree that there needs to be some documented, appropriate way of extracting extra statistics from the search tree though.
If you can guarantee the stability of the tree structure, either by direct access to the attributes or by an API, then yes, I agree that this kind of statistics can live on the "client" side.
The purpose of my extracted statistics was this: for large trees, when tuning the allocated CPU time or round number, I was monitoring how the action had been selected, whether from a few winning paths or with no winning path found at all.
Makes sense. I'd rather hold off on adding this here then, instead offering a full API for extracting these details appropriately.
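To make the discussion concrete, here is a sketch of the client-side alternative. It relies on the current internal attributes (root, numVisits, totalReward, children), which are not a guaranteed API:

```python
from mcts import mcts
from naughtsandcrosses import NaughtsAndCrossesState

currentState = NaughtsAndCrossesState()
searcher = mcts(timeLimit=1000)
bestAction = searcher.search(initialState=currentState)

# Read the statistics directly off the search tree (internal attributes,
# subject to change until a stable extraction API exists):
rootNumVisits = searcher.root.numVisits
rootTotalReward = searcher.root.totalReward
actionNumVisits = searcher.root.children[bestAction].numVisits
actionTotalReward = searcher.root.children[bestAction].totalReward
```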
naughtsandcrosses.py
Outdated
@@ -4,54 +4,72 @@
from mcts import mcts
from functools import reduce
import operator
import random


class NaughtsAndCrossesState():
This is deliberately designed to be a super simple example that users can easily eyeball and see what's going on, so I'd rather not complicate it like this. If you'd like to create an alternative search space and include both a more complex state and a more complex search (doing the full game instead of just one round) then I'd be happy to include that example too, but separately to this naughts and crosses one.
I understand your point: providing a first, super simple example. However, besides this first example, a second example that exercises the complete API, like playing the full game, could also help to attract new users. Varying the tree depth and permuting the roles of the MCTS and random agents is instructive too. Practically, I could propose: 1) the naughtsandcrosses.py file kept unchanged; 2) in another file, a generalization of the Connect Four game with a grid size parameter, with code for the full game, in the way I did with NaughtsAndCrosses. Would you agree?
Aye, that sounds like a great idea, thanks!
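A rough skeleton of what such a separate example could implement, following the state interface already used by naughtsandcrosses.py (the class name and grid details here are illustrative):

```python
class KInARowState():
    """Illustrative k-in-a-row game state on a parameterised grid."""

    def getCurrentPlayer(self):
        ...  # 1 or -1, depending on whose turn it is

    def getPossibleActions(self):
        ...  # list of legal actions from this state

    def takeAction(self, action):
        ...  # return the successor state after playing action

    def isTerminal(self):
        ...  # True when the game is over

    def getReward(self):
        ...  # terminal reward, e.g. 1, -1, or 0 for a draw
```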
README.md
Outdated
...
searcher = mcts(timeLimit=1000)
bestAction = searcher.search(initialState=currentState)
currentState = currentState.takeAction(bestAction)
...
This change has now been made in #13
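For context, a hedged sketch of how that corrected snippet might drive a full game, assuming the searcher can be reused across turns and using the state methods from the examples above:

```python
from mcts import mcts
from naughtsandcrosses import NaughtsAndCrossesState

currentState = NaughtsAndCrossesState()
searcher = mcts(timeLimit=1000)
while not currentState.isTerminal():
    bestAction = searcher.search(initialState=currentState)
    currentState = currentState.takeAction(bestAction)
# getReward is meaningful on terminal states: e.g. 1, -1, or 0 for a draw
print(currentState.getReward())
```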
@@ -75,6 +75,7 @@ def selectNode(self, node):

    def expand(self, node):
        actions = node.state.getPossibleActions()
        random.shuffle(actions)
What's the purpose of this shuffle?
When I experimented with the full game of NaughtsAndCrosses extended to grids of large size, up to 10, I observed the following: allocating 1 second was not sufficient for the MCTS agent to win against a random agent; the game usually terminates as a draw. Printing the state of the game at each turn, I observed that the actions selected by MCTS on its successive turns are all located in the first rows. So MCTS behaviour is somehow predictable when its exploration of the tree is not sufficient for finding at least a few winning paths. That is the reason for the added shuffle in the expand method. It could be interesting to quantify the gain over many (1000?) games: does the added shuffle increase the number of games won by MCTS when the tree is quite large?
Gotcha. I would argue that this deterministic behaviour on large search spaces is a symptom suggesting the search time needs to be significantly increased for the search to be effective, but I don't see any harm in putting this in here to avoid the ordering bias, aye.
On my game project, named JERSI-CERTU, to address the above-mentioned issue while still using an official release of your MCTS package, I have just shuffled the possible actions on the client side. What is your opinion on the two solutions? If you want your code to stay simple, the shuffle by the client is better.
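A sketch of that client-side variant: the shuffle lives in the state's getPossibleActions rather than in the package's expand method (the computeLegalActions helper is hypothetical):

```python
import random

class MyGameState():
    # ... the other state-interface methods are elided ...

    def getPossibleActions(self):
        actions = self.computeLegalActions()  # hypothetical helper returning a list
        random.shuffle(actions)  # break the ordering bias on the client side
        return actions
```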
README.md
Outdated
## Detailed usage
A few customizations are possible through the `mcts` constructor:

- The number of MCTS search rounds can be limited by either a given time limit or a given iteration number.
- The exploration constant $c$, which appears in the UCT score $w_i/n_i + c\sqrt{\ln N_i/n_i}$ with theoretical default setting $c=\sqrt{2}$, can be adapted to your game.
- The default uniform random rollout/playout policy can be changed.

A few statistics can be retrieved after each MCTS search call (see the `naughtsandcrosses.py` example).

## Slow Usage
//TODO
More on MCTS theory can be found at https://en.wikipedia.org/wiki/Monte_Carlo_tree_search and the references cited there.
While this is useful, I'd still want to include more here before saying that the Detailed Usage / Slow Usage section is complete. Indeed, I feel like these notes would be useful in the quick usage section too, so maybe pop them there instead and put the slow usage todo note back?
Also, could we simplify this to the following:
When initialising the MCTS searcher, there are a few optional parameters that can be used to optimise the search:

- `timeLimit`: the maximum duration of the search in milliseconds. Exactly one of `timeLimit` and `iterationLimit` must be set.
- `iterationLimit`: the maximum number of search iterations to be carried out. Exactly one of `timeLimit` and `iterationLimit` must be set.
- `explorationConstant`: a weight used when searching to help the algorithm prioritise between exploring unknown areas vs deeper exploring areas it currently believes to be valuable. The higher this constant, the more the algorithm will prioritise exploring unknown areas. Default value is √2.
- `rolloutPolicy`: the policy to be used in the rollout phase when simulating one full playout. Default is a random uniform policy.
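A brief usage sketch of those constructor parameters (only one of the two limits may be passed at a time; the values here are arbitrary):

```python
import math
from mcts import mcts

# Either a time budget in milliseconds...
searcher = mcts(timeLimit=1000)
# ...or a fixed number of iterations, but not both:
searcher = mcts(iterationLimit=1600, explorationConstant=math.sqrt(2))
```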
Very good synthesis for the quick usage 👍
Regarding the section "Slow usage", what do you mean by "slow"? Do you intend "detailed information"? Or do you have in mind another mode for MCTS that indirectly slows down the search (like a different rolloutPolicy)? Even if your material for this "slow usage" is not yet ready, could you either refine the title or put some hints in its TODO, if possible?
Aye, that was going to be for more detailed information on the whole system, so sure changing that header to "Detailed Information" would probably be clearer, go for it.
Thanks for your contribution! See comments above :)
K-in-a-Row started; missing diagonals
Almost good
Merge with master + README update to be done
Hello Paul, The last commit of my branch (the same one used to create the current pull request) implements all the changes that we have discussed (I think so; except for a minor proposal in the README that has not been discussed yet). How can I transmit these changes to you? Maybe you will take these changes from this "thread"?
getReward changed to return 0 when the game terminates with a draw; isTerminal changed to return True or False when no winning case has been detected. It does not change the way the code works, but it is cleaner regarding types, and so it helps the reader, in my opinion.
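A minimal sketch of what those two return-type clean-ups look like, with the winning-case detection elided (only the final returns are shown; the surrounding class body is not reproduced here):

```python
class NaughtsAndCrossesState():
    # ... board setup and winning-case detection elided ...

    def isTerminal(self):
        # ... returns True earlier when a player has won ...
        # explicit bool when no winner was detected:
        return len(self.getPossibleActions()) == 0

    def getReward(self):
        # ... returns the winner's reward earlier for a decided game ...
        # a draw now yields the number 0 rather than a falsy placeholder:
        return 0
```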
Hello Paul, README.md has been changed according to our discussion. The naughtsandcrosses example has been changed back to be simple. However, two minor detailed changes are proposed in two returns. Lucas
Hello Paul,
Thanks for your MCTS implementation! It helped me to understand the principles of this search technique, which I am trying to apply to the abstract game "jersi" (work in progress in the GitHub repository "jersi-certu").
Summary of the pull request:
Lucas