Lucasborboleta #14
base: master
Conversation
"The default value for the exploration constant has been changed according to Wikipedia." Yes, I am confused by the exploration constant in the codes. Why default to 1/sqrt(2) then multiply by sqrt(2) in |
Yes, that was the purpose of my change : get it simple and standard, excepted if there is some added value ; especially, when the subject (MCTS) is somewhat new for newcomer like me. |
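For readers new to MCTS, here is a minimal standalone sketch of the standard UCT score under discussion (function and argument names are illustrative, not the package's API):

```python
import math

def uctScore(childTotalReward, childVisits, parentVisits, c=math.sqrt(2)):
    # Standard UCT: w_i/n_i + c * sqrt(ln(N_i) / n_i),
    # with the theoretical default c = sqrt(2) from the Wikipedia article.
    exploitation = childTotalReward / childVisits
    exploration = c * math.sqrt(math.log(parentVisits) / childVisits)
    return exploitation + exploration
```

With this formulation the constant appears exactly once, which is the simplification the change aims for.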
.gitignore
Outdated
*.pyc
__pycache__
I prefer not to include generic .gitignores like this, in favour of each developer having their own global config, though I appreciate this view isn't shared by everyone. As such, I'd rather not include a .gitignore unless it's for generated files specific to this repository.
I understand your point.
nodeValue = (node.state.getCurrentPlayer() * child.totalReward / child.numVisits +
             explorationValue * math.sqrt(math.log(node.numVisits) / child.numVisits))
Good change 👍
Thanks!
mcts.py
Outdated
def getStatistics(self, action=None):
    statistics = {}
    statistics['rootNumVisits'] = self.root.numVisits
    statistics['rootTotalReward'] = self.root.totalReward
    if action is not None:
        statistics['actionNumVisits'] = self.root.children[action].numVisits
        statistics['actionTotalReward'] = self.root.children[action].totalReward
    return statistics
I'm not sure of the purpose of this? While I can see that the ability to extract extra statistics from the search tree could be useful, this is a relatively limited set of statistics being generated, so it seems very specific to one use case, and therefore maybe doesn't belong here, but instead in the calling code? I agree that there needs to be some documented, appropriate way of extracting extra statistics from the search tree though.
If you can guarantee the stability of the tree structure, either by direct access to the attributes or by an API, then yes, I agree that this kind of statistics can live on the "client" side.
The purpose of my extracted statistics was this: for large trees, when tuning the allocated CPU time or round number, I was monitoring how the action had been selected, whether from a few winning paths or with no winning path found at all.
Makes sense. I'd rather hold off on adding this here then, instead offering a full API for extracting these details appropriately.
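To make the discussion concrete, here is a sketch of the client-side alternative. It relies on the current internal attributes (root, numVisits, totalReward, children), which are not a guaranteed API:

```python
from mcts import mcts
from naughtsandcrosses import NaughtsAndCrossesState

currentState = NaughtsAndCrossesState()
searcher = mcts(timeLimit=1000)
bestAction = searcher.search(initialState=currentState)

# Read the statistics directly off the search tree (internal attributes,
# subject to change until a stable extraction API exists):
rootNumVisits = searcher.root.numVisits
rootTotalReward = searcher.root.totalReward
actionNumVisits = searcher.root.children[bestAction].numVisits
actionTotalReward = searcher.root.children[bestAction].totalReward
```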
naughtsandcrosses.py
Outdated
@@ -4,54 +4,72 @@
from mcts import mcts
from functools import reduce
import operator
import random


class NaughtsAndCrossesState():
This is deliberately designed to be a super simple example that users can easily eyeball and see what's going on, so I'd rather not complicate it like this. If you'd like to create an alternative search space and include both a more complex state and a more complex search (doing the full game instead of just one round) then I'd be happy to include that example too, but separately to this naughts and crosses one.
I understand your point: providing a first, super simple example. However, besides this first example, a second example that exercises the complete API, like playing the full game, could also help to attract new users. Varying the tree depth and permuting the roles of the MCTS and random agents is instructive too. Practically, I could propose: 1) the naughtsandcrosses.py file kept unchanged; 2) in another file, a generalization of the Connect Four game with a grid size parameter, with code for the full game, in the way I did with NaughtsAndCrosses. Would you agree?
Aye, that sounds like a great idea, thanks!
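A rough skeleton of what such a separate example could implement, following the state interface already used by naughtsandcrosses.py (the class name and grid details here are illustrative):

```python
class KInARowState():
    """Illustrative k-in-a-row game state on a parameterised grid."""

    def getCurrentPlayer(self):
        ...  # 1 or -1, depending on whose turn it is

    def getPossibleActions(self):
        ...  # list of legal actions from this state

    def takeAction(self, action):
        ...  # return the successor state after playing action

    def isTerminal(self):
        ...  # True when the game is over

    def getReward(self):
        ...  # terminal reward, e.g. 1, -1, or 0 for a draw
```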
README.md
Outdated
...
searcher = mcts(timeLimit=1000)
bestAction = searcher.search(initialState=currentState)
currentState = currentState.takeAction(bestAction)
...
This change has now been made in #13
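For context, a hedged sketch of how that corrected snippet might drive a full game, assuming the searcher can be reused across turns and using the state methods from the examples above:

```python
from mcts import mcts
from naughtsandcrosses import NaughtsAndCrossesState

currentState = NaughtsAndCrossesState()
searcher = mcts(timeLimit=1000)
while not currentState.isTerminal():
    bestAction = searcher.search(initialState=currentState)
    currentState = currentState.takeAction(bestAction)
# getReward is meaningful on terminal states: e.g. 1, -1, or 0 for a draw
print(currentState.getReward())
```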
@@ -75,6 +75,7 @@ def selectNode(self, node):

    def expand(self, node):
        actions = node.state.getPossibleActions()
        random.shuffle(actions)
What's the purpose of this shuffle?
When I experimented with the full game of NaughtsAndCrosses extended to grids of large size, up to 10, I observed the following: allocating 1 second was not sufficient for the MCTS agent to win against a random agent; the game usually terminates as a draw. Printing the state of the game at each turn, I observed that the actions selected by MCTS on its successive turns are all located in the first rows. So MCTS behaviour is somehow predictable when its exploration of the tree is not sufficient for finding at least a few winning paths. That is the reason for the added shuffle in the expand method. It could be interesting to quantify the gain over many (1000?) games: does the added shuffle increase the number of games won by MCTS when the tree is quite large?
Gotcha. I would argue that this deterministic behaviour on large search spaces is a symptom suggesting the search time needs to be significantly increased for the search to be effective, but I don't see any harm in putting this in here to avoid the ordering bias, aye.
On my game project, named JERSI-CERTU, to address the above-mentioned issue while still using an official release of your MCTS package, I have just shuffled the possible actions on the client side. What is your opinion on the two solutions? If you want your code to stay simple, the shuffle by the client is better.
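A sketch of that client-side variant: the shuffle lives in the state's getPossibleActions rather than in the package's expand method (the computeLegalActions helper is hypothetical):

```python
import random

class MyGameState():
    # ... the other state-interface methods are elided ...

    def getPossibleActions(self):
        actions = self.computeLegalActions()  # hypothetical helper returning a list
        random.shuffle(actions)  # break the ordering bias on the client side
        return actions
```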
README.md
Outdated
## Detailed usage
A few customizations are possible through the `mcts` constructor:

- The number of MCTS search rounds can be limited by either a given time limit or a given iteration number.
- The exploration constant $c$, which appears in the UCT score $w_i/n_i + c\sqrt{\ln N_i/n_i}$ with theoretical default setting $c=\sqrt{2}$, can be adapted to your game.
- The default uniform random rollout/playout policy can be changed.

A few statistics can be retrieved after each MCTS search call (see the `naughtsandcrosses.py` example).

## Slow Usage
//TODO
More on MCTS theory can be found at https://en.wikipedia.org/wiki/Monte_Carlo_tree_search and the references cited there.
While this is useful, I'd still want to include more here before saying that the Detailed Usage / Slow Usage section is complete. Indeed, I feel like these notes would be useful in the quick usage section too, so maybe pop them there instead and put the slow usage todo note back?
Also, could we simplify this to the following:
When initialising the MCTS searcher, there are a few optional parameters that can be used to optimise the search:

- `timeLimit`: the maximum duration of the search in milliseconds. Exactly one of `timeLimit` and `iterationLimit` must be set.
- `iterationLimit`: the maximum number of search iterations to be carried out. Exactly one of `timeLimit` and `iterationLimit` must be set.
- `explorationConstant`: a weight used when searching to help the algorithm prioritise between exploring unknown areas vs deeper exploring areas it currently believes to be valuable. The higher this constant, the more the algorithm will prioritise exploring unknown areas. Default value is √2.
- `rolloutPolicy`: the policy to be used in the rollout phase when simulating one full playout. Default is a random uniform policy.
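A brief usage sketch of those constructor parameters (only one of the two limits may be passed at a time; the values here are arbitrary):

```python
import math
from mcts import mcts

# Either a time budget in milliseconds...
searcher = mcts(timeLimit=1000)
# ...or a fixed number of iterations, but not both:
searcher = mcts(iterationLimit=1600, explorationConstant=math.sqrt(2))
```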
Very good synthesis for the quick usage 👍
Regarding the section "Slow usage", what do you mean by "slow"? Do you intend "detailed information"? Or do you have in mind another mode for MCTS that indirectly slows down the search (like a different rolloutPolicy)? Even if your material for this "slow usage" is not yet ready, could you either refine the title or put some hints in its TODO, if possible?
Aye, that was going to be for more detailed information on the whole system, so sure changing that header to "Detailed Information" would probably be clearer, go for it.
Thanks for your contribution! See comments above :)
K-in-a-Row started; missing diagonals
Almost good
Merge with master + README update to be done
Hello Paul, The last commit of my branch (the same one used to create the current pull request) implements all the changes that we have discussed (I think so; except for a minor proposal in the README that has not been discussed yet). How can I transmit these changes to you? Maybe you will take these changes from this "thread"?
getReward changed to return 0 when the game terminates with a draw; isTerminal changed to return True or False when no winning case has been detected. It does not change the way the code works, but it is cleaner regarding types, and so it helps the reader, in my opinion.
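A minimal sketch of what those two return-type clean-ups look like, with the winning-case detection elided (only the final returns are shown; the surrounding class body is not reproduced here):

```python
class NaughtsAndCrossesState():
    # ... board setup and winning-case detection elided ...

    def isTerminal(self):
        # ... returns True earlier when a player has won ...
        # explicit bool when no winner was detected:
        return len(self.getPossibleActions()) == 0

    def getReward(self):
        # ... returns the winner's reward earlier for a decided game ...
        # a draw now yields the number 0 rather than a falsy placeholder:
        return 0
```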
Hello Paul, README.md has been changed according to our discussion. The naughtsandcrosses example has been changed back to be simple. However, two minor detailed changes are proposed in two returns. Lucas
Hello Paul,
Thanks for your MCTS implementation! It helped me to understand the principles of this search technique, which I am trying to apply to the abstract game "jersi" (work in progress in the GitHub repository "jersi-certu").
Summary of the pull request:
Lucas