In this project, my colleague and I investigated whether a tabular Monte Carlo method or function approximation better maximizes the probability of winning in blackjack. All of the reinforcement learning algorithms presented here converged to locally optimal policies. On infinite decks, the tabular Monte Carlo methods achieve the same average performance as the function approximation methods. Adding parameters to the state space, however, lets function approximation methods learn more complex policies, which matters for playing with finite decks and obtaining reasonably good scores. Further analysis is needed to evaluate the performance of the agents trained here.
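For reference, the tabular baseline can be sketched in a few lines. The following is a minimal illustration of first-visit Monte Carlo control with epsilon-greedy exploration, assuming Gymnasium's Blackjack-v1 environment (an infinite-deck approximation); the hyperparameters are illustrative and not the settings used in this project.

```python
# Minimal sketch of tabular first-visit Monte Carlo control for blackjack.
# Assumes Gymnasium's Blackjack-v1 (infinite-deck approximation); the
# episode count and epsilon below are illustrative, not the project's.
from collections import defaultdict
import random

import gymnasium as gym

env = gym.make("Blackjack-v1")
Q = defaultdict(lambda: [0.0, 0.0])    # state -> [value(stick), value(hit)]
counts = defaultdict(lambda: [0, 0])   # visit counts for incremental averaging
epsilon = 0.1

for episode in range(500_000):
    state, _ = env.reset()
    trajectory = []                    # (state, action) pairs for this episode
    done = False
    while not done:
        if random.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = max((0, 1), key=lambda a: Q[state][a])
        next_state, reward, terminated, truncated, _ = env.step(action)
        trajectory.append((state, action))
        state = next_state
        done = terminated or truncated

    # Blackjack pays out only at the end of a hand, so with gamma = 1 the
    # return at every step equals the final reward.
    seen = set()
    for s, a in trajectory:            # first-visit update
        if (s, a) in seen:
            continue
        seen.add((s, a))
        counts[s][a] += 1
        Q[s][a] += (reward - Q[s][a]) / counts[s][a]
```

A function approximation variant would replace the lookup table Q with a parameterized estimator over an enlarged state description, which is where the extra flexibility for finite-deck play comes from.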
The paper from this project is available HERE.