
Sample Genetic Algorithm Controller (2 Player)

Raluca D. Gaina edited this page Feb 12, 2018 · 2 revisions

The sample GA controller implements an online (rolling horizon) genetic algorithm to decide the next move to make. At every game tick, a new small population of individuals is evolved within the 40 ms budget available. Each individual represents a sequence of actions, and its fitness is calculated with a heuristic that evaluates the state reached after applying all of these actions. The move returned is the first action of the best individual found.
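The per-tick evolution loop can be sketched as follows. This is a toy, self-contained version: the population size, mutation probability, discount, and the stand-in fitness function are illustrative assumptions, not the agent's actual constants.

```java
import java.util.Random;

// Toy sketch of one rolling horizon evolution step. All constants and the
// stand-in fitness below are illustrative assumptions, not the agent's values.
public class RollingHorizonSketch {
    static final int NUM_ACTIONS = 5;       // size of the action set
    static final int SIMULATION_DEPTH = 10; // genome length (actions per individual)
    static final int POPULATION = 5;
    static final double MUT_PROB = 0.2;
    static final double GAMMA = 0.5;        // toy discount factor
    static final Random rnd = new Random(42);

    // Stand-in for a rollout: rewards playing action 0, discounted by depth.
    static double fitness(int[] genome) {
        double f = 0;
        for (int d = 0; d < genome.length; d++)
            if (genome[d] == 0) f += Math.pow(GAMMA, d);
        return f;
    }

    // Evolve for the given budget; return the first action of the best individual.
    static int evolveOneStep(long budgetMillis) {
        long end = System.currentTimeMillis() + budgetMillis;
        int[] best = null;
        double bestFit = Double.NEGATIVE_INFINITY;
        // Random initial population; keep the fittest individual.
        for (int p = 0; p < POPULATION; p++) {
            int[] ind = new int[SIMULATION_DEPTH];
            for (int d = 0; d < SIMULATION_DEPTH; d++) ind[d] = rnd.nextInt(NUM_ACTIONS);
            double f = fitness(ind);
            if (f > bestFit) { bestFit = f; best = ind; }
        }
        // Mutation-only evolution until the time budget runs out.
        while (System.currentTimeMillis() < end) {
            int[] child = best.clone();
            for (int d = 0; d < SIMULATION_DEPTH; d++)
                if (rnd.nextDouble() < MUT_PROB) child[d] = rnd.nextInt(NUM_ACTIONS);
            double f = fitness(child);
            if (f >= bestFit) { bestFit = f; best = child; }
        }
        return best[0]; // first action of the best sequence found
    }

    public static void main(String[] args) {
        System.out.println("chosen action: " + evolveOneStep(40));
    }
}
```

In the real agent, the fitness of a genome comes from simulating its actions forward on a copy of the game state and applying a heuristic to the state reached.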

The opponent's action is always considered random.
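A minimal sketch of that uniform random opponent model, using stand-in string actions rather than the framework's `Types.ACTIONS` list:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hedged sketch: sample a uniformly random action for the opponent at each
// rollout step. The string action list is a stand-in; the framework exposes
// real actions per player via the state observation.
public class RandomOpponentSketch {
    static final Random rnd = new Random();

    static String randomAction(List<String> oppActions) {
        return oppActions.get(rnd.nextInt(oppActions.size()));
    }

    public static void main(String[] args) {
        List<String> actions = Arrays.asList("LEFT", "RIGHT", "USE", "NIL");
        System.out.println("opponent plays: " + randomAction(actions));
    }
}
```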

The complete code for this agent is contained in a single class: Agent.java (although it uses a multi-player version of a heuristic class, also included in the framework: WinScoreHeuristic.java). Here, we highlight some interesting parts of this agent.

First, the function that evaluates an individual, from Agent.java:

```java
/**
 * StateObservationMulti stateObs: state of the game that can be copied multiple times
 * and advanced with an array of actions of length N (N = number of players in the game),
 * one action for each player (the index in the array corresponds to the player's ID).
 * StateHeuristicMulti heuristic: heuristic used to evaluate a given state of the game.
 * int[] policy: genome of an individual of the GA; each value is an action.
 */
private double simulate(StateObservationMulti stateObs, StateHeuristicMulti heuristic, int[] policy) throws TimeoutException {

    // Abort the rollout if the time budget is nearly exhausted.
    long remaining = timer.remainingTimeMillis();
    if (remaining < BREAK_MS) {
        throw new TimeoutException("Timeout");
    }

    int depth = 0;
    stateObs = stateObs.copy(); // work on a copy, never the real game state
    for (; depth < policy.length; depth++) {
        // Build one action per player; this player's action comes from the genome.
        Types.ACTIONS[] acts = new Types.ACTIONS[stateObs.getNoPlayers()];
        for (int i = 0; i < stateObs.getNoPlayers(); i++) {
            acts[i] = action_mapping[i].get(policy[depth]);
        }

        stateObs.advance(acts);

        if (stateObs.isGameOver()) {
            break;
        }
    }

    numSimulations++;
    // Discount the heuristic value by the depth reached in the rollout.
    double score = Math.pow(GAMMA, depth) * heuristic.evaluateState(stateObs, id);
    return score;
}
```
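Note the `Math.pow(GAMMA, depth)` factor in the returned score: with a discount factor below 1, states reached at shallower depth keep more of their heuristic value, so a win found early in the rollout outweighs the same win found late. A hedged numeric illustration (GAMMA = 0.90 here is an assumption; the actual constant lives in Agent.java):

```java
// Illustration of the depth discount applied to the heuristic value.
// GAMMA = 0.90 is an assumed value for this example only.
public class DiscountSketch {
    static final double GAMMA = 0.90;

    static double discountedScore(double heuristicValue, int depth) {
        return Math.pow(GAMMA, depth) * heuristicValue;
    }

    public static void main(String[] args) {
        // The same winning value counts for more when reached sooner:
        System.out.println(discountedScore(1000.0, 2)); // GAMMA^2 * 1000, about 810
        System.out.println(discountedScore(1000.0, 8)); // GAMMA^8 * 1000, about 430
    }
}
```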

The heuristic used by this algorithm is the multi-player version of the WinScoreHeuristic, from WinScoreHeuristic.java. If the game was won by this player (or lost by the opponent), it returns a huge positive number as fitness; if it was lost by this player (or won by the opponent), it returns a huge negative number. Otherwise, it returns this player's score in that particular game state.

```java
public class WinScoreHeuristic extends StateHeuristicMulti {

    private static final double HUGE_NEGATIVE = -1000.0;
    private static final double HUGE_POSITIVE =  1000.0;

    public WinScoreHeuristic(StateObservationMulti stateObs) {}

    public double evaluateState(StateObservationMulti stateObs, int playerID) {
        boolean gameOver = stateObs.isGameOver();
        Types.WINNER win = stateObs.getMultiGameWinner()[playerID];
        // In a 2-player game, the opponent's ID is the other index.
        Types.WINNER oppWin = stateObs.getMultiGameWinner()[(playerID + 1) % stateObs.getNoPlayers()];
        double rawScore = stateObs.getGameScore(playerID);

        if (gameOver && (win == Types.WINNER.PLAYER_LOSES || oppWin == Types.WINNER.PLAYER_WINS))
            return HUGE_NEGATIVE;

        if (gameOver && (win == Types.WINNER.PLAYER_WINS || oppWin == Types.WINNER.PLAYER_LOSES))
            return HUGE_POSITIVE;

        return rawScore;
    }
}
```
