Possibility for parameter for how ratings and win chances adjust for uneven teams #130

Open
spookybear0 opened this issue Mar 9, 2024 · 16 comments
Labels
enhancement New feature or request

Comments

@spookybear0

I'm using openskill for a game where teams are sometimes uneven, for example 6 vs 7. When making teams, we put the better players on the team with fewer players. Openskill's win estimates are way off the actual results for these uneven teams. It seems to value an extra player much more than this particular game does.

Does anyone have any insight on how to tune a parameter that makes team disparity less important?

Thanks!

@vivekjoshy
Owner

Duplicate #29

@spookybear0
Author

It's not a duplicate. This issue is about teams of different sizes, not about player performance during a game.

@spookybear0
Author

I decided to multiply the mu of each player on the team with fewer players by a variable factor (I used 1.1) for the win_percent function. It seems to be working fine.

@vivekjoshy
Owner

Is your issue solved?

@spookybear0
Author

Not exactly. I used a temporary fix that isn't very effective. If it's not possible (or no one is willing to implement it), this issue can be closed.

@vivekjoshy
Owner

Openskill estimates are way off results when dealing with uneven teams. It seems that it values extra players much more than the specific game I'm using for it does.

Can you provide a reproducible example?

@philihp
Contributor

philihp commented Apr 17, 2024

Yeah, a concrete example would be helpful here. If you don't like the result of one of the functions, then what should the result be, and why should it be that?

@spookybear0
Author

I apologize, I've been a little busy. I'll get some examples ready. Thanks for the help.

@spookybear0
Author

from openskill.models import PlackettLuce

# SM5Game, Team, Player and get_win_chance are defined elsewhere in my codebase.
model = PlackettLuce()

async def get_games_with_unbalanced_teams() -> None:
    games = await SM5Game.filter(ranked=True).all()

    print("Getting win chances for unbalanced games. Close game defined as: difference <= 5000 points")

    for game in games:
        red_entity_starts = await game.get_team_entity_starts(Team.RED)
        green_entity_starts = await game.get_team_entity_starts(Team.GREEN)

        # Skip games where both teams have the same number of players.
        if len(red_entity_starts) == len(green_entity_starts):
            continue

        red_players = [
            await Player.filter(entity_id=player.entity_id).first()
            for player in red_entity_starts
        ]
        green_players = [
            await Player.filter(entity_id=player.entity_id).first()
            for player in green_entity_starts
        ]

        # wraps PlackettLuce.predict_win([team1, team2])
        win_chance = get_win_chance(red_players, green_players)

        # Fetch each score once and reuse it below.
        red_score = await game.get_team_score(Team.RED)
        green_score = await game.get_team_score(Team.GREEN)
        score_diff = abs(red_score - green_score)

        #if score_diff > 5000:
        #    continue

        print(
            f"Win chance for {game.id}: ({win_chance[0]*100:.2f}%, {win_chance[1]*100:.2f}%) "
            f"red: {red_score}, "
            f"green: {green_score}, "
            f"difference: {score_diff}, "
            f"close: {score_diff <= 5000}, "
            f"team_lengths: {len(red_players)}, {len(green_players)}"
        )

Here's an example from my code grabbing all games with uneven teams and displaying their win chances, scores, and team sizes.

Output for close games only is below. The win chances are way off. These are only a few examples, but even the games that weren't close have wildly inaccurate win chances. It seems that when the team sizes differ, the model can no longer predict the outcome. How much each extra player adds to a team varies by game, so if a solution is implemented, it should probably let a player's contribution to the team be variable (possibly exponential).

Getting win chances for unbalanced games. Close game defined as: difference <= 5000 points
Win chance for 68: (5.29%, 94.71%) red: 35792, green: 34511, difference: 1281, close: True, team_lengths: 6, 7
Win chance for 114: (88.29%, 11.71%) red: 34134, green: 35412, difference: 1278, close: True, team_lengths: 7, 6
Win chance for 138: (10.62%, 89.38%) red: 31350, green: 27411, difference: 3939, close: True, team_lengths: 5, 6
Win chance for 139: (89.38%, 10.62%) red: 30512, green: 29690, difference: 822, close: True, team_lengths: 6, 5
Win chance for 142: (78.77%, 21.23%) red: 36393, green: 41052, difference: 4659, close: True, team_lengths: 7, 6

Here's an example that includes all games, not only the close ones.

Getting win chances for unbalanced games. Close game defined as: difference <= 5000 points
Win chance for 20: (11.24%, 88.76%) red: 43232, green: 33092, difference: 10140, close: False, team_lengths: 6, 7
Win chance for 21: (78.39%, 21.61%) red: 21950, green: 35752, difference: 13802, close: False, team_lengths: 7, 6
Win chance for 35: (15.78%, 84.22%) red: 37389, green: 27611, difference: 9778, close: False, team_lengths: 5, 6
Win chance for 39: (5.30%, 94.70%) red: 25529, green: 36191, difference: 10662, close: False, team_lengths: 5, 6
Win chance for 40: (5.30%, 94.70%) red: 35550, green: 21289, difference: 14261, close: False, team_lengths: 5, 6
Win chance for 49: (4.73%, 95.27%) red: 15169, green: 30812, difference: 15643, close: False, team_lengths: 5, 6
Win chance for 68: (5.29%, 94.71%) red: 35792, green: 34511, difference: 1281, close: True, team_lengths: 6, 7
Win chance for 84: (14.19%, 85.81%) red: 42752, green: 37593, difference: 5159, close: False, team_lengths: 6, 7
Win chance for 85: (16.37%, 83.63%) red: 36072, green: 23250, difference: 12822, close: False, team_lengths: 6, 7
Win chance for 86: (78.54%, 21.46%) red: 25251, green: 31810, difference: 6559, close: False, team_lengths: 6, 5
Win chance for 114: (88.29%, 11.71%) red: 34134, green: 35412, difference: 1278, close: True, team_lengths: 7, 6
Win chance for 128: (90.52%, 9.48%) red: 19208, green: 32490, difference: 13282, close: False, team_lengths: 6, 5
Win chance for 132: (95.74%, 4.26%) red: 48572, green: 31368, difference: 17204, close: False, team_lengths: 7, 6
Win chance for 138: (10.62%, 89.38%) red: 31350, green: 27411, difference: 3939, close: True, team_lengths: 5, 6
Win chance for 139: (89.38%, 10.62%) red: 30512, green: 29690, difference: 822, close: True, team_lengths: 6, 5
Win chance for 140: (94.89%, 5.11%) red: 32794, green: 18447, difference: 14347, close: False, team_lengths: 7, 6
Win chance for 141: (26.01%, 73.99%) red: 34812, green: 23032, difference: 11780, close: False, team_lengths: 6, 7
Win chance for 142: (78.77%, 21.23%) red: 36393, green: 41052, difference: 4659, close: True, team_lengths: 7, 6

My ratings are well defined, with small sigma values thanks to the number of games played, and the model has proven to work well with even-sized teams.

Let me know if more info is needed. Thanks guys.
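One possible reading of the skew in the numbers above, as a rough sketch rather than a diagnosis: if a team's rating is the sum of its players' mu values (an assumption about how openskill aggregates teams), then one extra average player adds a full mu to the larger team. The snippet below uses only the standard library. The function name `two_team_win_chance`, the player values (mu 25, sigma 1) and the default beta of 25/6 are illustrative assumptions, not data from this thread.

```python
import math


def phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_team_win_chance(team_a, team_b, beta=25 / 6):
    """Win chances for two teams given as lists of (mu, sigma) tuples.

    Assumes a team's mu and variance are sums over its players.
    """
    mu_a = sum(mu for mu, _ in team_a)
    mu_b = sum(mu for mu, _ in team_b)
    variance = sum(s ** 2 for _, s in team_a) + sum(s ** 2 for _, s in team_b)
    n = len(team_a) + len(team_b)
    p_a = phi((mu_a - mu_b) / math.sqrt(n * beta ** 2 + variance))
    return p_a, 1.0 - p_a


if __name__ == "__main__":
    average_player = (25.0, 1.0)  # hypothetical rating, not from the thread
    six = [average_player] * 6
    seven = [average_player] * 7
    # Identical players on both sides, one extra player on the second team.
    print(two_team_win_chance(six, seven))
    # Same players, equal team sizes.
    print(two_team_win_chance(six, six))
```

Under these assumed values the seven-player team comes out as a heavy favourite even though every player is rated identically, which is the same pattern as the 6 vs 7 rows above.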

@philihp
Contributor

philihp commented Apr 17, 2024

The win percent chance is way off

What should the win percent chance be, and why should it be that and not anything else?

@spookybear0
Author

Since these games were so close, the teams were evidently well matched, and that should be reflected in the win chances. They should be much closer to 50:50 for games that close (the score doesn't always reflect the win chance, but it often does).

I've provided some control data below showing the win chances for close games between teams of equal size. I can also provide data for equal-sized games that weren't close, if needed. As the data shows, the win chances are much more reasonable given how close the scores are (ignoring outliers), so the same should hold for teams of uneven sizes. Instead, the predictions for uneven teams look very different.

Getting win chances for balanced games. Close game defined as: difference <= 5000 points
Win chance for 10: (39.97%, 60.03%) red: 34892, green: 33791, difference: 1101, close: True, team_lengths: 7, 7
Win chance for 12: (60.03%, 39.97%) red: 37213, green: 35872, difference: 1341, close: True, team_lengths: 7, 7
Win chance for 13: (34.57%, 65.43%) red: 30909, green: 29929, difference: 980, close: True, team_lengths: 5, 5
Win chance for 19: (68.61%, 31.39%) red: 33131, green: 31551, difference: 1580, close: True, team_lengths: 6, 6
Win chance for 25: (62.64%, 37.36%) red: 28510, green: 26110, difference: 2400, close: True, team_lengths: 5, 5
Win chance for 30: (42.55%, 57.45%) red: 36732, green: 35091, difference: 1641, close: True, team_lengths: 6, 6
Win chance for 32: (55.20%, 44.80%) red: 30290, green: 30270, difference: 20, close: True, team_lengths: 6, 6
Win chance for 41: (28.73%, 71.27%) red: 30170, green: 33270, difference: 3100, close: True, team_lengths: 5, 5
Win chance for 47: (73.28%, 26.72%) red: 28110, green: 25130, difference: 2980, close: True, team_lengths: 5, 5
Win chance for 48: (56.88%, 43.12%) red: 25248, green: 27170, difference: 1922, close: True, team_lengths: 5, 5
Win chance for 53: (49.90%, 50.10%) red: 36651, green: 37532, difference: 881, close: True, team_lengths: 6, 6
Win chance for 55: (29.17%, 70.83%) red: 33772, green: 31232, difference: 2540, close: True, team_lengths: 6, 6
Win chance for 64: (68.01%, 31.99%) red: 36932, green: 34854, difference: 2078, close: True, team_lengths: 7, 7
Win chance for 76: (49.53%, 50.47%) red: 23510, green: 27610, difference: 4100, close: True, team_lengths: 5, 5
Win chance for 77: (47.55%, 52.45%) red: 37652, green: 33592, difference: 4060, close: True, team_lengths: 6, 6
Win chance for 92: (73.08%, 26.92%) red: 38894, green: 34652, difference: 4242, close: True, team_lengths: 7, 7
Win chance for 95: (44.52%, 55.48%) red: 40593, green: 40434, difference: 159, close: True, team_lengths: 7, 7
Win chance for 100: (47.35%, 52.65%) red: 40792, green: 36472, difference: 4320, close: True, team_lengths: 6, 6
Win chance for 107: (46.89%, 53.11%) red: 24510, green: 26009, difference: 1499, close: True, team_lengths: 5, 5
Win chance for 110: (47.75%, 52.25%) red: 37152, green: 35832, difference: 1320, close: True, team_lengths: 6, 6
Win chance for 111: (47.75%, 52.25%) red: 36692, green: 33312, difference: 3380, close: True, team_lengths: 6, 6
Win chance for 112: (62.95%, 37.05%) red: 28329, green: 24929, difference: 3400, close: True, team_lengths: 5, 5
Win chance for 113: (33.16%, 66.84%) red: 23689, green: 27289, difference: 3600, close: True, team_lengths: 5, 5
Win chance for 116: (50.05%, 49.95%) red: 40192, green: 38654, difference: 1538, close: True, team_lengths: 7, 7
Win chance for 123: (43.48%, 56.52%) red: 31290, green: 31012, difference: 278, close: True, team_lengths: 6, 6
Win chance for 126: (56.52%, 43.48%) red: 37651, green: 32849, difference: 4802, close: True, team_lengths: 6, 6
Win chance for 130: (61.16%, 38.84%) red: 28871, green: 31930, difference: 3059, close: True, team_lengths: 6, 6

Thanks again.
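One way to put a number on "way off" is a Brier score: the mean squared gap between the predicted win chance and the actual result. This metric is not something the thread uses; it is offered here as a minimal sketch. The five rows are the close uneven games quoted earlier in this thread (games 68, 114, 138, 139 and 142), and the function names are made up for illustration.

```python
# (game id, predicted red win chance, red score, green score),
# copied from the close uneven games listed in this thread.
CLOSE_UNEVEN_GAMES = [
    (68, 0.0529, 35792, 34511),
    (114, 0.8829, 34134, 35412),
    (138, 0.1062, 31350, 27411),
    (139, 0.8938, 30512, 29690),
    (142, 0.7877, 36393, 41052),
]


def brier_score(games) -> float:
    """Mean squared error between predicted red win chance and the outcome."""
    total = 0.0
    for _, p_red, red_score, green_score in games:
        outcome = 1.0 if red_score > green_score else 0.0
        total += (p_red - outcome) ** 2
    return total / len(games)


def accuracy(games) -> float:
    """Share of games where the favoured team actually won."""
    hits = sum(
        (p_red > 0.5) == (red_score > green_score)
        for _, p_red, red_score, green_score in games
    )
    return hits / len(games)


if __name__ == "__main__":
    print(f"Brier score: {brier_score(CLOSE_UNEVEN_GAMES):.3f}")
    print(f"Accuracy: {accuracy(CLOSE_UNEVEN_GAMES):.0%}")
```

On these five games the Brier score is about 0.62 and the favourite won only one of the five. A constant 50:50 guess would score 0.25, so lower is better. Five games is far too small a sample to conclude anything, but the same two functions can be run over a larger set of games.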

@philihp
Contributor

philihp commented Apr 17, 2024

I can see what you're saying, but I feel like the best path forward here would be if you wrote your own predict_win function, and then we look at how we can generalize that with a parameter. It's all open source; there are no secrets, everything's there for you to fork 😅

@spookybear0
Author

spookybear0 commented Apr 17, 2024

Of course, I just don't have any of the required experience in this field of math.

This is the solution I'm using now. It definitely isn't mathematically supported, but it works somewhat decently. It's built for my use case (it only supports two teams), but it could(?) be a good place to start. I do want to point out that it isn't a great solution, since it doesn't solve the problem entirely; I'm sure there's a better one.

import copy
import logging
import math
from typing import List, Union

from openskill.models import PlackettLuce, PlackettLuceRating
# Import path assumed for openskill 5.x; adjust it if your version differs.
from openskill.models.weng_lin.common import phi_major

logger = logging.getLogger(__name__)

UNEVEN_TEAM_FACTOR = 0.09

class CustomPlackettLuce(PlackettLuce):
    def predict_win(self, teams: List[List[PlackettLuceRating]]) -> List[Union[int, float]]:
        # Check Arguments
        self._check_teams(teams)

        # The uneven team adjustment is only implemented for 2 teams;
        # everything else falls back to the stock implementation.
        if len(teams) != 2:
            return PlackettLuce.predict_win(self, teams)

        # CUSTOM ADDITION
        size_diff = abs(len(teams[0]) - len(teams[1]))
        if size_diff > 0:
            smaller = 0 if len(teams[0]) < len(teams[1]) else 1
            logger.debug(
                "Adjusting team ratings for uneven team count (team %d has more players)",
                2 - smaller,
            )
            # Boost copies of the ratings so the caller's stored ratings are
            # not modified every time a prediction is made.
            boosted = []
            for player in teams[smaller]:
                player = copy.copy(player)
                # multiply by 1 + UNEVEN_TEAM_FACTOR * the difference in player count
                player.mu *= 1 + UNEVEN_TEAM_FACTOR * size_diff
                boosted.append(player)
            teams = [list(team) for team in teams]
            teams[smaller] = boosted

        total_player_count = len(teams[0]) + len(teams[1])
        teams_ratings = self._calculate_team_ratings(teams)
        a = teams_ratings[0]
        b = teams_ratings[1]

        result = phi_major(
            (a.mu - b.mu)
            / math.sqrt(
                total_player_count * self.beta**2
                + a.sigma_squared
                + b.sigma_squared
            )
        )

        return [result, 1 - result]
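Written out, the two-team branch above computes the following, where a and b are the two team ratings, n_a and n_b are the team sizes, and f stands for UNEVEN_TEAM_FACTOR. The second line is the custom adjustment applied to each player i on the smaller team before the team ratings are calculated. The symbols n, f and the primed mu are notation introduced here for readability, not names from openskill.

```latex
P(a \text{ beats } b)
  = \Phi\!\left( \frac{\mu_a - \mu_b}{\sqrt{n\,\beta^2 + \sigma_a^2 + \sigma_b^2}} \right),
  \qquad n = n_a + n_b

\mu_i' = \mu_i \,\bigl( 1 + f \,\lvert n_a - n_b \rvert \bigr),
  \qquad f = 0.09
```

As an illustration with made-up numbers, and assuming a team's mu is the sum of its players' mu values: six players at mu 25 against seven players at mu 25 gives team ratings of 150 and 175. With a one-player difference each player on the smaller team is scaled by 1.09, so the smaller team rises to 163.5 and the gap in the numerator shrinks from 25 to 11.5.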

@vivekjoshy
Owner

Is this implementation actually able to predict the outcome of real match data? I can see why, in some games, teams with fewer players might still win, compared with, say, traditional games where an extra player counts. If you could provide some match data where the predict_win function is failing for close matches, it would be very helpful. It would also aid in making a new parameter, let's call it team_parity, that allows for the equality of uneven teams. A priori, it seems feasible to do this. But in reality, we won't know until we test it on real data.
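To make the idea concrete, here is one hypothetical shape such a parameter could take. Nothing below exists in openskill: the function name `win_chance_with_parity`, the scaling rule and the summed team ratings are all assumptions made for illustration. With team_parity set to 0 the summed ratings are used as they are, and with 1 the smaller team is scaled up as if it had as many players as the larger one.

```python
import math


def phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def win_chance_with_parity(team_a, team_b, team_parity=0.0, beta=25 / 6):
    """Two-team win chances with a hypothetical team_parity knob.

    Teams are lists of (mu, sigma) tuples. Team mu is assumed to be the sum
    of player mu, scaled by (largest team size / own size) ** team_parity.
    """
    n_a, n_b = len(team_a), len(team_b)
    n_max = max(n_a, n_b)
    # The larger team's scale factor is always 1, so only the smaller team moves.
    mu_a = sum(mu for mu, _ in team_a) * (n_max / n_a) ** team_parity
    mu_b = sum(mu for mu, _ in team_b) * (n_max / n_b) ** team_parity
    variance = sum(s ** 2 for _, s in team_a) + sum(s ** 2 for _, s in team_b)
    p_a = phi((mu_a - mu_b) / math.sqrt((n_a + n_b) * beta ** 2 + variance))
    return p_a, 1.0 - p_a


if __name__ == "__main__":
    six = [(25.0, 1.0)] * 6    # hypothetical ratings
    seven = [(25.0, 1.0)] * 7
    for parity in (0.0, 0.5, 1.0):
        print(parity, win_chance_with_parity(six, seven, team_parity=parity))
```

Whether any value of such a knob improves predictions is exactly the open question here; it would need to be fitted and checked against real match data.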

@spookybear0
Author

spookybear0 commented Apr 18, 2024

The implementation in my previous comment is shaky at predicting the outcome for uneven teams, which is why I'm looking for a better solution. It's more of a hacky fix than a real solution. The normal implementation, on the other hand, works great for matches with even teams but fails on games with uneven player counts.

Here's additional data for close games with different team sizes, including only the ones where the model failed to predict the winner. I don't have many games with uneven teams, so there isn't much data.

Here's the data with the CustomPlackettLuce class.

Win chance for 68: (24.69%, 75.31%) red: 35792, green: 34511, difference: 1281, close: True, team_lengths: 6, 7
Win chance for 114: (56.36%, 43.64%) red: 34134, green: 35412, difference: 1278, close: True, team_lengths: 7, 6
Win chance for 138: (38.55%, 61.45%) red: 31350, green: 27411, difference: 3939, close: True, team_lengths: 5, 6

And the same data using the normal PlackettLuce model:

Win chance for 68: (5.29%, 94.71%) red: 35792, green: 34511, difference: 1281, close: True, team_lengths: 6, 7
Win chance for 114: (88.29%, 11.71%) red: 34134, green: 35412, difference: 1278, close: True, team_lengths: 7, 6
Win chance for 138: (10.62%, 89.38%) red: 31350, green: 27411, difference: 3939, close: True, team_lengths: 5, 6
Win chance for 142: (78.77%, 21.23%) red: 36393, green: 41052, difference: 4659, close: True, team_lengths: 7, 6

Plus, here's the data for even teams for comparison (these are not affected by the custom model):

Win chance for 10: (39.97%, 60.03%) red: 34892, green: 33791, difference: 1101, close: True, team_lengths: 7, 7
Win chance for 13: (34.57%, 65.43%) red: 30909, green: 29929, difference: 980, close: True, team_lengths: 5, 5
Win chance for 30: (42.55%, 57.45%) red: 36732, green: 35091, difference: 1641, close: True, team_lengths: 6, 6
Win chance for 48: (56.88%, 43.12%) red: 25248, green: 27170, difference: 1922, close: True, team_lengths: 5, 5
Win chance for 55: (29.17%, 70.83%) red: 33772, green: 31232, difference: 2540, close: True, team_lengths: 6, 6
Win chance for 77: (47.55%, 52.45%) red: 37652, green: 33592, difference: 4060, close: True, team_lengths: 6, 6
Win chance for 95: (44.52%, 55.48%) red: 40593, green: 40434, difference: 159, close: True, team_lengths: 7, 7
Win chance for 100: (47.35%, 52.65%) red: 40792, green: 36472, difference: 4320, close: True, team_lengths: 6, 6
Win chance for 110: (47.75%, 52.25%) red: 37152, green: 35832, difference: 1320, close: True, team_lengths: 6, 6
Win chance for 111: (47.75%, 52.25%) red: 36692, green: 33312, difference: 3380, close: True, team_lengths: 6, 6
Win chance for 123: (43.48%, 56.52%) red: 31290, green: 31012, difference: 278, close: True, team_lengths: 6, 6
Win chance for 130: (61.16%, 38.84%) red: 28871, green: 31930, difference: 3059, close: True, team_lengths: 6, 6

I have lots of data for this and I can provide more if needed (or if it needs to be displayed differently). I'm also available if you would like to test any additions on my codebase, I'd be happy to help out with that part.

@vivekjoshy
Owner

A few instances of data are unfortunately not sufficient. We require match counts in the tens of thousands (the bigger the better) from real and actually played games to verify the effectiveness of such changes. To show that changes work, we use these open-source datasets to perform data analysis and measure performance. You can see the datasets used in /benchmark as a starting point.

@vivekjoshy vivekjoshy added the enhancement New feature or request label Nov 2, 2024
3 participants