AI wins Rock Paper Scissors … almost!
Rock paper scissors — if you don’t know — is a hand game usually played between two players, in which each player shows one of three shapes with an outstretched hand: “Rock”, “Paper”, or “Scissors”.
Laying some ground rules.
The rules are pretty simple: rock beats scissors, scissors beats paper, and paper beats rock.
And on the off chance both players show the same “move”, it’s considered a draw (i.e. no points are awarded).
Did you know — another popular name for rock paper scissors is “roshambo”, where rock is referred to as “stone”.
What are my chances?
In a traditional setting with 2 human players, unless one or both players already know the other player’s tendencies — if any exist — the game boils down to simple odds of one in three.
Odds of Winning with “Rock” — 33%
Odds of Winning with “Paper” — 33%
Odds of Winning with “Scissors” — 33%
Fun Fact — having a 1% advantage over 1000 games can lead to 10 more wins, which in many cases is enough to win an otherwise random and/or tied game!
What could be a good strategy to win?
Let’s look at some strategies you can code to win a game.
We will consider a game of 1000 rounds in which the winner takes all, i.e. a +1 lead (501/499) gives that player the win. In the rare case of a tied (500/500) game, we will declare the game a draw and restart with a new one.
The following sections will look into various strategies for programming such an agent.
Let’s start with a “Random Agent”.
We will refer to our AI programs as “agents”. A naïve approach for playing a random chance(-ish) game is having a random player … right?
Yes, but maybe no.
Why?
The problem with randomness is that we can never say for sure whether the program is working as intended, as its results follow an unpredictable random pattern, nullifying any sane verification process.
The advantage we get from a random agent is that the opponent faces the same difficulty we do in identifying patterns. This results in the Random Agent winning roughly half of the games we throw at it, given it plays a lot of games!
This will be our default adversary for the next discussed strategies.
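To make this concrete, here is a minimal sketch of a random agent, written against a Kaggle-style interface where an `agent(observation, configuration)` function returns 0 for rock, 1 for paper and 2 for scissors (the exact field names are an assumption based on the competition environment):

```python
import random

def random_agent(observation, configuration):
    # Ignore the opponent entirely and pick one of the three signs
    # (0 = rock, 1 = paper, 2 = scissors) uniformly at random.
    return random.randrange(3)
```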
Then how do we beat Randomness?
In the pursuit of trying to find a competent agent, we will start with two basic assumptions.
A. No player is truly random, i.e. there are underlying patterns guiding a pseudo-random system.
B. Even when a player acts randomly, given a small space of moves (3 in total), the chance of each move must follow some probability distribution.
Nash Equilibrium — into Game Theory.
Rock Paper Scissors is classified as a non-cooperative 2-player game. One of the solution concepts for such games, proposed by John Forbes Nash Jr., is the Nash equilibrium.
If each player has chosen a strategy for a game — basically a mini-plan for how to react to each move of the opponent given the current state of the game — and no player can increase their own odds of winning by changing their strategy while the other player keeps theirs unchanged, then the current set of strategy choices constitutes a Nash Equilibrium.
In layman’s terms, Nash’s idea is that no one can predict the moves of multiple players by analysing their decisions in isolation.
It is observed that a Nash Agent performs slightly better than a Random Agent.
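For Rock Paper Scissors specifically, the mixed-strategy Nash Equilibrium is to play each sign with probability 1/3: no opponent strategy can gain a positive expected score against it. A small numerical sketch of that claim (the payoff matrix and example opponents below are purely illustrative):

```python
import numpy as np

# Row player's payoff: entry [i][j] is the score when the row player
# shows sign i and the column player shows sign j
# (0 = rock, 1 = paper, 2 = scissors): +1 win, 0 draw, -1 loss.
PAYOFF = np.array([
    [ 0, -1,  1],   # rock     vs rock / paper / scissors
    [ 1,  0, -1],   # paper    vs rock / paper / scissors
    [-1,  1,  0],   # scissors vs rock / paper / scissors
])

nash = np.array([1/3, 1/3, 1/3])   # the equilibrium mixed strategy

# Against *any* opponent distribution, the expected payoff of the
# uniform strategy is exactly zero -- it cannot be exploited.
for opponent in [np.array([1.0, 0.0, 0.0]),        # always rock
                 np.array([0.5, 0.5, 0.0]),        # never scissors
                 np.random.dirichlet([1, 1, 1])]:  # a random mixture
    print(nash @ PAYOFF @ opponent)                # ~0.0 every time
```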
Conditional Response — considering past actions.
Conditional Strategy can be summed up in two terms —
A. Win → Stay
B. Lose → Shift
It is observed that the conditional strategy performs ~0.5–2% better than the Nash Equilibrium agent.
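A sketch of that win-stay / lose-shift behaviour under the same assumed Kaggle-style interface (the module-level variable remembering our last move is a simplification):

```python
import random

last_move = None   # our previous sign, kept in module-level state for simplicity

def conditional_agent(observation, configuration):
    global last_move
    if observation.step == 0 or last_move is None:
        last_move = random.randrange(3)
        return last_move

    opponent = observation.lastOpponentAction
    won = (last_move - opponent) % 3 == 1      # 1 means our sign beat theirs

    if won:
        move = last_move                       # win -> stay with the same sign
    else:
        # lose (or draw) -> shift to a different sign
        move = random.choice([m for m in range(3) if m != last_move])
    last_move = move
    return move
```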
Markov Chains — now with ML⭐
We can approach this problem through few-shot learning, where we use a Random Agent for the first couple of turns and then feed the results to a Markov Agent to predict future moves.
Markov Chains are a form of Generative AI which take in a seed — here, the results from the random agent — and build an automaton with weighted nodes, which is updated after every round of the game.
We will observe that for the first 100 rounds the Markov Agent loses or stays on par with the adversary, but it slowly starts winning more rounds and gains a lead by mid-game.
The Markov Agent wins with a 5% lead over the Nash Equilibrium agent.
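Here is a minimal sketch of such a Markov agent: it counts the opponent’s transitions from one sign to the next, predicts the most likely next sign, and plays the counter to it. The first few rounds fall back to random moves, mirroring the few-shot idea above; the warm-up length and names are assumptions.

```python
import random
import numpy as np

transitions = np.ones((3, 3))   # transitions[a][b]: times the opponent played b right after a
prev_opponent = None

def markov_agent(observation, configuration):
    global prev_opponent
    if observation.step > 0:
        opponent = observation.lastOpponentAction
        if prev_opponent is not None:
            transitions[prev_opponent][opponent] += 1   # update the weighted chain
        prev_opponent = opponent

    # Warm-up phase: behave like the Random Agent for the first few rounds.
    if observation.step < 10 or prev_opponent is None:
        return random.randrange(3)

    # Predict the opponent's most probable next sign and beat it.
    predicted = int(np.argmax(transitions[prev_opponent]))
    return (predicted + 1) % 3   # (x + 1) % 3 is the sign that beats x
```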
Memory Patterns — Déjà vu ⭐⭐
Another strategy in an adversarial game is observing memory patterns, i.e. storing past moves played by the adversary and extracting patterns through meta-learning.
We want to identify the adversary’s action plan by creating a lookup table of its past actions — or simply, remembering its previous moves, given the current state of play.
The Memory Pattern Agent wins with a 4% lead over the Markov Agent.
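A sketch of a simple memory-pattern agent: it keeps a lookup table keyed by the opponent’s last two moves and counters whatever sign most often followed that pattern. The pattern length (two moves) and all names here are assumptions for illustration.

```python
import random
from collections import defaultdict

history = []                               # opponent's full move history
table = defaultdict(lambda: [0, 0, 0])     # pattern -> counts of the sign that followed it

def memory_agent(observation, configuration):
    if observation.step > 0:
        history.append(observation.lastOpponentAction)
        if len(history) >= 3:
            pattern = tuple(history[-3:-1])     # the two moves before the latest one
            table[pattern][history[-1]] += 1    # remember what followed that pattern

    if len(history) < 2:
        return random.randrange(3)

    counts = table[tuple(history[-2:])]         # what usually follows the current pattern?
    if sum(counts) == 0:
        return random.randrange(3)              # pattern never seen before: stay random
    predicted = counts.index(max(counts))
    return (predicted + 1) % 3                  # play the counter to the prediction
```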
Multi-Armed Bandit — Explore & Exploit ⭐⭐⭐
A good approach for N-armed problems such as Rock Paper Scissors is the Multi-Armed Bandit model, which relies on exploring various moves (action plans) and exploiting the best of them to provide maximum wins within a limited time frame (set of moves).
Exploitation in MAB models is similar to that in Markov Chains.
Here, we aim to achieve zero-regret strategies, which converge to the optimal strategy given a large number of rounds — where regret refers to the difference between the optimal strategy and the current model over N rounds.
The Multi-Armed Bandit Model wins with a 5% lead over Memory Patterns.
(for a detailed explanation, check out Wikipedia)
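Below is a rough epsilon-greedy sketch where each “arm” is simply one of the three signs, scored by how it has fared against the opponent so far; a stronger solution would usually treat whole strategies as arms, so treat this as a simplified illustration with assumed names and parameters.

```python
import random

value = [0.0, 0.0, 0.0]   # running reward estimate for each arm (sign)
pulls = [0, 0, 0]         # how many times each arm has been played
last_arm = None
EPSILON = 0.1             # exploration rate

def bandit_agent(observation, configuration):
    global last_arm
    # Reward the previously pulled arm: +1 win, 0 draw, -1 loss.
    if observation.step > 0 and last_arm is not None:
        opponent = observation.lastOpponentAction
        reward = [0, 1, -1][(last_arm - opponent) % 3]
        pulls[last_arm] += 1
        value[last_arm] += (reward - value[last_arm]) / pulls[last_arm]

    # Explore with probability EPSILON, otherwise exploit the best arm so far.
    if observation.step == 0 or random.random() < EPSILON:
        last_arm = random.randrange(3)
    else:
        last_arm = value.index(max(value))
    return last_arm
```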
Decision Tree Classifier — YES? NO? ⭐⭐⭐
An alternative to few-shot learning can be Decision Tree Classifiers (DTCs).
Here we start with a random tree with n parameters and, after each round, tune its parameters until its predictions converge to the optimal result.
We will start with a Random Agent for a couple of rounds and feed its results to tune the DTC, with subsequent moves taken by the DTC agent — tuning along the way after each round.
Decision Trees are very inexpensive to build, and over 1000 rounds it is observed that the strategy slowly converges to the optimal one.
The Decision Tree Classifier Agent wins with a 3% lead over Multi-Armed Bandits. It also wins with a 10% lead over the Markov Agent.
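A sketch of such a DTC agent using scikit-learn: it plays randomly during a warm-up period, then refits a small DecisionTreeClassifier each round on a sliding window of both players’ recent moves to predict the opponent’s next sign. The window size, tree depth and feature layout are assumptions, not tuned values.

```python
import random
import numpy as np
from sklearn.tree import DecisionTreeClassifier

my_moves, opp_moves = [], []
WARMUP, WINDOW = 20, 5   # random rounds before fitting, and feature window size

def tree_agent(observation, configuration):
    if observation.step > 0:
        opp_moves.append(observation.lastOpponentAction)

    # Warm-up: act like the Random Agent until we have enough history.
    if len(opp_moves) < WARMUP:
        move = random.randrange(3)
        my_moves.append(move)
        return move

    # Build a dataset: the last WINDOW moves of both players -> the opponent's next move.
    X, y = [], []
    for i in range(WINDOW, len(opp_moves)):
        X.append(my_moves[i - WINDOW:i] + opp_moves[i - WINDOW:i])
        y.append(opp_moves[i])
    clf = DecisionTreeClassifier(max_depth=6).fit(np.array(X), np.array(y))

    # Predict the opponent's next sign from the most recent window and counter it.
    features = np.array([my_moves[-WINDOW:] + opp_moves[-WINDOW:]])
    predicted = int(clf.predict(features)[0])
    move = (predicted + 1) % 3
    my_moves.append(move)
    return move
```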
So what’s best if I want to win?
Rock Paper Scissors turns out to be very challenging regardless of whether you apply Machine Learning or Game Theory — with each having its own unique benefits.
As we have observed, even a simple 3-armed adversarial game can have enough divergent strategies to make it a difficult ML challenge.
Here is the full comparison of various agents, made by Yaroslav Isaienkov.
In the above heatmap — Red indicates a positive lead (wins), Blue a negative lead (losses) [coolwarm colormap].
From this we observe that the ML algorithms discussed, such as Markov Agents, Memory Patterns, Multi-Armed Bandits and Decision Tree Classifiers, get a significantly better score than statistical or random models, especially when matched against each other.
An interesting observation is that although AI models easily beat random and naïve models, strategies such as Nash Equilibrium (also discussed), the Transition Matrix Method (not discussed) and Statistical Prediction (not discussed) can give AI models a run for their money when it comes to pure scores.
Even then, it is worth noting that Memory Patterns and Decision Trees perform exceptionally well and beat almost all other opponents in this particular adversarial game.
This was an Exploratory Analysis for the current Shoot! Rock, Paper, Scissors competition on Kaggle.
All the Best! and may the most competent win.🙂