Sunday, 28 February 2021

Calculating Nash equilibria

Let's return to the example of Rock Paper Scissors. Common sense tells us that choosing each option a third of the time should be the "best" strategy. Can we pin this down mathematically by proving it is the Nash equilibrium strategy?

Nash strategies are "unexploitable"

A key to calculating Nash strategies is the observation that they are unexploitable. Let's define a "least exploitable" strategy as any strategy which on average ties with or beats every other possible strategy, and remember the definition of a Nash strategy as one of a pair of strategies for which neither player can benefit by changing strategy without their opponent also changing. [Remember as well that we talk about a Nash strategy – rather than a pair of Nash strategies – because if \((A,B)\) denotes a pair of strategies forming a Nash equilibrium, then \( (A,A) \) also forms a Nash equilibrium, at least in the context of a symmetric zero-sum game.] These two concepts of an optimal strategy coincide.
Nash strategies are least exploitable strategies, and vice-versa, for two-player symmetric zero-sum games
See if you can explain why this is true; I'll give a rigorous explanation in the next post. Note that this result lets us prove a claim from the second post: that choosing randomly between Nash strategies gives another Nash strategy. It's easy to show this for least exploitable strategies, and those coincide with Nash strategies by the above. 
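For the mixing claim, here's a minimal sketch (the notation \(u(C,X)\), for your average payoff when you play strategy \(C\) against my strategy \(X\), is mine rather than anything from the earlier posts). If \(A\) and \(B\) are least exploitable, and \(M\) is the mixed strategy which plays \(A\) a proportion \(\lambda\) of the time and \(B\) the rest of the time, then for any opposing strategy \(C\), \[ u(C,M) = \lambda\, u(C,A) + (1-\lambda)\, u(C,B) \leq \lambda\cdot 0 + (1-\lambda)\cdot 0 = 0, \] so no strategy averages better than a tie against \(M\); that is, \(M\) is least exploitable too.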

Verifying a strategy is Nash

By least exploitability, if I'm playing the Nash strategy then your average score when choosing Rock every time against me is at most 0. This gives us the following condition:

\[\begin{split} &\quad\text{average score you get against my strategy when choosing Rock every time } \\ &\small{\left(= \left\{ \begin{aligned} &\text{your reward when I choose Rock}\times \text{proportion of the time I choose Rock} \\ +& \text{your reward when I choose Paper}\times \text{proportion of the time I choose Paper} \\ +&\text{your reward when I choose Scissors}\times \text{proportion of the time I choose Scissors}\end{aligned}\right\} \right)} \\ &\quad \text{is at most } 0.\end{split}\]

Remember the payoff matrix from last time, which gives the scores in each case:


          Rock  Paper  Scissors
Rock       0     -1      1
Paper      1      0     -1
Scissors  -1      1      0

Let's represent the proportions of each option which make up the Nash strategy using some letters so we can shorten the condition. Supposing the Nash strategy involves choosing Rock a proportion \(r\) of the time, Paper a proportion \(p\) of the time and Scissors a proportion \(s\) of the time, the condition above reduces to \begin{align*} 0r+(-1)p+1s&\leq 0, \text{ i.e.} \\ s-p &\leq 0. \end{align*} Similarly, if I'm using the Nash strategy then I tie with or beat you if you choose Paper every time, or Scissors every time, which gives us the inequalities \[ r-s \leq 0, \quad p-r\leq 0. \] We can verify that these hold with \(r=p=s=1/3\), confirming (as we already strongly suspected) that this is the Nash equilibrium! [As an aside, technically to show that a strategy is least exploitable we also need to show it doesn't lose to any mixture of Rock, Paper and Scissors. But this follows from it not losing to any of Rock, Paper or Scissors individually.]
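If you'd like to check this numerically, here is a minimal sketch in Python (numpy is my choice here, not something from the post; the matrix and the candidate strategy are exactly the ones above):

```python
import numpy as np

# Payoff matrix from above: rows are my choice, columns are yours,
# entries are my score, in (Rock, Paper, Scissors) order.
payoffs = np.array([
    [ 0, -1,  1],   # Rock
    [ 1,  0, -1],   # Paper
    [-1,  1,  0],   # Scissors
])

nash = np.array([1/3, 1/3, 1/3])  # candidate Nash proportions (r, p, s)

# Your average score for each pure strategy against my mix; e.g. the
# Rock entry is 0*r + (-1)*p + 1*s = s - p, matching the inequality above.
your_scores = payoffs @ nash
print(your_scores)                   # [0. 0. 0.]
assert np.all(your_scores <= 1e-12)  # no pure strategy beats the mix
```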

Calculating a Nash strategy when you don't know it in advance

What if we hadn't already known what the Nash strategy was, so we couldn't just check that it satisfied the inequalities? Without knowing the answer in advance or guessing it, we want a method to find \(r,p,s\) such that \[ s-p\leq 0,\quad r-s\leq 0,\quad p-r\leq 0.\] A trick we can use in this case is to add up the three left-hand sides: we get \(s-p+r-s+p-r=0.\) If we take three numbers, none of which is bigger than zero, add them together and get zero, it follows that none of them can have been negative. We therefore in fact have \[s-p=0,\quad r-s=0,\quad p-r=0,\] so that \(r=p=s.\) Since \(r+p+s=1\) (these are the only three options, so the proportions of the time we choose them add up to \(1\)) we see that \(r=p=s=1/3.\)
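Once we know the three left-hand sides must all be zero, a computer algebra system can grind out the answer; here's a quick sketch using sympy (my own choice of tool, and an easy one to swap out):

```python
from sympy import solve, symbols

r, p, s = symbols('r p s')

# The three "at most zero" expressions, now known to equal zero,
# together with the normalisation r + p + s = 1.
print(solve([s - p, r - s, p - r, r + p + s - 1], [r, p, s]))
# prints a dictionary with r = p = s = 1/3
```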

This hints at a general trick. By definition, a least exploitable (and therefore Nash) strategy gives every fixed opposing strategy an average payoff of at most zero. But in the case of Rock-Paper-Scissors the payoffs turned out to be exactly zero. Equations are much easier to solve than inequalities, so if we can always replace \(\leq\) by \(=\), it will help a lot with calculations. Can we do that? For many games the answer is yes.

Suppose a Nash strategy for a game is a completely mixed strategy, i.e. every option is chosen some positive proportion of the time. Then the reward for playing any fixed strategy against this Nash strategy (which is at most zero because Nash strategies are least exploitable) is in fact exactly equal to zero.

Why? Again, I'll leave this for you to think about for now, and give a justification in the next post. In the meantime, let's find the Nash strategy for Rock-Paper-SuperScissors, as promised. Using the fact that each pure strategy's payoff against the Nash strategy is zero, we get the equations \[\text{Rock: }-p+s=0,\quad \text{Paper: }r-2s=0,\quad \text{SuperScissors: }-r+2p=0. \] The first two equations yield \(p=s=r/2\). Then, because the proportions must add to 1, the Nash strategy is \((r,p,s)=(1/2,1/4,1/4)\), i.e. you should choose Rock half the time, and Paper and SuperScissors a quarter of the time each. That's right: you choose SuperScissors less often than you would choose Scissors in normal RPS, despite the double reward!
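Here's the same computation done numerically, as a sketch: stack the three "payoff equals zero" equations on top of the normalisation and solve the resulting linear system (numpy again being my own choice of tool).

```python
import numpy as np

# Rock-Paper-SuperScissors payoffs (rows: my choice; columns: yours;
# entries: my score), as in the payoff matrix from the last post.
payoffs = np.array([
    [ 0, -1,  1],   # Rock
    [ 1,  0, -2],   # Paper
    [-1,  2,  0],   # SuperScissors
], dtype=float)

# A completely mixed Nash strategy x = (r, p, s) makes every pure
# strategy score exactly zero: payoffs @ x = 0, together with sum(x) = 1.
A = np.vstack([payoffs, np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])

# Four equations, three unknowns, but consistent: least squares finds x.
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x)  # [0.5  0.25 0.25], i.e. (r, p, s) = (1/2, 1/4, 1/4)
```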

The Nash strategy for cops and robbers

I found this result surprising the first time I saw it. We've shown that increasing the reward for winning with Scissors, instead of encouraging you to pick Scissors more often, should actually encourage you to choose Rock more often. Intuitively speaking, this is to prevent your opponent from benefitting too much from the SuperScissors.

The same kind of calculations lead to some remarkable conclusions in modelling real world situations. Consider a simplified game theory model for crime, in which a thief wants to go out robbing if the police won't be patrolling that night, but doesn't want to go thieving if the police will be out and about; meanwhile, the police want to patrol if there'll be a thief out to catch, but not otherwise. [Of course, like in Rock-Paper-Scissors, both the police and the thief must choose between their options without knowing what the other will do.] The same kind of calculations as for Rock-Paper-(Scissors/SuperScissors) above show that increasing the punishment for crime (which is like decreasing the reward for choosing to commit a crime, in the case where you get caught and hence "lose" the game) does not decrease rates of crime, but rather decreases the effort the police put into catching criminals: see the results here (in particular theorems 2 and 3). Similarly, increasing the rewards for catching a thief will decrease the crime rate, despite not increasing the proportion of the time that police spend patrolling.
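To see this concretely, here's a toy version in Python. Every payoff number is invented for illustration (loot, punishment, patrol cost, arrest reward, harm from an unsolved crime), and the model is the simplified one sketched above rather than the exact one in the linked results. The key mechanism: at a mixed equilibrium each player mixes in exactly the way that makes the other player indifferent, so the thief's payoffs (including the punishment) determine the patrol rate, while the police's payoffs (including the arrest reward) determine the crime rate.

```python
# Toy cops-and-robbers game: thief picks Rob/Stay, police pick Patrol/Rest.
# All payoff parameters below are made up purely for illustration.

def equilibrium(loot, punishment, patrol_cost, arrest_reward, crime_harm):
    # The thief is indifferent between Rob and Stay (which pays 0) when
    #   patrol_rate * (-punishment) + (1 - patrol_rate) * loot = 0,
    # so the thief's payoffs pin down how often the police patrol:
    patrol_rate = loot / (loot + punishment)
    # The police are indifferent between Patrol and Rest when
    #   crime_rate * (arrest_reward - patrol_cost)
    #       + (1 - crime_rate) * (-patrol_cost) = crime_rate * (-crime_harm),
    # so the police's payoffs pin down how often the thief goes robbing:
    crime_rate = patrol_cost / (arrest_reward + crime_harm)
    return patrol_rate, crime_rate

base    = equilibrium(loot=4, punishment=6,  patrol_cost=1, arrest_reward=3, crime_harm=2)
harsher = equilibrium(loot=4, punishment=12, patrol_cost=1, arrest_reward=3, crime_harm=2)
bonus   = equilibrium(loot=4, punishment=6,  patrol_cost=1, arrest_reward=8, crime_harm=2)

print(base)     # (0.4, 0.2)
print(harsher)  # (0.25, 0.2): harsher punishment, less patrolling, same crime
print(bonus)    # (0.4, 0.1): bigger arrest reward, same patrolling, less crime
```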

This sounds insane! To get some intuition for why it can be true, think of changing the rewards slowly and people reacting in stages. At first, increasing the reward for catching criminals encourages more patrolling, which makes it more likely that a robber gets caught. But if the police go out too often then the robbers, who were willing to accept the old risk of being caught, will no longer go robbing because of the increased risk. At the old Nash equilibrium the risks and rewards were perfectly balanced, and since the jail sentences and the value of goods to steal haven't changed, the only way for them to become balanced again is for the old rate of patrols to come back. So the higher rate of patrolling will die away, because otherwise there will never be robbers out for the police to catch, so they would never get to cash in on the bigger bonuses anyway. The extra pay isn't encouraging the police to work harder, but it is at least encouraging them to work just as hard despite getting to catch crooks less often.

As a final point, let me emphasise that this is only a mathematical abstraction, and that what works in the model may not work as well in the real world. Regardless of whether it works in practice, pay based on numbers of arrests can lead to resentment among policed communities, who may feel exploited and unfairly treated for the sake of financial incentives, as the numerous articles complaining about "Collars for Dollars" show, even if these criticisms might not always be fair. The broader claim of the mathematical model – that increasing the probability of capture (for example by hiring more police) is much more effective at reducing crime than increasing the harshness of punishments – has been supported by evidence in practice, not just in theory.

Monday, 22 February 2021

Payoff matrices

How can we find the Nash equilibrium in a two-player, uncooperative, symmetric game? First it helps to have a more efficient way to describe games than the lengthy word explanations I gave in the first post. We can summarise a game through its payoff matrix: a table showing the different pairs of choices the players can make and the "payoffs" (scores) in each case.

Payoff matrices as an efficient way to describe a game

Let's explore payoff matrices, starting by considering Rock Paper Scissors. Here is a possible payoff matrix for the game.
 

          Rock  Paper  Scissors
Rock      0,0   0,1    1,0
Paper     1,0   0,0    0,1
Scissors  0,1   1,0    0,0

The row headings give the action I choose, the column headings give the action you choose, and the entries say how many points I get (before the comma) and how many points you get (after the comma). There's a lot of redundant information in there. Why?

Remember that the game is symmetric, so knowing what I score when we choose (Rock, Paper) is enough to know what you get when we choose (Paper, Rock). This means we only need half of the entries: the bottom left entries can be worked out by swapping the points round from the matching top right entry. For example, the (Scissors, Rock) entry is 0,1, and we could work this out by reversing the (Rock, Scissors) entry of 1,0. Alternatively, we could keep all the entries in the table, but only include my score, and work out what your score is from that.
 
Let's take the second of these options. Let's also make another slight change. I've been referring to Rock Paper Scissors as a "zero-sum game" because we can't both win or both lose at the same time: the actual number of points scored isn't so important as the difference between my number of wins and your number of wins. To really be a zero-sum game, we need to record this difference rather than the original scores, which yields the following table.

          Rock  Paper  Scissors
Rock       0     -1      1
Paper      1      0     -1
Scissors  -1      1      0

With this change we can summarise the payoffs even more efficiently. Suppose we only saw part of the table above:


          Rock  Paper  Scissors
Rock      ?     -1     1
Paper     ??    ?      -1
Scissors  ??    ??     ?

For each of the single question marks along the diagonal, the two players have chosen the same option, so by symmetry they necessarily draw, and hence must both score zero points.

Let's consider the double question mark for the (Scissors, Rock) entry, in the bottom left. When I play Rock against your Scissors, I score 1 point, so when you play Scissors against my Rock you must likewise score 1 point. Because it's a zero-sum game, that means in the latter scenario I must score -1, which is the entry we fill in at the bottom left.

We can work out the other double question mark entries similarly, so the full table can be recovered from the entries in the top right corner alone. I'll continue to write the tables in full, because it will be convenient for calculations.
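In code, the reconstruction rule is simply "flip and negate": for a symmetric zero-sum game, my score for (option \(i\), option \(j\)) is minus my score for (option \(j\), option \(i\)), and the diagonal is zero. A quick sketch:

```python
import numpy as np

# Top right entries for Rock Paper Scissors (my score only), indexed
# in (Rock, Paper, Scissors) = (0, 1, 2) order.
top_right = {(0, 1): -1, (0, 2): 1, (1, 2): -1}

payoffs = np.zeros((3, 3), dtype=int)  # diagonal stays 0: same choice, draw
for (i, j), my_score in top_right.items():
    payoffs[i, j] = my_score
    payoffs[j, i] = -my_score  # symmetric + zero-sum: flip and negate

print(payoffs)
# [[ 0 -1  1]
#  [ 1  0 -1]
#  [-1  1  0]]
```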

Some more payoff matrices

To demonstrate how efficient these tables are for describing games, here are the three other variants on Rock Paper Scissors that I needed whole paragraphs to explain in the first post.


          Rock  Paper  Scissors  Well
Rock       0     -1      1        -1
Paper      1      0     -1         1
Scissors  -1      1      0        -1
Well       1     -1      1         0


          Rock  Paper  Papyrus  Scissors
Rock       0     -1     -1        1
Paper      1      0      0       -1
Papyrus    1      0      0       -1
Scissors  -1      1      1        0


               Rock  Paper  SuperScissors
Rock            0     -1      1
Paper           1      0     -2
SuperScissors  -1      2      0

Payoff matrices allow you to very quickly spot certain dominated strategies and duplicate strategies. In particular, we can see in the first example that the row entries for Rock are all smaller than or the same as those for Well, which entails that Rock is dominated by Well. Meanwhile, in the second table, the entries in the Paper row are exactly the same as those in the Papyrus row (and similarly for their columns).
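Row comparisons like these are mechanical enough to hand to a computer. Here's a sketch that scans a payoff matrix for pure strategies dominated by, or duplicating, another pure strategy; note it won't catch domination by mixtures, which is the subtlety discussed under "Arbitrage betting" below.

```python
import numpy as np

def compare_pure_strategies(payoffs, names):
    """Report pure strategies that duplicate or are (weakly) dominated by
    another pure strategy, by comparing rows of my payoff matrix."""
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            if i < j and np.array_equal(payoffs[i], payoffs[j]):
                print(f"{a} and {b} are duplicates")
            elif np.all(payoffs[i] <= payoffs[j]) and np.any(payoffs[i] < payoffs[j]):
                print(f"{a} is dominated by {b}")

# Rock Paper Scissors Well, from the first table above:
rpsw = np.array([[ 0, -1,  1, -1],
                 [ 1,  0, -1,  1],
                 [-1,  1,  0, -1],
                 [ 1, -1,  1,  0]])
compare_pure_strategies(rpsw, ["Rock", "Paper", "Scissors", "Well"])
# prints: Rock is dominated by Well
```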

Exercise

See if you can find the omitted entries of this payoff matrix. Do you think there's a dominated strategy?

          Option 1  Option 2  Option 3  Option 4
Option 1  ?         3         -2        1
Option 2  ?         ?         0         6
Option 3  ?         ?         ?         -4
Option 4  ?         ?         ?         ?

Arbitrage betting

We can't always spot duplicate or dominated strategies by directly comparing table rows: as in the last post, sometimes a mixture of strategies can mimic or outperform a strategy (or mimic a different mixture of strategies). The fact that there are multiple ways to make the same bet is something familiar to lots of gamblers: for example, you could:

  • Bet directly that Chelsea win
  • Make two bets simultaneously:
    • That both Chelsea and Aston Villa win;
    • That Chelsea win but Aston Villa lose or draw.
It may be that one of these betting methods gives you better odds for Chelsea winning, making the other a dominated strategy. Sometimes the division can be more complicated than this: I remember exploring Paris with a friend who had to pause to place a bet that Southampton would win the second half of their match, having previously made bets on the overall result and the first half. Finding ways to divide up bets like this can be used to great profit: sometimes you can find a sure win, through what is known as arbitrage betting.
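As a closing sketch, here's the standard arbitrage check with invented numbers. If the decimal odds \(o_i\) on offer for a set of mutually exclusive outcomes covering every possibility satisfy \(\sum_i 1/o_i < 1\), then staking in proportion to \(1/o_i\) wins you the same amount whatever happens.

```python
def arbitrage_stakes(odds, bankroll=100.0):
    """Stakes guaranteeing equal profit across mutually exclusive outcomes
    that cover all possibilities, or None if there's no sure win."""
    margin = sum(1 / o for o in odds)
    if margin >= 1:
        return None  # the odds leave no arbitrage
    # Whichever outcome occurs, the winning bet pays stake * odds
    # = bankroll / margin, which exceeds the bankroll since margin < 1.
    return [bankroll * (1 / o) / margin for o in odds]

# Invented decimal odds for win / draw / loss, say from two bookmakers:
print(arbitrage_stakes([3.10, 3.60, 2.90]))
# roughly [34.13, 29.39, 36.48]: a guaranteed payout of about 105.80
```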