Sunday, 21 March 2021

Don't play Nash if you want to win

Nash strategies are the best possible strategies in some ways...

Last time it was left as an exercise to prove the following.

Nash strategies are least exploitable strategies, and vice-versa, for two-player symmetric zero-sum games

This gives a sense in which Nash strategies are optimal. Remember a least exploitable strategy is defined as any strategy which on average ties with or beats every other possible strategy. What that means is that even if I tell you that I'm going to use the Nash strategy, you still can't beat me. That obviously isn't the case for most strategies: if I told you I was going to use Rock every time, you could choose Paper every time to win.

To justify the claim, suppose \(A\) denotes a Nash strategy and \(B\) denotes some other strategy. Since \((A,A)\) is a Nash equilibrium, the second player can't benefit from unilaterally switching, so their average score when playing \(B\) against \(A\) is no better than their average score when playing \(A\) against \(A\). Because the game is symmetric and zero-sum, the average score for each player when they both play \(A\) is zero. So every strategy \(B\) ties with or loses to \(A\) on average, which shows that a Nash strategy is least exploitable.

We still need to prove what mathematicians call the converse: that any least exploitable strategy is Nash. If \(A\) is a least exploitable strategy, the average score for each player when both play \(A\) is zero. The average score one player gets when unilaterally switching is at most zero because \(A\) is least exploitable, and this proves that the pair of strategies \((A,A)\) forms a Nash equilibrium.  


We can prove another sense in which Nash strategies are optimal. Remember that a "dominated" option is one which is outperformed by another option (or by a mixture of the other options), like Rock in the game Rock-Paper-Scissors-Well (because Well matches the result of Rock against Paper and Scissors, but whereas Rock ties with itself, Well beats Rock).

If you play a Nash strategy and I play a strategy in which I sometimes choose a dominated option, you win on average.
 I'll prove it in the example case Rock-Paper-Scissors-Well to have names for everything, but the proof works in general. The dominated strategy is Rock, and we're assuming I choose it at least some of the time. Suppose you, following the Nash strategy, also play against my sister, and she makes the exact same moves as me except that every time I choose Rock she chooses Well. Then at best she draws with you (because you're playing a Nash strategy) and she outperforms me (because every time I choose Rock and you choose Well, I lose but she draws; note this happens at least sometimes). So I must lose to you on average.
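If you'd like to see actual numbers, here's a minimal check in Python. The payoff matrix is the Rock-Paper-Scissors-Well table from the "Payoff matrices" post below, and the 20/30/30/20 strategy is just an arbitrary example I made up of a strategy that wastes some of its play on the dominated option Rock.

```python
import numpy as np

# Payoff matrix for Rock, Paper, Scissors, Well (my average score; rows are my options).
payoff = np.array([[ 0, -1,  1, -1],   # Rock
                   [ 1,  0, -1,  1],   # Paper
                   [-1,  1,  0, -1],   # Scissors
                   [ 1, -1,  1,  0]])  # Well

nash = np.array([0, 1, 1, 1]) / 3       # the Nash strategy: never Rock, the rest equally
mine = np.array([0.2, 0.3, 0.3, 0.2])   # wastes 20% of its play on the dominated Rock

# My average score against the Nash strategy is strictly negative, as claimed.
print(mine @ payoff @ nash)   # -0.0666...
```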

[Actually, strictly speaking the proof I've given isn't quite valid, and indeed the "theorem" isn't true. Perhaps you remember, all the way back in post 2, where we considered a game of Rock Paper Papyrus Scissors and then added something which, say, loses to Paper and ties with everything else – let's call this option Fail. Choosing Rock, Papyrus and Scissors a third of the time each remains a Nash strategy in this game and Fail is a dominated strategy (see if you can prove these claims). But none of Rock, Papyrus or Scissors beats Fail, hence nor does the Nash strategy obtained by mixing them. Papyrus itself is technically a dominated strategy in this game – it's outperformed by Paper – despite appearing in a Nash equilibrium, which is the cause of this trouble. To avoid this one should restrict attention to admissible Nash strategies, which are ones excluding dominated strategies.] 

In some games the non-dominated strategies may be complex enough that a typical opponent can be expected to err occasionally, so that following an (admissible) Nash strategy yields a guaranteed (average) profit. One example is La Relance (see section 7.1 here), a two-player poker game which can be viewed as zero-sum (though not symmetric): there the non-dominated strategies take the form "bet exactly when your hand is above some precise threshold, and otherwise don't", which is easy to get slightly wrong.


But using Nash strategies isn't always the best idea

Using a mixture of Rock, Papyrus and Scissors is silly in the game of Rock-Paper-Papyrus-Scissors-Fail, despite being Nash, because using Paper instead of Papyrus gives better outcomes. Remember also the example of choosing a lane to overtake in, from all the way back in the first post? To recap: if both the slow driver in front and the fast driver behind choose the inside lane, they crash; if they both choose the outside lane, they crash; but if one chooses the inside and one the outside, then the overtake happens safely for both. One of the three Nash equilibria for this "game" was each driver choosing a lane by flipping a coin: if I know you're choosing by flipping a coin, I can't have a better than 50-50 chance of avoiding a crash no matter how I make my choice. This last Nash strategy is clearly not a good idea, because we can avoid crashes much more often than the 50% of the time it yields: indeed, we can avoid crashing all the time, just by choosing different lanes.
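To make the lane "game" concrete, here's a sketch that checks all three equilibria. The payoff numbers (-1 each for a crash, +1 each for a safe overtake) are invented for illustration; any numbers with crashing worse than passing safely give the same picture.

```python
# Both drivers score the same in every outcome, so one number per cell covers both.
payoff = {("inside", "inside"): -1, ("inside", "outside"): 1,
          ("outside", "inside"): 1, ("outside", "outside"): -1}

def other(lane):
    return "outside" if lane == "inside" else "inside"

# A pure pair is a Nash equilibrium when neither driver gains by switching lane.
for a in ("inside", "outside"):
    for b in ("inside", "outside"):
        if payoff[(a, b)] >= payoff[(other(a), b)] and payoff[(a, b)] >= payoff[(a, other(b))]:
            print("pure Nash equilibrium:", a, b)   # (inside, outside) and (outside, inside)

# The third equilibrium is both drivers flipping coins: against a coin-flipper,
# either lane averages 0, so no choice of mine beats 50-50.
print("either lane vs a coin-flipper:", 0.5 * (-1) + 0.5 * 1)   # 0.0
```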

These are surface-level problems which can easily be avoided: in each case, while there is a "bad" Nash equilibrium, there are also better ones, so we don't have to give up on choosing a Nash strategy to overcome this "badness". However, the problems with using Nash strategies go deeper. Remember that last time we had the following claim:

Suppose a Nash strategy for a game is a completely mixed strategy, i.e. every option is chosen at least some proportion of the time. Then the reward for playing any fixed option against this Nash strategy, which is at most zero because Nash strategies are least exploitable, is in fact exactly equal to zero.

To prove this, suppose there's some option which loses to the Nash strategy, rather than tying. If we're both playing the completely mixed Nash strategy and you switch to a strategy which never chooses this losing option but picks all the other options in the same relative proportions, then you'll beat me: when I don't choose the losing option, my choices have the same distribution as yours, so we tie on average; and when I do choose it, I lose on average (the losing option scores zero against itself and strictly less than zero against the full Nash mixture, so it must score less than zero against your renormalised version too). The fact you can improve your payoff by unilaterally changing strategy means my strategy couldn't have been Nash in the first place.

We've previously seen (thanks to equation (8) here) that for many symmetric zero-sum games, the Nash strategy is completely mixed. The result we've just proved then means that by playing Nash, you'll never win against your opponent, no matter how bad they are. Imagine playing Rock-Paper-Scissors against someone who chooses Rock every time: by rigidly sticking to the Nash strategy, you let them win and draw \(1/3\) of the time each, whereas you could win every time by choosing Paper.
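Here's that comparison as a two-line computation (a sketch using the Rock-Paper-Scissors payoff matrix from the "Payoff matrices" post below):

```python
import numpy as np

# Rock-Paper-Scissors payoff matrix (my average score; rows are my options).
payoff = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])

nash = np.array([1, 1, 1]) / 3
always_rock = np.array([1, 0, 0])
always_paper = np.array([0, 1, 0])

print(nash @ payoff @ always_rock)          # 0.0: the Nash strategy only draws on average
print(always_paper @ payoff @ always_rock)  # 1.0: Paper wins every single round
```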

When you can work out your opponent's strategy, your best response should take it into account, in contrast to a least exploitable Nash strategy, which has to do OK regardless of what they're doing. By making yourself more vulnerable to some strategies (which they aren't using), you can perform better against the strategy they are using. In the example, choosing Paper every time is the best response when they're only choosing Rock; by playing the Nash strategy, you denied yourself the opportunity to exploit their bad play. [Of course, if you do that, they can change to choosing Scissors every time, and suddenly you lose every round, so you must always be cautious in deviating from the Nash strategy.]

If you want to consistently win at a game with a completely mixed Nash equilibrium (and if your opponents are good enough to exclude dominated strategies, a typical zero-sum game will have a completely mixed Nash equilibrium), then you need to trust that you can account for your opponent's plans better than they account for yours. The best tournament players of Rock-Paper-Scissors (yes, there are RPS tournaments) know this: of course you could go to a tournament and play the Nash strategy, but then your chance of winning 7 matches in a row in a knockout tournament would be less than 1% (each match is a 50-50 by symmetry, so \((1/2)^7\approx 0.8\%\)), while the best players consistently reach the later rounds at better-than-random rates.

Conclusion

The Nash strategy is, by definition and by various proofs from this sequence of posts, an optimal strategy to use in a symmetric zero-sum game. However, it is optimal only in the absence of information about your opponent's strategy, and taking into account any such information leads to better "best response" strategies – like Paper if they're choosing Rock too often. The best players might even deliberately deviate from the Nash strategy into what's known in the gaming industry as "donkey space" (e.g. see here) in order to tempt their opponent away from Nash, so that they can exploit their opponent's attempts to exploit them.

Sunday, 28 February 2021

Calculating Nash equilibria

Let's return to the example of Rock Paper Scissors. Common sense tells us that choosing each option a third of the time should be the "best" strategy. Can we pin this down mathematically by proving it is the Nash equilibrium strategy?

Nash strategies are "unexploitable"

A key to calculating Nash strategies is the observation that they are unexploitable. Let's define a "least exploitable" strategy as any strategy which on average ties with or beats every other possible strategy, and remember the definition of a Nash strategy as one of a pair of strategies for which neither player can benefit by changing strategy without their opponent also changing. [Remember as well that we talk about a Nash strategy – rather than a pair of Nash strategies – because if \((A,B)\) denotes a pair of strategies forming a Nash equilibrium, then \( (A,A) \) also forms a Nash equilibrium, at least in the context of a symmetric zero-sum game.] These two concepts of an optimal strategy coincide.
Nash strategies are least exploitable strategies, and vice-versa, for two-player symmetric zero-sum games
See if you can explain why this is true; I'll give a rigorous explanation in the next post. Note that this result lets us prove a claim from the second post: that choosing randomly between Nash strategies gives another Nash strategy. It's easy to show this for least exploitable strategies, and those coincide with Nash strategies by the above. 

Verifying a strategy is Nash

By least exploitability, your average score when choosing Rock against me if I'm playing the Nash strategy is at most 0. This gives us the equation

\[\begin{split} &\quad\text{average score you get against my strategy when choosing Rock every time } \\ &\small{\left(= \left\{ \begin{split} &\text{your reward when I choose Rock}\times \text{proportion of the time I choose Rock} \\ +& \text{your reward when I choose Paper}\times \text{proportion of the time I choose Paper} \\ +&\text{your reward when I choose Scissors}\times \text{proportion of the time I choose Scissors}\end{split}\right\} \right)} \\ &\quad \text{is at most } 0.\end{split}\]

Remember the payoff matrix from last time, which gives the scores in each case:


             Rock   Paper   Scissors
Rock           0     -1         1
Paper          1      0        -1
Scissors      -1      1         0

Let's represent the proportions of each option which make up the Nash strategy using some letters so we can shorten the equation. Supposing the Nash strategy involves choosing Rock a proportion \(r\) of the time, Paper a proportion \(p\) of the time and Scissors a proportion \(s\) of the time, the above equation reduces to \begin{align*} 0r+1s+(-1)p&\leq 0, \text{ i.e.} \\ s-p &\leq 0. \end{align*} Similarly if I'm using the Nash strategy then I tie with or beat you if you choose Paper every time, or choose Scissors every time, which gives us the following equations: \[ r-s \leq 0, \quad p-r\leq 0. \] We can verify these hold with \(r=p=s=1/3\), confirming (as we already strongly suspected) that this is the Nash equilibrium! [As an aside, technically to show that a strategy is least exploitable we also need to show it doesn't lose to any mixture of Rock, Paper and Scissors. But this follows from it not losing to any of Rock, Paper or Scissors individually.] 
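If you'd rather let a computer do this checking, here's a sketch (the helper name is my own invention; by the aside above, testing the three pure options is enough):

```python
import numpy as np

def is_least_exploitable(payoff, mix, tol=1e-9):
    """Check that no pure option scores more than zero on average against mix.

    payoff[i, j] is the score of option i against option j. Checking the pure
    options suffices: a mixture's score is an average of pure options' scores.
    """
    return bool(np.all(payoff @ mix <= tol))

rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
print(is_least_exploitable(rps, np.array([1, 1, 1]) / 3))      # True
print(is_least_exploitable(rps, np.array([0.5, 0.25, 0.25])))  # False: too much Rock
```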

Calculating a Nash strategy when you don't know it in advance

What if we hadn't already known what the Nash strategy was, so we couldn't just check that it satisfied the equations? Without knowing the answer in advance or guessing it, we want a method to find \(r,s,p\) such that \[ s-p\leq 0,\quad r-s\leq 0,\quad p-r\leq 0.\] A trick we can do in this case is add up the three left sides: we get \(s-p+r-s+p-r=0.\) If we take three numbers, none of which is bigger than zero, add them together and get zero, it follows that none of them can have been negative. We therefore in fact have \[s-p=0,~r-s=0,~p-r=0,\] so that \(r=p=s.\) Since \(r+p+s=1\) (these are the only three options, so the proportions of the time we choose them add up to \(1\)) we see that \(r=p=s=1/3.\)

This hints at a general trick. By definition a least exploitable, and therefore Nash, strategy satisfies that the average payoff of each fixed opposing strategy is at most zero. But in the case of Rock-Paper-Scissors, in fact the payoffs were equal to zero. Equations with equals signs are much easier to solve than ones with less than or equals signs, so if we can always replace \(\leq \) by \(=\), it will help a lot with calculations. Can we do that? For many games the answer is yes.

Suppose a Nash strategy for a game is a completely mixed strategy, i.e. every option is chosen at least some proportion of the time. Then the reward for playing any fixed strategy against this Nash strategy (which is at most zero because Nash strategies are least exploitable) is in fact exactly equal to zero.

Why? Again, I'll leave this for you to think about for now, and give a justification in the next post. In the meantime, let's find the Nash strategy for Rock-Paper-SuperScissors, as promised. The equations we get by using that the payoff of each pure strategy against the Nash strategy is zero are: \[\text{Rock: }-p+s=0,\quad \text{Paper: }r-2s=0,\quad \text{SuperScissors: }-r+2p=0 \] The first two equations yield \(p=s=r/2\). Then, because the proportions must add to 1, the Nash strategy is \((p,s,r)=(1/4,1/4,1/2)\), i.e. you should choose Paper and SuperScissors a quarter of the time each, and Rock half the time. That's right: you choose Scissors less often than in normal RPS, despite the double reward!
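Here's the same calculation done numerically (a sketch: I stack the sum-to-one condition under the payoff matrix and solve the resulting linear system, which has an exact solution because the equations are consistent):

```python
import numpy as np

# Rock-Paper-SuperScissors payoff matrix (rows/columns: Rock, Paper, SuperScissors).
payoff = np.array([[0, -1, 1],
                   [1, 0, -2],
                   [-1, 2, 0]])

# A completely mixed Nash strategy x satisfies payoff @ x = 0 (every pure
# option scores exactly zero against it) together with sum(x) = 1.
A = np.vstack([payoff, np.ones(3)])
b = np.array([0, 0, 0, 1])
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x)   # [0.5 0.25 0.25]: Rock half the time, Paper and SuperScissors a quarter each
```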

The Nash strategy for cops and robbers

I found this result surprising the first time I saw it. We've shown that increasing the reward for winning with Scissors, instead of encouraging you to pick Scissors more often, should actually encourage you to choose Rock more often. Intuitively speaking, this is to prevent your opponent from benefitting too much from the SuperScissors.

The same kind of calculations lead to some remarkable conclusions in modelling real world situations. Consider a simplified game theory model for crime, in which a thief wants to go out robbing if the police won't be patrolling that night, but doesn't want to go thieving if the police will be out and about; meanwhile, the police want to patrol if there'll be a thief out to catch but not otherwise. [Of course, like in Rock-Paper-Scissors, both the police and the thief must choose between their options without knowing what the other will do.] The same kind of calculations as for Rock-Paper-(Scissors/SuperScissors) above show that increasing the punishment for crime (which is like decreasing the reward for making the choice to commit a crime, if you get caught and hence "lose" the game) does not decrease rates of crime, but rather decreases the effort the police put into catching criminals: see the results here (in particular Theorems 2 and 3). Similarly, increasing the rewards for catching a thief will decrease the crime rate, despite not increasing the proportion of the time that police spend patrolling.
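To see the mechanism, here's a minimal sketch of one such simplified patrol/rob game. All the parameters (robbery gain g, punishment f, patrol cost c, arrest bonus b, loss L from an unpoliced robbery) are invented for illustration; the equilibrium formulas come from each player mixing so as to make the other indifferent between their two options.

```python
# Thief indifferent between robbing and staying home:
#   q*(-f) + (1-q)*g = 0   =>   patrol rate q = g/(g+f)
# Police indifferent between patrolling and not:
#   p*b - c = p*(-L)       =>   crime rate  p = c/(b+L)
def equilibrium(g, f, c, b, L):
    patrol_rate = g / (g + f)   # depends only on the thief's incentives
    crime_rate = c / (b + L)    # depends only on the police's incentives
    return {"crime rate": crime_rate, "patrol rate": patrol_rate}

print(equilibrium(g=5, f=10, c=1, b=2, L=3))  # baseline
print(equilibrium(g=5, f=20, c=1, b=2, L=3))  # harsher punishment: crime unchanged, less patrolling
print(equilibrium(g=5, f=10, c=1, b=4, L=3))  # bigger arrest bonus: less crime, same patrolling
```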

This sounds insane! To get some intuition for why it can be true, think of changing rewards slowly and people reacting in stages. At first, increasing the reward for catching criminals encourages more patrolling, which makes it more likely that a robber gets caught. But if the police go out too often, the robbers, who were willing to accept the old risk of being caught, will no longer go robbing because of the increased risk. At the old Nash equilibrium the risks and rewards were perfectly balanced, and since the jail sentences and the value of goods to steal haven't changed, the only way for them to get balanced again is for the old rate of patrols to come back. So the higher rate of patrolling will die away, because otherwise there will never be robbers out for the police to catch, so they would never get to cash in on the bigger bonuses anyway. The extra pay isn't encouraging the police to work harder, but it is at least encouraging them to work just as hard despite getting to catch crooks less often.

As a final point, let me emphasise that this is only a mathematical abstraction, and what works in the model may not work as well in the real world. Regardless of whether it works in practice, pay based on numbers of arrests can breed resentment among policed communities, who may feel exploited and unfairly treated because of financial incentives, as the numerous articles complaining about "Collars for Dollars" show, even if these criticisms might not always be fair. The broader claim of the mathematical model – that increasing the probability of capture (for example by hiring more police) is much more effective at reducing crime than increasing the harshness of punishments – has been supported by evidence in practice, not just in theory.

Monday, 22 February 2021

Payoff matrices

How can we find the Nash equilibrium in a two-player, uncooperative, symmetric game? First it helps to have a more efficient way to describe games than the lengthy word explanations I gave in the first post. We can summarise a game through its payoff matrix: a table showing the different pairs of choices the players can make and the "payoffs" (scores) in each case.

Payoff matrices as an efficient way to describe a game

Let's explore payoff matrices, starting by considering Rock Paper Scissors. Here is a possible payoff matrix for the game.
 

             Rock   Paper   Scissors
Rock          0,0    0,1      1,0
Paper         1,0    0,0      0,1
Scissors      0,1    1,0      0,0

The row headings give the action I choose, the column headings give the action you choose, and the entries say how many points I get (before the comma) and how many points you get (after the comma). There's a lot of redundant information in there. Why?

Remember that the game is symmetric, so knowing what I score when we choose (Rock, Paper) is enough to know what you get when we choose (Paper, Rock). This means we only need half of the entries: the bottom left entries can be worked out by swapping the points round in the matching top right entry. For example, the (Scissors, Rock) entry is 0,1, and we could work this out by reversing the (Rock, Scissors) entry of 1,0. Alternatively, we could keep all the entries in the table, but only include my score, and work out what your score is from that.
 
Let's take the second of these options. Let's also make another slight change. I've been referring to Rock Paper Scissors as a "zero-sum game" because we can't both win or both lose at the same time: the actual number of points scored isn't so important as the difference between my number of wins and your number of wins. To really be a zero-sum game, we need to record this difference rather than the original scores, which yields the following table.

             Rock   Paper   Scissors
Rock           0     -1         1
Paper          1      0        -1
Scissors      -1      1         0

With this change we can summarise the payoffs even more efficiently. Suppose we only saw part of the table above:


             Rock   Paper   Scissors
Rock           ?     -1         1
Paper         ??      ?        -1
Scissors      ??     ??         ?

For each of the single question marks on the diagonal, the two players have chosen the same option, and necessarily draw because the game is symmetric, hence must both score zero points.

Let's consider the double question mark for the (Scissors, Rock) entry, in the bottom left. When I play Rock against your Scissors, I score 1 point, so by symmetry, when you play Rock against my Scissors, you must likewise score 1 point. Because it's a zero-sum game, that means in this latter scenario I must score -1, and we can fill this into the bottom left corner.

We can work out the other double question mark entries similarly, so we've been able to work out the full table from only the entries in the top right corner. I'll continue to write the tables in full, because it will be convenient for calculations.
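This deduction is mechanical enough to automate. Here's a sketch of a helper (the function name is my own) that rebuilds a whole symmetric zero-sum table from its top-right entries:

```python
import numpy as np

def complete_payoff_matrix(upper):
    """Rebuild a symmetric zero-sum payoff table from its top-right entries.

    upper[i][j], for i < j, is the row player's score for option i against
    option j; the diagonal is 0 (matching choices draw), and each bottom-left
    entry is minus its mirror image across the diagonal.
    """
    n = len(upper)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            M[i, j] = upper[i][j]
            M[j, i] = -upper[i][j]
    return M

# The known top-right corner of the Rock Paper Scissors table above.
print(complete_payoff_matrix([[0, -1, 1],
                              [0, 0, -1],
                              [0, 0, 0]]))
# [[ 0. -1.  1.]
#  [ 1.  0. -1.]
#  [-1.  1.  0.]]
```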

Some more payoff matrices

To demonstrate how efficient these tables are for describing games, here are the three other variants on Rock Paper Scissors I needed whole paragraphs to explain in the first post.


           Rock   Paper   Scissors   Well
Rock         0     -1         1       -1
Paper        1      0        -1        1
Scissors    -1      1         0       -1
Well         1     -1         1        0


           Rock   Paper   Papyrus   Scissors
Rock         0     -1       -1          1
Paper        1      0        0         -1
Papyrus      1      0        0         -1
Scissors    -1      1        1          0


                Rock   Paper   SuperScissors
Rock              0     -1           1
Paper             1      0          -2
SuperScissors    -1      2           0

Payoff matrices allow you to very quickly spot certain dominated strategies and duplicate strategies. In particular, we can see in the first example that the row entries for Rock are all smaller than or the same as those for Well, which entails that Rock is dominated by Well. Meanwhile, in the second table, the entries in the Paper row are exactly the same as those in the Papyrus row (and similarly for their columns).
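Here's a sketch automating these two checks (the function name is mine, and it only looks for domination by a single other row; domination by mixtures, as discussed below, takes more work):

```python
import numpy as np

def dominated_or_duplicate(payoff, names):
    """Report options whose payoff row is equal to, or weakly dominated by,
    another single row of a symmetric zero-sum payoff matrix."""
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            if i < j and np.array_equal(payoff[i], payoff[j]):
                print(f"{a} and {b} are duplicates")
            elif i != j and np.all(payoff[j] >= payoff[i]) and np.any(payoff[j] > payoff[i]):
                print(f"{a} is dominated by {b}")

rpsw = np.array([[0, -1, 1, -1], [1, 0, -1, 1], [-1, 1, 0, -1], [1, -1, 1, 0]])
dominated_or_duplicate(rpsw, ["Rock", "Paper", "Scissors", "Well"])
# -> Rock is dominated by Well

rpps = np.array([[0, -1, -1, 1], [1, 0, 0, -1], [1, 0, 0, -1], [-1, 1, 1, 0]])
dominated_or_duplicate(rpps, ["Rock", "Paper", "Papyrus", "Scissors"])
# -> Paper and Papyrus are duplicates
```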

Exercise

See if you can find the omitted entries of this payoff matrix. Do you think there's a dominated strategy?

           Option 1   Option 2   Option 3   Option 4
Option 1       ?          3         -2          1
Option 2       ?          ?          0          6
Option 3       ?          ?          ?         -4
Option 4       ?          ?          ?          ?

Arbitrage betting

We can't always spot duplicate or dominated strategies by directly comparing table rows: as in the last post, sometimes a mixture of strategies can mimic or outperform a strategy (or mimic a different mixture of strategies). The fact that there are multiple ways to make the same bet is familiar to lots of gamblers. For example, you could:

  • Bet directly that Chelsea win
  • Make two bets simultaneously:
    • That both Chelsea and Aston Villa win;
    • That Chelsea win but Aston Villa lose or draw.
It may be that one of these betting methods gives you better odds for Chelsea winning, making the other a dominated strategy. Sometimes the division can be more complicated than this: I remember exploring Paris with a friend who had to pause to place a bet that Southampton would win the second half of their match, having previously made bets regarding the overall result and the first half. Finding ways to divide up bets like this can be used to great profit: sometimes you can find a sure win, through what is known as arbitrage betting.
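For the curious, here's a sketch of the standard arbitrage check for decimal odds (the odds themselves are made up): if you stake in proportion to 1/odds on each outcome, you get back exactly 1 unit whichever outcome occurs, so a total outlay below 1 is a sure profit.

```python
def arbitrage(odds):
    """Check decimal odds covering a full set of mutually exclusive outcomes.

    Staking 1/odds[i] on outcome i returns exactly 1 unit whatever happens,
    at a total cost of sum(1/odds); a cost below 1 is a guaranteed profit.
    """
    cost = sum(1 / o for o in odds)
    return cost < 1, 1 / cost - 1   # (arbitrage exists?, guaranteed return per unit staked)

# Made-up best available odds, cherry-picked across bookmakers, for the
# three outcomes Chelsea win / draw / Aston Villa win.
print(arbitrage([2.2, 3.8, 4.2]))   # (True, 0.0462...): about a 4.6% risk-free return
```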

Sunday, 20 December 2020

Some solutions for the last post

Exercises

The last post left some questions in the text or comments:

  1. Why can we assume that both players are playing the same strategy in a Nash equilibrium for a symmetric zero-sum game?
  2. Why is Rock not used in Rock-Paper-Scissors-Well?
  3. Why was there non-uniqueness for Rock-Paper-Papyrus-Scissors?
  4. What's the equilibrium in Rock-Paper-SuperScissors?
  5. Why is the scenario described in this scene from "A Beautiful Mind" not a Nash equilibrium?
Thanks to Sam Olesker-Taylor for working through questions like these with me.

Solutions

1. 

Let \(A,B,C,D\) denote strategies – so \(A\) might stand for "choose Rock every time", or it might stand for "choose Rock half the time and Scissors half the time", to give two examples. When I use strategy \(A\) and you use strategy \(B,\) write \(AB\) for the score I get (averaging over any random choices we make). Suppose the pair of strategies \( (A,B)\) forms a Nash equilibrium. Then by definition, for any strategies \(C,D,\)

\begin{align} CB&\leq AB \tag{1} \\
AD&\geq AB. \tag{2}
\end{align}
[The first inequality records the fact that I can't improve my score by unilaterally changing strategy. The second says that you can't improve your score by unilaterally changing strategy, and we've used that the game is symmetric and zero sum to see that you improving your score is equivalent to you causing my score to worsen.] Symmetry and the zero-sum nature of the game also allows us to see that \(AA=BB=0\), and that \(CA=-AC\).

Our goal is to show that \((A,A)\) forms a Nash equilibrium, which is equivalent to saying that for all \(C,D\)
\begin{align} CA&\leq AA, \tag{3} \\
AD&\geq AA. \tag{4}
\end{align}
Because \(CA\leq AA\) is equivalent to \(-AC\leq -AA\) which is equivalent to \(AC \geq AA\), it's enough to prove that (4) holds: (3) will then follow automatically.  
Using inequalities (1) and (2) we see, for any \(D\) and \(C\),
\[ AD\geq AB\geq CB. \] In particular, this holds for \(C=B\), so that \(AD\geq BB=0 =AA,\) as required.
 

2.

Rock isn't used in RPSW because it is dominated: there's another option which is always at least as good. What that means is that whatever option you choose, if I choose Well I do at least as well as if I choose Rock, and there's at least one option of yours (in this case two options: Rock and Well) against which I do better by choosing Well than by choosing Rock.


This is an incomplete definition of a dominated strategy, which we will return to after considering the next question.
 

3.

Nonuniqueness in RPPS comes from the fact that in any strategy we can swap Paper for Papyrus, and vice versa, without changing anything. This leads to a first guess (and a mathematician's word for a guess, when they want to sound fancy, is a conjecture):

Conjecture: a Nash strategy fails to be unique when there are two choices which are functionally identical

Further exploration shows that this isn't quite right though: sometimes we can have a unique Nash equilibrium even when two states are copies of each other! For example, if we added Stone as a copy of Rock in the game Rock-Paper-Scissors-Well, then there is still a unique Nash equilibrium of choosing each of Paper, Scissors and Well a third of the time. The reason uniqueness holds here is because Rock isn't chosen in the Nash strategy anyway, so duplicating it doesn't add any extra optimal strategies. That leads us to a second guess:

Conjecture: a Nash strategy fails to be unique when there are two choices which are functionally identical and not dominated
This is better, but I asked you to think as broadly as possible, and the following examples show that it still doesn't capture all the ways uniqueness can fail.
 

Example: Rock Paper Scissors Pissors

In this game,

  • Paper beats Rock (scoring 1 point for the winner)
  • Scissors beats Paper (1)
  • Rock beats Scissors (1)
  • Rock ties with Pissors (each player scores 0 points)
  • Pissors beats Paper (but the winner scores only 1/2 a point)
  • Scissors beats Pissors (again scoring 1/2 a point)

Here, the scores obtained by Pissors are the same as someone would get on average by flipping a coin to choose between Paper and Scissors. Why is this relevant to uniqueness? Well, we said that choosing each of Rock, Paper and Scissors a third of the time was a Nash strategy, but now choosing Rock a third of the time and Pissors two thirds of the time will also be a Nash strategy, because it's effectively the same mixed strategy. The problem here was that Pissors can be expressed by choosing randomly between (or "mixing") the states of Scissors and Paper. Let's try again.
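Here's a quick numerical confirmation (a sketch: each row lists an option's average scores against pure Rock, Paper and Scissors; scores against Pissors itself are just the average of the last two columns, so they're covered too):

```python
import numpy as np

# Average scores of each option against pure Rock, Paper and Scissors.
rock     = np.array([ 0., -1.,  1.])
paper    = np.array([ 1.,  0., -1.])
scissors = np.array([-1.,  1.,  0.])

# Pissors scores like a fair coin flip between Paper and Scissors...
pissors = (paper + scissors) / 2
print(pissors)   # [0. 0.5 -0.5]: ties Rock, beats Paper by 1/2, loses to Scissors by 1/2

# ...so a third Rock plus two thirds Pissors reproduces uniform Rock-Paper-Scissors.
print(rock / 3 + 2 * pissors / 3)   # [0. 0. 0.]: ties everything on average
```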

Conjecture: a Nash strategy fails to be unique when some option, which isn't dominated, can be expressed as a mixture of other choices

This still isn't quite complete, as an example closely related to the last one will show:
 

Example: Rock, Paper, Pissors, Rissors

Here, you can no longer directly choose Scissors. Instead, you can choose Rissors, which scores the same as the average of flipping a coin to choose between Rock and Scissors.
Now there is no one choice which can be replicated by combining other options (I'll leave it to you to consider why that is). However, uniqueness still fails: choosing Rock a third of the time and Pissors two thirds gives the same outcomes on average as choosing Paper a third of the time and Rissors two thirds (and both are Nash strategies). This leads to a final conjecture.
Conjecture: a Nash strategy fails to be unique if and only if there is redundancy: some non-dominated mixture of strategies can be obtained via two different combinations of choices
Here's the kicker: I haven't been able to work out whether this conjecture is true!
This paper, in equation (8), gives a condition for the existence of a unique "completely mixed" Nash strategy, i.e. one in which every option is chosen at least some proportion of the time – unlike Rock in Rock Paper Scissors Well, which was never chosen in the Nash strategy. [The paper uses tables – or matrices – to describe games; my next post will discuss how to represent games with matrices.]
It's not hard to show that if a strategy is the only completely mixed Nash strategy, it's also the unique Nash strategy overall. [Why? Suppose \(A\) is the unique completely mixed Nash strategy, and \(B\) is another Nash strategy. We'll see in a subsequent post that choosing randomly between two Nash strategies gives another Nash strategy, so the strategy \(C\), which sees you flip an initial coin to choose whether to play according to strategy \(A\) or strategy \(B\), is a Nash strategy. It's also completely mixed: it uses every option at least half as often as \(A\) does. That contradicts the fact that \(A\) is unique among completely mixed strategies.]
Working out whether the conditions on the payoff matrix in the paper are the same as redundancy as defined here is something I haven't managed, and reader input would be welcomed!

 

Solution 2, continued

In view of the mixing discussions above, a dominated option should actually be defined as any option for which some mixture of the other options always does at least as well on average against every possible option, and does better on average against some option. For some purposes it will be convenient to demand that the option against which the mixture does strictly better is not itself dominated. Otherwise, in a game of Rock-Paper-Papyrus-Scissors for example, we could just add a useless option which loses to or draws with everything, but loses worse to Paper than Papyrus does; this would make Papyrus a dominated strategy, and yet Papyrus defies some expectations of a dominated strategy, since, for example, a third each of Rock/Papyrus/Scissors would still be a Nash strategy.

 

4.

The solution, we will see in an upcoming post, is to play Rock half the time and SuperScissors and Paper a quarter of the time each. Is that what you expected? Soon I'll address how to calculate Nash equilibria and how to interpret solutions like this one!

 

5.

Nash says none of his group should attempt to seduce the lady they are all most interested in, because if they all target her then they will "block each other"; instead they should all "settle" for one of her friends.
In fact, in such a scenario, any one of them can improve their personal outcome by unilaterally changing to targeting their first choice instead. The Nash equilibrium is that exactly one of Nash's group goes for her: only then will the others have no incentive to change strategy because of the blocking issue.

Sunday, 13 December 2020

Zero sum games: where Trump and Biden differ on trade

Nash's theorem shows the existence of a certain "optimal" strategy for playing games, called the Nash strategy. This is the first of a series of posts exploring the introductory concepts of game theory and Nash's theorem, with the ultimate goal of explaining in exactly what sense a Nash strategy is optimal, and why, despite this "optimality", it's rarely a good idea to play the Nash strategy if you want to win. This post will state Nash's theorem for a certain type of game, called zero-sum games, and examine (without going into depth) how game theory can illuminate a key policy difference between the candidates in the last US presidential election. There are puzzles dotted through the text, and gathered together in the comments section, for those of you who are interested in doing some exploring on your own; these puzzles will be discussed in future posts.
 

Nash's theorem

You may have heard of the mathematician and economist John Nash, perhaps through the film "A beautiful mind" which portrays parts of his life and career. One version of his theorem is the following, though he actually proved something more general, allowing for more than two players, games which are not zero-sum or symmetric, and guaranteeing the existence of a collection of strategies – one for each player – collectively called a Nash equilibrium.

Nash's theorem: For any two-player symmetric zero-sum game, there exists a Nash strategy. 

What do each of those terms mean? I won't give formal definitions because pinning everything down rigorously can be tedious. Instead, I'll give an idea of what you should be picturing for each word, and then lots of examples. 

Saturday, 7 March 2020

Surer than almost sure: Axiomatic probability

Axiomatic probability

Perhaps you noticed in the last post that I couldn’t help myself from slipping in just a little probability, when imagining throwing a dart at the number line. That shows how probability theory can be helpful in thinking about measures. What about the other way round: why is it useful to view probability theory as a sub-branch of measure theory?
Suppose we have a uniform random variable: a random number between zero and 1, equally likely to be anywhere in that range. How likely is it to be
  1. between 0 and 1/2?
  2. exactly equal to 2/3?
  3. between 1/6 and 1/3, if I’ve told you it’s between 0 and 1/2?
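Before answering formally, you can get empirical hints from a quick Monte Carlo sketch:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
u = rng.uniform(0.0, 1.0, size=1_000_000)   # a million samples of the variable

print(np.mean((0 < u) & (u < 0.5)))    # 1. about 0.5
print(np.mean(u == 2/3))               # 2. 0.0: one exact point is never hit
conditioned = u[(0 < u) & (u < 0.5)]
print(np.mean((1/6 < conditioned) & (conditioned < 1/3)))   # 3. about 1/3
```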

Saturday, 29 February 2020

Surer than almost sure: Measure theory

Measure theory

In this post and the following one, I will explain a bugbear of mine: the fact that something is described as ‘almost surely’ true, when it should really just be described as ‘surely’ true. The area of maths this falls in is ‘axiomatic’ probability theory, which is probability theory viewed as a sub-branch of something called measure theory, and so the thrust of this first post will be to briefly introduce measure theory.

Sets


Measure theory is, you guessed it, all about measuring things. The things you can measure are ‘sets’: a set is just the fancy way of saying any collection of things. Here are some sets.
  1. \(\{1, 2, 3\}\). The curly brackets or braces are the way mathematicians denote sets, so this set has the numbers 1, 2 and 3 in it. The fancy name for the things in a set is the 'elements' of the set, so the elements of \(\{1, 2, 3\}\) are 1, 2 and 3.
  2. \(\mathbb{R}\). This denotes the set of all ‘real’ numbers (hence the R): in other words, the number line.
  3. \(\{x \in \mathbb{R} : 0 \leq x\leq  1\}\). I’ve introduced another way mathematicians write down sets: this means the collection of all real numbers (hence the \(x \in \mathbb{R} \) bit) which satisfy the condition \(0 \leq x \leq 1 \). So this is just the portion of the number line between zero and one (including for example 1/2, 0.23, \(1/\pi \),…).
  4. \(\{\text{red, orange, yellow, green, blue, indigo, violet}\}\). Sets don’t have to consist of obviously mathematical things like numbers, but can include whatever you like, like here the colours of the rainbow. You could even mix and match, so \(\{1,  \text{red}\}\) is a set.
  5. The set of all points on the ground which lie within Paris. I choose Paris partly because it’s well defined as a city: inside the ‘périphérique’ you’re in Paris, and outside you’re not.