Sunday, 21 March 2021

Don't play Nash if you want to win

Nash strategies are the best possible strategies in some ways...

Last time it was left as an exercise to prove the following.

Nash strategies are least exploitable strategies, and vice-versa, for two-player symmetric zero-sum games

This gives a sense in which Nash strategies are optimal. Remember a least exploitable strategy is defined as any strategy which on average ties with or beats every other possible strategy. What that means is that even if I tell you that I'm going to use the Nash strategy, you still can't beat me. That obviously isn't the case for most strategies: if I told you I was going to use Rock every time, you could choose Paper every time to win.

To justify the claim, suppose \(A\) denotes a Nash strategy and \(B\) denotes some other strategy. Because the game is symmetric and zero-sum, the average score for each player when both play \(A\) is zero. The average score of the second player when they play \(B\) instead is not an improvement on this, since \((A,A)\) is a Nash equilibrium and so they can't benefit from unilaterally switching. In other words, \(B\) scores at most zero on average against \(A\): a Nash strategy ties with or beats every other strategy, which is exactly what it means to be least exploitable.

We still need to prove what mathematicians call the converse: that any least exploitable strategy is Nash. If \(A\) is a least exploitable strategy, the average score for each player when both play \(A\) is zero. The average score one player gets when unilaterally switching is at most zero because \(A\) is least exploitable, and this proves that the pair of strategies \((A,A)\) forms a Nash equilibrium.  
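
If you'd like to see the claim numerically rather than abstractly, here is a minimal sketch for plain Rock-Paper-Scissors (the labels and the \(+1/0/-1\) win/tie/loss encoding are my own, not anything fixed by the posts above): every pure option scores exactly zero against the uniform mix, so no strategy, pure or mixed, can beat it, and neither player gains by deviating from it.

```python
# A numerical check of the claim for plain Rock-Paper-Scissors,
# using a +1 / 0 / -1 encoding for win / tie / loss (my choice of encoding).
OPTIONS = ["Rock", "Paper", "Scissors"]
BEATS = {("Rock", "Scissors"), ("Paper", "Rock"), ("Scissors", "Paper")}

def payoff(a, b):
    """Score of option a against option b."""
    return 1 if (a, b) in BEATS else (-1 if (b, a) in BEATS else 0)

def expected(mine, theirs):
    """Average score of mixed strategy `mine` against mixed strategy `theirs`."""
    return sum(mine[a] * theirs[b] * payoff(a, b) for a in OPTIONS for b in OPTIONS)

nash = {o: 1/3 for o in OPTIONS}  # the uniform mix, the Nash strategy here

# Every pure option scores 0 against the mix, so every mixture of them does too:
# the mix can't be beaten (least exploitable) and no unilateral deviation from
# (nash, nash) gains anything (Nash equilibrium).
for option in OPTIONS:
    pure = {o: (1.0 if o == option else 0.0) for o in OPTIONS}
    print(option, "vs the uniform mix:", expected(pure, nash))  # 0.0 each time
```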


We can prove another sense in which Nash strategies are optimal. Remember that a "dominated" strategy was one which is outperformed by another option (or a mixture of other options), like Rock in the game Rock-Paper-Scissors-Well (because Well matches Rock's results against Paper and Scissors, but whereas Rock only ties with itself, Well beats Rock).
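
As a small check of that dominance claim (again just a sketch, with my own encoding of who beats whom in Rock-Paper-Scissors-Well): Well never scores worse than Rock against any option, and scores strictly better in the matchup against Rock itself.

```python
# Checking that Well dominates Rock in Rock-Paper-Scissors-Well.
# Encoding assumed: Well beats Rock and Scissors, loses to Paper, ties itself.
OPTIONS = ["Rock", "Paper", "Scissors", "Well"]
BEATS = {("Rock", "Scissors"), ("Paper", "Rock"), ("Scissors", "Paper"),
         ("Well", "Rock"), ("Well", "Scissors"), ("Paper", "Well")}

def payoff(a, b):
    """+1 if a beats b, -1 if b beats a, 0 for a tie."""
    return 1 if (a, b) in BEATS else (-1 if (b, a) in BEATS else 0)

# Well does at least as well as Rock in every matchup...
print(all(payoff("Well", b) >= payoff("Rock", b) for b in OPTIONS))  # True
# ...and strictly better in the matchup against Rock itself.
print(payoff("Well", "Rock") > payoff("Rock", "Rock"))               # True
```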

If you play a Nash strategy and I play a strategy in which I sometimes choose a dominated option, you win on average.

I'll prove it in the example case Rock-Paper-Scissors-Well to have names for everything, but the proof works in general. The dominated strategy is Rock, and we're assuming I choose it at least some of the time. Suppose you, following the Nash strategy, also play against my sister, and she makes the exact same moves as me except that every time I choose Rock she chooses Well. Then at best she draws with you (because you're playing a Nash strategy) and she outperforms me (because every time I choose Rock and you choose Well, I lose but she draws; this happens at least sometimes, since the Nash strategy of this game chooses Well some of the time). So I must lose to you on average.
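
Here's the same point numerically (a sketch with my encoding again, taking the Nash strategy of Rock-Paper-Scissors-Well to be equal parts Paper, Scissors and Well): Rock loses a third of a point per round against that mix while the other options break even, so any strategy that plays Rock with positive probability loses on average overall.

```python
# How each option of Rock-Paper-Scissors-Well fares against the Nash mix
# (equal parts Paper, Scissors, Well; same encoding as above).
OPTIONS = ["Rock", "Paper", "Scissors", "Well"]
BEATS = {("Rock", "Scissors"), ("Paper", "Rock"), ("Scissors", "Paper"),
         ("Well", "Rock"), ("Well", "Scissors"), ("Paper", "Well")}

def payoff(a, b):
    return 1 if (a, b) in BEATS else (-1 if (b, a) in BEATS else 0)

nash = {"Rock": 0.0, "Paper": 1/3, "Scissors": 1/3, "Well": 1/3}
for option in OPTIONS:
    avg = sum(nash[b] * payoff(option, b) for b in OPTIONS)
    print(option, "vs the Nash mix:", round(avg, 3))
# Rock: -0.333; Paper, Scissors, Well: 0.0 -- so any mixture giving Rock
# positive weight scores strictly below zero on average.
```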

[Actually, strictly speaking the proof I've given isn't quite valid, and indeed the "theorem" isn't true. Perhaps you remember that, all the way back in post 2, we considered a game of Rock-Paper-Papyrus-Scissors and then added something which, say, loses to Paper and ties with everything else – let's call this option Fail. Choosing Rock, Papyrus and Scissors a third of the time each remains a Nash strategy in this game and Fail is a dominated strategy (see if you can prove these claims). But none of Rock, Papyrus or Scissors beats Fail, hence nor does the Nash strategy obtained by mixing them. Papyrus itself is technically a dominated strategy in this game – it's outperformed by Paper – despite appearing in a Nash equilibrium, which is the cause of this trouble. To avoid this trouble one should restrict attention to admissible Nash strategies, which are ones that never choose a dominated option.]
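
To make the caveat concrete, here is a sketch under an assumption about the game from post 2 that I should flag clearly: I'm taking Papyrus to behave exactly like Paper against Rock and Scissors and to tie with Paper, while Fail loses to Paper and ties with everything else, as described above. Under that encoding, the Rock/Papyrus/Scissors mix is still an equilibrium but only ties with Fail, and Paper dominates Papyrus.

```python
# Rock-Paper-Papyrus-Scissors-Fail, under the encoding described in the
# paragraph above (Papyrus behaving like Paper is my assumption about post 2).
OPTIONS = ["Rock", "Paper", "Papyrus", "Scissors", "Fail"]
BEATS = {("Paper", "Rock"), ("Papyrus", "Rock"), ("Rock", "Scissors"),
         ("Scissors", "Paper"), ("Scissors", "Papyrus"), ("Paper", "Fail")}

def payoff(a, b):
    return 1 if (a, b) in BEATS else (-1 if (b, a) in BEATS else 0)

mix = {"Rock": 1/3, "Paper": 0.0, "Papyrus": 1/3, "Scissors": 1/3, "Fail": 0.0}
# No option does better than tying against the mix, so it is still Nash...
print(all(sum(mix[b] * payoff(a, b) for b in OPTIONS) <= 1e-9 for a in OPTIONS))  # True
# ...but it only ties with the dominated option Fail, rather than beating it.
print(sum(mix[a] * payoff(a, "Fail") for a in OPTIONS))  # 0.0
# And Papyrus is itself dominated: Paper does at least as well against
# everything and strictly better against Fail.
print(all(payoff("Paper", b) >= payoff("Papyrus", b) for b in OPTIONS)
      and payoff("Paper", "Fail") > payoff("Papyrus", "Fail"))  # True
```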

In some games, such as La Relance (see section 7.1 here), a two-player poker game which can be viewed as zero-sum (but not symmetric), the non-dominated strategies may be complex enough (bet when your hand is higher than some precise threshold and otherwise don't) that a typical opponent can be expected to err occasionally, yielding a guaranteed (average) profit if you follow an (admissible) Nash strategy.


But using Nash strategies isn't always the best idea

Using a mixture of Rock, Papyrus and Scissors is silly in the game of Rock-Paper-Papyrus-Scissors-Fail, despite being Nash, because using Paper instead of Papyrus gives better outcomes.

Remember also the example of choosing a lane to overtake in, from all the way back in the first post? To recap, if both the slow driver in front and the fast driver behind choose the inside lane, they crash; if they both choose the outside lane they crash; but if one chooses the inside and one the outside then the overtake happens safely for both. One of the three Nash equilibria for this "game" was each driver choosing a lane by flipping a coin: if I know you're choosing by flipping a coin, I can't have a better than 50-50 chance of avoiding a crash no matter how I make my choice. This last Nash strategy is clearly not a good idea, because we can avoid crashes much more than the 50% of the time it yields: indeed, we can avoid crashing all the time, just by choosing different lanes.
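
For the lane example, the numbers are easy to check directly (a sketch with a 1-for-safe, 0-for-crash encoding of my own): if the other driver flips a fair coin, both of your options give a 50% chance of a crash, which is why coin-flipping by both drivers is an equilibrium, whereas fixed, different lane choices avoid the crash entirely.

```python
# The overtaking "game": a safe overtake happens exactly when the two drivers
# pick different lanes (1 = safe, 0 = crash; my encoding).
LANES = ["inside", "outside"]

def safe(mine, theirs):
    return 1 if mine != theirs else 0

coin = {"inside": 0.5, "outside": 0.5}
for my_lane in LANES:
    p_safe = sum(coin[their_lane] * safe(my_lane, their_lane) for their_lane in LANES)
    print(my_lane, "vs a coin-flipper: P(safe) =", p_safe)   # 0.5 either way
print("coordinating on different lanes: P(safe) =", safe("inside", "outside"))  # 1
```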

These are surface-level problems which can easily be avoided: in each case, while there is a "bad" Nash equilibrium, there are also better ones, so we don't have to give up on choosing a Nash strategy to overcome this "badness". However, the problems with using Nash strategies go deeper. Remember that last time we had the following claim:

Suppose a Nash strategy for a game is a completely mixed strategy, i.e. every option is chosen at least some proportion of the time. Then the reward for playing any fixed option against this Nash strategy, which by definition is at most zero, is in fact exactly equal to zero.

To prove this, suppose there's some option which loses on average to the Nash strategy, rather than tying. If we're both playing the completely mixed Nash strategy and you switch to a strategy which never chooses this losing option but picks all the other options in the same relative proportions, then you'll beat me: when I don't choose the losing option, we're making the same average choices and hence we tie, and when I do choose it I lose to you on average (the losing option ties with itself, because the game is symmetric and zero-sum, so its losses against the Nash mix must come entirely from the other options, which are exactly the ones you're still playing). The fact that you can improve your payoff by unilaterally changing strategy means my strategy couldn't have been Nash in the first place.
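
The same mechanism can be seen on a concrete non-example (the particular numbers are mine, chosen only for illustration): the completely mixed Rock-Paper-Scissors strategy below is not Nash, Scissors loses to it on average, and dropping Scissors while rescaling the rest produces a strategy that beats it, which is exactly the contradiction used above.

```python
# A completely mixed strategy that is *not* Nash, to illustrate the argument.
OPTIONS = ["Rock", "Paper", "Scissors"]
BEATS = {("Rock", "Scissors"), ("Paper", "Rock"), ("Scissors", "Paper")}

def payoff(a, b):
    return 1 if (a, b) in BEATS else (-1 if (b, a) in BEATS else 0)

def expected(mine, theirs):
    return sum(mine[a] * theirs[b] * payoff(a, b) for a in OPTIONS for b in OPTIONS)

mixed = {"Rock": 0.5, "Paper": 0.25, "Scissors": 0.25}   # completely mixed, not Nash
# Scissors loses to this mix on average...
print(expected({"Rock": 0.0, "Paper": 0.0, "Scissors": 1.0}, mixed))   # -0.25
# ...so dropping Scissors and rescaling the rest gives a strategy that beats it,
# meaning the original mix could not have been Nash.
trimmed = {"Rock": 2/3, "Paper": 1/3, "Scissors": 0.0}
print(expected(trimmed, mixed))   # about +0.083
```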

We've previously seen (thanks to equation (8) here) that for many symmetric zero-sum games, the Nash strategy is completely mixed. The result we've just proved then means that by playing Nash, you'll never come out ahead on average, no matter how bad your opponent is. Imagine playing Rock-Paper-Scissors against someone who chooses Rock every time: by rigidly sticking to the Nash strategy, you let them win and draw \(1/3\) of the time each, whereas you could win every time by choosing Paper.

When you can work out your opponent's strategy, your best response should take that into account, in contrast to a least exploitable Nash strategy, which has to do OK regardless of what they're doing. By making yourself more vulnerable to some strategies (which they aren't using), you can potentially perform better against the strategy they are using. In the example, choosing Paper every time is the best response when they're only choosing Rock; by playing the Nash strategy, you denied yourself the opportunity to exploit their bad play. [Of course, if you do that, they can change to choosing Scissors every time, and suddenly you lose every round, so you must always be cautious in deviating from the Nash strategy.]
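
One simple way to put this into practice, sketched below with a hypothetical `best_response` helper of my own: estimate the opponent's mix from the moves you've seen, then pick whichever option scores best against that estimate. Against a Rock-heavy opponent it recommends Paper, where the Nash mix would only break even.

```python
# A sketch of the "exploit what you've observed" idea: fit an empirical mix to
# the opponent's past moves and best-respond to it. The function name and the
# observed history are my own, for illustration only.
from collections import Counter

OPTIONS = ["Rock", "Paper", "Scissors"]
BEATS = {("Rock", "Scissors"), ("Paper", "Rock"), ("Scissors", "Paper")}

def payoff(a, b):
    return 1 if (a, b) in BEATS else (-1 if (b, a) in BEATS else 0)

def best_response(observed_moves):
    counts = Counter(observed_moves)
    total = len(observed_moves)
    estimate = {o: counts[o] / total for o in OPTIONS}   # empirical opponent mix
    value = {o: sum(estimate[b] * payoff(o, b) for b in OPTIONS) for o in OPTIONS}
    return max(value, key=value.get), value

choice, value = best_response(["Rock"] * 8 + ["Scissors"] * 2)
print(choice, value)   # Paper, winning 0.8 - 0.2 = 0.6 per round on average
```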

If you want to consistently win at a game with a completely mixed Nash equilibrium (and once dominated options are excluded, a typical zero-sum game will have one), then you need to trust that you can account for your opponent's plans better than they account for yours. The best tournament players of Rock-Paper-Scissors (yes, there are RPS tournaments) know this: of course you could go to a tournament and play the Nash strategy, but if you did, your chances of winning 7 matches in a row in a knockout tournament would be less than 1%, while the best players consistently reach the later rounds at better than random rates.
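
That 1% figure is just coin-flip arithmetic (assuming Nash-versus-Nash play makes each match effectively a toss-up once ties are replayed):

```python
# Chance of winning seven 50-50 matches in a row.
print(0.5 ** 7)   # 0.0078125, i.e. a little under 0.8%
```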

Conclusion

The Nash strategy is, by definition and by various proofs from this sequence of posts, an optimal strategy to use in a symmetric zero-sum game. However, it is optimal only in the absence of information about your opponent's strategy, and taking into account any such information leads to better "best response" strategies – like Paper if they're choosing Rock too often. The best players might even deliberately deviate from the Nash strategy into what's known in the gaming industry as "donkey space" (e.g. see here) in order to tempt their opponent away from Nash, so that they can exploit their opponent's attempts to exploit them.