Axiomatic probability
Perhaps you noticed in the last post that I couldn’t help slipping in just a little probability when imagining throwing a dart at the number line. That shows how probability theory can be helpful in thinking about measures. What about the other way round: why is it useful to view probability theory as a sub-branch of measure theory?

Suppose we have a uniform random variable: a random number between 0 and 1, equally likely to be anywhere in that interval. How likely is it to be
- between 0 and 1/2?
- exactly equal to 2/3?
- between 1/6 and 1/3, if I’ve told you it’s between 0 and 1/2?
These are really questions about length in disguise:
- How long is the region from 0 to 1/2? (1/2)
- How long is the single point 2/3? (0)
- What proportion of the length from 0 to 1/2 falls between 1/6 and 1/3? (1/3)
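These answers are easy to check numerically. Here’s a quick Monte Carlo sketch (the variable names are my own, and the simulated answers will only be close to the exact ones, not equal):

```python
import random

random.seed(0)
N = 100_000
samples = [random.random() for _ in range(N)]  # uniform on [0, 1)

# P(0 < X < 1/2): should come out close to 1/2
p_half = sum(0 < x < 0.5 for x in samples) / N

# P(X = 2/3): a single point, so essentially never hit
p_point = sum(x == 2/3 for x in samples) / N

# P(1/6 < X < 1/3, given 0 < X < 1/2): should come out close to 1/3
in_half = [x for x in samples if 0 < x < 0.5]
p_cond = sum(1/6 < x < 1/3 for x in in_half) / len(in_half)

print(p_half, p_point, p_cond)
```

The single-point probability comes out as exactly zero: the simulation never lands on 2/3, mirroring the exact answer above.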
Probabilities are measures
Consider the set of all possible outcomes of some random process. For example, if we’re flipping two coins successively, this is (denoting heads by \(H\) and tails by \(T\)) \[\{HH,HT,TH,TT\}.\] We can define a measure by defining the ‘size’ of a collection of possible outcomes to be the probability at least one of those outcomes occurs. Here’s a table in the coinflip case. I’ve not given every possible collection of outcomes, just some illustrative ones.
| Set | Measure |
| --- | --- |
| \(\{HH\}\) | 1/4 |
| \(\{TH\}\) | 1/4 |
| \(\{HH,HT\}\) | 1/2 |
| \(\{HH,TH,TT\}\) | 3/4 |
| \(\{HH,HT,TH,TT\}\) | 1 |
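Because the space of outcomes is finite, this measure can be written down directly. A sketch in Python (my own construction, just to make the table concrete):

```python
from itertools import product

# All outcomes of two successive coin flips: ['HH', 'HT', 'TH', 'TT']
outcomes = ["".join(flips) for flips in product("HT", repeat=2)]

def measure(event):
    """The 'size' of a collection of outcomes: the probability that
    one of them occurs. Each of the 4 outcomes is equally likely."""
    return len(set(event) & set(outcomes)) / len(outcomes)

print(measure({"HH"}))              # 0.25
print(measure({"HH", "HT"}))        # 0.5
print(measure({"HH", "TH", "TT"}))  # 0.75
print(measure(set(outcomes)))       # 1.0 -- the whole space has measure 1
```

Note the last line: the measure of the entire space of outcomes is 1, which is exactly the property singled out in the next section.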
We can do the same with our earlier uniform random variable. If we define the measure of a set to be the probability that our variable lies in that set, the measure we’d get would just be length. (This would be a circular way of defining length, though, since we used length to describe what it meant to be uniform.)
Any measure we get like this from a random process will share a trait: the total measure, of all possible outcomes combined, will equal 1 (or 100% if you prefer). This leads us to the axiomatic definition of probability: \[\text{‘probabilities' are measures giving size 1 to the totality of the space of interest.}\] So measure theory can pin down probability questions, including weird ones like the probability of a uniform random variable being a rational number, which we talked about before.
Almost surely
If you flip a fair coin forever, what’s the chance you keep getting heads?
The chance of 2 heads in a row is \(1/4\), the chance of 3 heads in a row is \(1/8\), the chance of \(n\) heads in a row is \((1/2)^n\), and so the chance we get heads forever is 0. Mathematicians tend to avoid saying ‘it’s impossible’ in this case, because in some sense it is possible: one of the options for what happens when we keep flipping coins is that we get heads every time. (Similarly, even though almost every number between 0 and 1 is irrational, choosing one at random we could in principle still hit one of the rationals.) But the probability this happens is zero. Everyone has an intuitive sense of this (even if the idea of something ‘possible’ having zero probability is very strange): if you took a 50/50 bet again and again and kept losing, literally forever, you’d know you were being cheated.
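The shrinking chances above are easy to tabulate; a tiny sketch (the function name is my own):

```python
# The chance of n heads in a row is (1/2)^n, which halves with every
# extra flip and so shrinks towards 0 as n grows.
def p_all_heads(n):
    """Probability of n heads in a row with a fair coin."""
    return 0.5 ** n

for n in [2, 3, 10, 50]:
    print(n, p_all_heads(n))
```

By \(n = 50\) the probability is already below one in a quadrillion, and in the limit it vanishes entirely.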
Since mathematicians don’t like to use the word ‘impossible’, they need a term for this: something that is in principle an option, but has no probability of actually showing up. In fact, it’s enough to name the opposite: something that in principle could fail to happen, but which has probability 1 of occurring. Such an event is called ‘almost sure’, or we say that ‘almost surely’ the thing will happen.
The crux
We’ve finally reached the whole reason I wrote a much-too-long-given-the-payoff pair of posts. Why is an event of probability 1 called almost sure, and why don’t I like it?
Remember in the second post I defined ‘almost everywhere’ (which is a proper mathematical term) and then immediately started talking about ‘almost everyone’ (which isn’t)? My hunch is that mathematicians building probability theory from measures did roughly the same thing: they decided to stick with the ‘almost something’ theme. For the ‘something’, they decided that ‘surely’ is a good probability-sounding word which roughly conveys the right intuitive meaning of having probability 1.
I like the name ‘almost everywhere’ in measure theory, because it really sounds like what it means – not every real number is irrational, just ‘almost’ every number. But in probability theory, I don’t think ‘almost surely’ conveys what it should. It’s not ‘almost’ sure that you’ll flip at least 1 tails if you keep going forever, it’s sure. ‘Sure’, to me, doesn’t convey that there’s no possibility for it to fail. Instead, it conveys that there is no probability of it failing, without needing to be qualified with an ‘almost’.
How do you feel about the term "almost certainly"? :)
Great alternative suggestion: sticks with the "almost" theme, but also sounds more probable than "almost surely". Kudos!