Axiomatic probability
Perhaps you noticed in the last post that I couldn’t help slipping in just a little probability when imagining throwing a dart at the number line. That shows how probability theory can be helpful in thinking about measures. What about the other way round: why is it useful to view probability theory as a sub-branch of measure theory?

Suppose we have a uniform random variable: a random number between 0 and 1, equally likely to be anywhere in that interval. How likely is it to be
- between 0 and 1/2?
- exactly equal to 2/3?
- between 1/6 and 1/3, if I’ve told you it’s between 0 and 1/2?
These are really questions about length in disguise:
- How long is the region from 0 to 1/2? (1/2)
- How long is the single point 2/3? (0)
- What proportion of the length from 0 to 1/2 falls between 1/6 and 1/3? (1/3)
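These answers are easy to check numerically. Here’s a quick Monte Carlo sketch (the variable names are my own, and the simulated answers will only be close to the exact ones, not equal):

```python
import random

random.seed(0)
N = 100_000
samples = [random.random() for _ in range(N)]  # uniform on [0, 1)

# P(0 < X < 1/2): should come out close to 1/2
p_half = sum(0 < x < 0.5 for x in samples) / N

# P(X = 2/3): a single point, so essentially never hit
p_point = sum(x == 2/3 for x in samples) / N

# P(1/6 < X < 1/3, given 0 < X < 1/2): should come out close to 1/3
in_half = [x for x in samples if 0 < x < 0.5]
p_cond = sum(1/6 < x < 1/3 for x in in_half) / len(in_half)

print(p_half, p_point, p_cond)
```

The single-point probability comes out as exactly zero: the simulation never lands on 2/3, mirroring the exact answer above.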
Probabilities are measures
Consider the set of all possible outcomes of some random process. For example, if we’re flipping two coins successively, this is (denoting heads by \(H\) and tails by \(T\)) \[\{HH,HT,TH,TT\}.\] We can define a measure by defining the ‘size’ of a collection of possible outcomes to be the probability at least one of those outcomes occurs. Here’s a table in the coinflip case. I’ve not given every possible collection of outcomes, just some illustrative ones.
| Set | Measure |
| --- | --- |
| \(\{HH\}\) | 1/4 |
| \(\{TH\}\) | 1/4 |
| \(\{HH,HT\}\) | 1/2 |
| \(\{HH,TH,TT\}\) | 3/4 |
| \(\{HH,HT,TH,TT\}\) | 1 |
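Because the space of outcomes is finite, this measure can be written down directly. A sketch in Python (my own construction, just to make the table concrete):

```python
from itertools import product

# All outcomes of two successive coin flips: ['HH', 'HT', 'TH', 'TT']
outcomes = ["".join(flips) for flips in product("HT", repeat=2)]

def measure(event):
    """The 'size' of a collection of outcomes: the probability that
    one of them occurs. Each of the 4 outcomes is equally likely."""
    return len(set(event) & set(outcomes)) / len(outcomes)

print(measure({"HH"}))              # 0.25
print(measure({"HH", "HT"}))        # 0.5
print(measure({"HH", "TH", "TT"}))  # 0.75
print(measure(set(outcomes)))       # 1.0 -- the whole space has measure 1
```

Note the last line: the measure of the entire space of outcomes is 1, which is exactly the property singled out in the next section.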
We can do the same with our earlier uniform random variable. If we define the measure of a set to be the probability that our variable lies in that set, the measure we’d get would just be length. (This would be a circular way of defining length, though, since we used length to describe what it meant to be uniform.)
Any measure we get like this from a random process will share a trait: the total measure, of all possible outcomes combined, will equal 1 (or 100% if you prefer). This leads us to the axiomatic definition of probability: \[\text{‘probabilities' are measures giving size 1 to the totality of the space of interest.}\] So measure theory can pin down probability questions, including weird ones like the probability of a uniform random variable being a rational number, which we talked about before.
Almost surely
If you flip a fair coin forever, what’s the chance you keep getting heads?
The chance of 2 heads in a row is \(1/4\), the chance of 3 heads in a row is \(1/8\), the chance of \(n\) heads in a row is \((1/2)^n\), and so the chance we get heads forever is 0. Mathematicians tend to avoid saying ‘it’s impossible’ in this case, because in some sense it is possible: one of the options for what happens when we keep flipping coins is that we get heads every time. (Similarly, even though almost every number between 0 and 1 is irrational, choosing one at random we could in principle still hit one of the rationals.) But the probability this happens is zero. Everyone has an intuitive sense of this (even if the idea of something ‘possible’ having zero probability is very strange): if you took a 50/50 bet again and again and kept losing, literally forever, you’d know you were being cheated.
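The shrinking chances above are easy to tabulate; a tiny sketch (the function name is my own):

```python
# The chance of n heads in a row is (1/2)^n, which halves with every
# extra flip and so shrinks towards 0 as n grows.
def p_all_heads(n):
    """Probability of n heads in a row with a fair coin."""
    return 0.5 ** n

for n in [2, 3, 10, 50]:
    print(n, p_all_heads(n))
```

By \(n = 50\) the probability is already below one in a quadrillion, and in the limit it vanishes entirely.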
Since mathematicians don’t like to use the word ‘impossible’, they need a term for this: something that is in principle an option, but has no probability of actually showing up. In fact, it’s enough to name the opposite: something that in principle could fail to happen, but which has probability 1 of occurring. Such an event is called ‘almost sure’, or we say that ‘almost surely’ the thing will happen.
The crux
We’ve finally reached the whole reason I wrote a much-too-long-given-the-payoff pair of posts. Why is an event of probability 1 called almost sure, and why don’t I like it?
Remember in the second post I defined ‘almost everywhere’ (which is a proper mathematical term) and then immediately started talking about ‘almost everyone’ (which isn’t)? My hunch is that mathematicians building probability theory from measures did roughly the same thing: they decided to stick with the ‘almost something’ theme. For the ‘something’, they decided that ‘surely’ is a good probability-sounding word which roughly conveys the right intuitive meaning of having probability 1.
I like the name ‘almost everywhere’ in measure theory, because it really sounds like what it means – not every real number is irrational, just ‘almost’ every number. But in probability theory, I don’t think ‘almost surely’ conveys what it should. It’s not ‘almost’ sure that you’ll flip at least 1 tails if you keep going forever, it’s sure. ‘Sure’, to me, doesn’t convey that there’s no possibility for it to fail. Instead, it conveys that there is no probability of it failing, without needing to be qualified with an ‘almost’.
How do you feel about the term "almost certainly"? :)
Great alternative suggestion: sticks with the "almost" theme, but also sounds more probable than "almost surely". Kudos!