I’ve been thinking about how to teach measure theory starting by building a probability measure on the space of infinite sequences of tosses of a fair coin, instead of the standard treatment, which starts with Lebesgue measure. This post is an outline of how to translate Vitali’s proof of the existence of nonmeasurable sets for Lebesgue measure to this setup.
The problem
Our sample space Ω consists of infinite strings over the alphabet {H, T}. We want to prove that there is no probability measure μ defined on all subsets of Ω under which the tosses are independent and heads and tails each have probability 1/2 on every toss.
The strategy
Our strategy is to closely follow Vitali’s classic proof by contradiction in the case of Lebesgue measure.
1. Suppose we have a probability measure μ defined over all subsets of Ω.
2. We find a countable family of transformations τi which are measure preserving, i.e. μ(τi(A)) = μ(A) for all i and all A ⊂ Ω.
3. We then construct a set V with the property that the sets τi(V) form a partition of Ω.
4. Now we have a contradiction. Since each τi(V) has the same measure as V, if μ(V) > 0 then Ω would end up having infinite measure, and if μ(V) = 0 then Ω would end up having measure zero. Either possibility contradicts the claim that μ is a probability measure.
While Vitali’s original proof used the translation invariance of the Lebesgue measure, our proof uses the symmetry arising from the assumption that the coin tosses are fair and independent.
The transformations
Consider a finite set of indices D. Then for any infinite sequence of outcomes ω we define τD(ω) to be the sequence of outcomes obtained by taking ω and flipping (i.e. changing heads to tails and vice versa) the outcome at each of the indices contained in D.
So, for example, τ{5, 6}(ω) would flip the outcomes at the 5-th and 6-th positions in ω and leave the rest of the sequence unchanged.
We first note two elementary properties of these transformations. First, because flipping the same indices twice restores them to their original state, each τD is its own inverse. Second, for two sets D1 and D2 the composition τD1 ∘ τD2 flips twice (and hence restores) the indices belonging to both D1 and D2 and flips once the indices belonging to exactly one of them. Thus it equals τD1ΔD2 (D1ΔD2 being the symmetric difference of D1 and D2) and hence belongs to the class of transformations we are considering.
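Since infinite sequences cannot be stored directly, here is a small Python sketch (my own illustration, not part of the argument) that applies τD to a finite prefix of a sequence and checks the first property, that each τD is its own inverse. The function name tau and the use of 1-based positions are choices of the sketch.

```python
from itertools import combinations

def tau(D, omega):
    """Flip the outcomes of omega at the (1-based) positions in D.

    omega stands in for a finite prefix of an infinite toss sequence and is
    written as a string over {'H', 'T'}; every index in D must fall inside it.
    """
    flip = {'H': 'T', 'T': 'H'}
    return ''.join(flip[c] if i + 1 in D else c for i, c in enumerate(omega))

omega = 'HTHHTTHT'
print(tau({5, 6}, omega))    # 'HTHHHHHT': positions 5 and 6 are flipped

# Each tau_D is its own inverse: flipping the same positions twice restores omega.
for r in range(len(omega) + 1):
    for D in combinations(range(1, len(omega) + 1), r):
        assert tau(set(D), tau(set(D), omega)) == omega
```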
Next, we will show that it follows from independence and fairness that the transformations τD are measure preserving. The logic is simple. Independence lets us delink what happens at a given finite set of indices from what happens at the rest of the indices. Fairness implies that any particular sequence of outcomes of a given finite length l has probability 2^−l, so a sequence like HTH and its flipped version THT both have probability 1/8. A flip over a finite index set should therefore not change the probability of any event. Now we put in the grunt work to show this formally.
Let ϕ be the operation of flipping every outcome in a sequence, let ωD denote the outcomes of ω at the indices in D, and let ω−D denote the outcomes of ω at the indices in the complement of D. Let S(l) be the set of all sequences from {H, T} of length l.

Then for any event A and any finite set of indices D of size l, we have by countable additivity

μ(A) = ∑s ∈ S(l) μ(A ∩ {ωD = s}).

[Here’s why we have to make D finite. If D were infinite, the set of sequences of the same length as D would be uncountable and we would not be able to use countable additivity.]

Using independence, this equals

∑s ∈ S(l) μ(ωD = s) ⋅ μ(ω−D ∈ (A ∩ {ωD = s})−D).

Using fairness to calculate μ(ωD = s) = 2^−l, this equals

∑s ∈ S(l) 2^−l ⋅ μ(ω−D ∈ (A ∩ {ωD = s})−D).

Now τD flips exactly the outcomes at the indices in D, so a sequence ω ∈ A with ωD = s is mapped to a sequence τD(ω) ∈ τD(A) whose D-outcomes are ϕ(s) and whose outcomes outside D are unchanged. Hence (A ∩ {ωD = s})−D = (τD(A) ∩ {ωD = ϕ(s)})−D, and the sum equals

∑s ∈ S(l) 2^−l ⋅ μ(ω−D ∈ (τD(A) ∩ {ωD = ϕ(s)})−D).

Now S(l) contains all sequences of length l, so as s ranges over S(l), ϕ(s) also ranges over S(l). Therefore the sum equals

∑s ∈ S(l) 2^−l ⋅ μ(ω−D ∈ (τD(A) ∩ {ωD = s})−D).

Once again using fairness to write 2^−l = μ(ωD = s), this equals

∑s ∈ S(l) μ(ωD = s) ⋅ μ(ω−D ∈ (τD(A) ∩ {ωD = s})−D),

which by independence equals ∑s ∈ S(l) μ(τD(A) ∩ {ωD = s}), and by additivity this is μ(τD(A)). Our proof that τD is measure-preserving is done.
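To make the key step concrete, here is a Python sketch (my own illustration, not part of the proof) that checks the identity (A ∩ {ωD = s})−D = (τD(A) ∩ {ωD = ϕ(s)})−D on length-6 prefixes in place of infinite sequences. The helper names tau, phi and restrict are inventions of the sketch, and tau is redefined so the snippet runs on its own.

```python
from itertools import product
import random

def tau(D, omega):
    """Flip the outcomes of omega at the (1-based) positions in D."""
    flip = {'H': 'T', 'T': 'H'}
    return ''.join(flip[c] if i + 1 in D else c for i, c in enumerate(omega))

def phi(s):
    """Flip every outcome in the finite sequence s."""
    flip = {'H': 'T', 'T': 'H'}
    return ''.join(flip[c] for c in s)

def restrict(omega, positions):
    """Outcomes of omega at the given (1-based) positions, in increasing order."""
    return ''.join(omega[p - 1] for p in sorted(positions))

n = 6
Omega_n = [''.join(t) for t in product('HT', repeat=n)]   # all length-n prefixes
D = {2, 5}
comp = set(range(1, n + 1)) - D

random.seed(0)
A = set(random.sample(Omega_n, 20))        # an arbitrary "event" on the prefixes
tau_A = {tau(D, w) for w in A}

for s in (''.join(t) for t in product('HT', repeat=len(D))):
    lhs = {restrict(w, comp) for w in A if restrict(w, D) == s}
    rhs = {restrict(w, comp) for w in tau_A if restrict(w, D) == phi(s)}
    assert lhs == rhs                      # the identity behind measure preservation

# Under the uniform (fair, independent) measure on prefixes, A and tau_D(A)
# then necessarily have the same probability.
assert len(A) / 2**n == len(tau_A) / 2**n
```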
The partition
Now we construct a set V whose images under the different τD will be a partition of Ω. To do so we first define an equivalence relation on elements of Ω. We say that ω1 ∼ ω2 if there is a τD with ω1 = τDω2. We check that this is really an equivalence relation.
- ω ∼ ω [Since τ∅ is the identity.]
- ω1 ∼ ω2 ⇒ ω2 ∼ ω1. [If ω1 = τDω2 then because τD is its own inverse it follows that τDω1 = ω2.]
- ω1 ∼ ω2 and ω2 ∼ ω3 implies ω1 ∼ ω3. [If ω1 = τD1ω2 and ω3 = τD2ω2 then ω3 = τD2 ∘ τD1(ω1) = τD1ΔD2(ω1).]
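The transitivity step leans on the earlier observation that composing two flips is a single flip over the symmetric difference. Here is a quick randomized check of that rule (again my own sketch, using finite prefixes; Python's ^ operator is the symmetric difference of sets).

```python
import random

def tau(D, omega):
    """Flip the outcomes of omega at the (1-based) positions in D."""
    flip = {'H': 'T', 'T': 'H'}
    return ''.join(flip[c] if i + 1 in D else c for i, c in enumerate(omega))

random.seed(1)
n = 12
for _ in range(1000):
    omega = ''.join(random.choice('HT') for _ in range(n))
    D1 = {i for i in range(1, n + 1) if random.random() < 0.5}
    D2 = {i for i in range(1, n + 1) if random.random() < 0.5}
    # Composing two flips equals a single flip over the symmetric difference.
    assert tau(D1, tau(D2, omega)) == tau(D1 ^ D2, omega)
```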
Let V be a set formed by taking one element from each of the equivalence classes of ∼. Since this selection of an element from each class is arbitrary, we have to rely on the Axiom of Choice to ensure that the set V exists.
Now we claim that τD(V) as D ranges over all finite sets of indices is a partition of Ω.
First we see that every ω ∈ Ω belongs to some τD(V). This is because ω belongs to one of the equivalence classes of ∼. If ω′ is the representative of this class that has been included in V then by the definition of ∼ there is some D such that ω = τD(ω′) and hence ω ∈ τD(V).
Next we see that if D1 ≠ D2 then τD1(V) ∩ τD2(V) = ∅. Suppose on the contrary that τD1ω1 = τD2ω2 for some ω1, ω2 ∈ V. Using the properties of τ we have τD1ΔD2(ω1) = ω2. By the definition of ∼ this means ω1 ∼ ω2. But since V contains exactly one element from each equivalence class, this means that ω1 = ω2 = ω (say), so that τD1ΔD2(ω) = ω. But since no sequence can remain unchanged when the outcomes at a nonempty set of indices are flipped, it must be the case that D1ΔD2 = ∅, which implies D1 = D2 and contradicts our assumption that D1 ≠ D2.
Thus we have established that the τD(V) form a partition of Ω.
The contradiction
We have from the previous section Ω = ⋃D τD(V), where the union is disjoint and D ranges over all finite sets of indices. The collection of finite sets of indices is countable (check: see the enumeration sketch below). Hence we are justified in using countable additivity to get

μ(Ω) = ∑D μ(τD(V)),

which, because the τD’s are measure-preserving, gives us

μ(Ω) = ∑D μ(V).

Now there are two possibilities: μ(V) = 0, in which case we get μ(Ω) = 0, or μ(V) > 0, in which case we get μ(Ω) = ∞. Either case contradicts the requirement that a probability measure satisfy μ(Ω) = 1.
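The countability claim can be made concrete: sending each natural number to the set of positions of the 1-bits in its binary expansion enumerates every finite set of indices exactly once. A short Python sketch of this enumeration (my own illustration):

```python
from itertools import islice

def finite_index_sets():
    """Enumerate every finite subset of {1, 2, 3, ...} exactly once.

    The n-th set is the set of (1-based) positions of the 1-bits in the
    binary expansion of n, so the generator witnesses that the collection
    of finite sets of indices is countable.
    """
    n = 0
    while True:
        yield {i + 1 for i in range(n.bit_length()) if n >> i & 1}
        n += 1

print(list(islice(finite_index_sets(), 8)))
# [set(), {1}, {2}, {1, 2}, {3}, {1, 3}, {2, 3}, {1, 2, 3}]
```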
References
Other proofs of this result can be found in Blackwell and Diaconis (1996) and in Chapter 5 of Oxtoby’s Measure and Category.