I’ve been thinking about how to teach measure theory starting by building a probability measure on the space of infinite sequences of tosses of a fair coin, instead of the standard treatment, which starts with Lebesgue measure. This post is an outline of how to translate Vitali’s proof of the existence of nonmeasurable sets for Lebesgue measure to this setup.
The problem
Our sample space Ω consists of infinite strings over the alphabet {H, T}. We want to prove that there is no probability measure μ defined on all subsets of Ω under which the tosses are independent and heads and tails each have probability 1/2 on every toss.
The strategy
Our strategy is to closely follow Vitali’s classic proof by contradiction in the case of Lebesgue measure.
1. Suppose we have a probability measure μ defined over all subsets of Ω.
2. We find a countable family of transformations τi which are measure preserving, i.e. μ(τi(A)) = μ(A) for all i and all A ⊂ Ω.
3. We then construct a set V with the property that the sets τi(V) form a partition of Ω.
4. Now we have a contradiction. Since each τi(V) has the same measure as V, if μ(V) > 0 then Ω would end up having infinite measure, and if μ(V) = 0 then Ω would end up having measure zero. Either possibility contradicts the claim that μ is a probability measure.
While Vitali’s original proof used the translation invariance of the Lebesgue measure, our proof uses the symmetry arising from the assumption that the coin tosses are fair and independent.
The transformations
Consider a finite set of indices D. Then for any infinite sequence of outcomes ω we define τD(ω) to be the sequence of outcomes obtained by taking ω and flipping (i.e. changing heads to tails and vice versa) the outcome at each of the indices contained in D.
So, for example, τ{5, 6}(ω) would flip the outcomes at the 5-th and 6-th positions in ω and leave the rest of the sequence unchanged.
We first note two elementary properties of these transformations. First, because flipping the same indices twice restores them to their original state, each τD is its own inverse. Second, for two sets D1 and D2 the composition τD1 ∘ τD2 flips twice (and hence restores) the indices belonging to both D1 and D2 and flips once the indices belonging to exactly one of them. Thus it equals τD1ΔD2 (D1ΔD2 being the symmetric difference of D1 and D2) and hence belongs to the class of transformations we are considering.
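Since infinite sequences cannot be stored directly, here is a small Python sketch (my own illustration, not part of the argument) that applies τD to a finite prefix of a sequence and checks the first property, that each τD is its own inverse. The function name tau and the use of 1-based positions are choices of the sketch.

```python
from itertools import combinations

def tau(D, omega):
    """Flip the outcomes of omega at the (1-based) positions in D.

    omega stands in for a finite prefix of an infinite toss sequence and is
    written as a string over {'H', 'T'}; every index in D must fall inside it.
    """
    flip = {'H': 'T', 'T': 'H'}
    return ''.join(flip[c] if i + 1 in D else c for i, c in enumerate(omega))

omega = 'HTHHTTHT'
print(tau({5, 6}, omega))    # 'HTHHHHHT': positions 5 and 6 are flipped

# Each tau_D is its own inverse: flipping the same positions twice restores omega.
for r in range(len(omega) + 1):
    for D in combinations(range(1, len(omega) + 1), r):
        assert tau(set(D), tau(set(D), omega)) == omega
```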
Next, we will show that it follows from independence and fairness that the transformations τD are measure preserving. The logic is simple. Independence lets us delink what happens at a given finite set of indices from what happens at the rest of the indices. Fairness implies that any particular sequence of outcomes of a given finite length l has probability 2^−l, so a sequence like HTH and its flipped version THT both have probability 1/8. A flip over a finite index set should therefore not change the probability of any event. Now we put in the grunt work to show this formally.
Let ϕ be the operation of flipping every outcome in a sequence, let ωD denote the outcomes of ω at the indices in D, and let ω−D denote the outcomes of ω at the indices in the complement of D. Let S(l) be the set of all sequences from {H, T} of length l.

Then for any event A and any finite set of indices D of size l, we have by countable additivity

μ(A) = ∑s ∈ S(l) μ(A ∩ {ωD = s}).

[Here’s why we have to make D finite. If D were infinite, the set of sequences of the same length as D would be uncountable and we would not be able to use countable additivity.]

Using independence, this equals

∑s ∈ S(l) μ(ωD = s) ⋅ μ(ω−D ∈ (A ∩ {ωD = s})−D).

Using fairness to calculate μ(ωD = s) = 2^−l, this equals

∑s ∈ S(l) 2^−l ⋅ μ(ω−D ∈ (A ∩ {ωD = s})−D).

Now τD flips exactly the outcomes at the indices in D, so a sequence ω ∈ A with ωD = s is mapped to a sequence τD(ω) ∈ τD(A) whose D-outcomes are ϕ(s) and whose outcomes outside D are unchanged. Hence (A ∩ {ωD = s})−D = (τD(A) ∩ {ωD = ϕ(s)})−D, and the sum equals

∑s ∈ S(l) 2^−l ⋅ μ(ω−D ∈ (τD(A) ∩ {ωD = ϕ(s)})−D).

Now S(l) contains all sequences of length l, so as s ranges over S(l), ϕ(s) also ranges over S(l). Therefore the sum equals

∑s ∈ S(l) 2^−l ⋅ μ(ω−D ∈ (τD(A) ∩ {ωD = s})−D).

Once again using fairness to write 2^−l = μ(ωD = s), this equals

∑s ∈ S(l) μ(ωD = s) ⋅ μ(ω−D ∈ (τD(A) ∩ {ωD = s})−D),

which by independence equals ∑s ∈ S(l) μ(τD(A) ∩ {ωD = s}), and by additivity this is μ(τD(A)). Our proof that τD is measure-preserving is done.
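To make the key step concrete, here is a Python sketch (my own illustration, not part of the proof) that checks the identity (A ∩ {ωD = s})−D = (τD(A) ∩ {ωD = ϕ(s)})−D on length-6 prefixes in place of infinite sequences. The helper names tau, phi and restrict are inventions of the sketch, and tau is redefined so the snippet runs on its own.

```python
from itertools import product
import random

def tau(D, omega):
    """Flip the outcomes of omega at the (1-based) positions in D."""
    flip = {'H': 'T', 'T': 'H'}
    return ''.join(flip[c] if i + 1 in D else c for i, c in enumerate(omega))

def phi(s):
    """Flip every outcome in the finite sequence s."""
    flip = {'H': 'T', 'T': 'H'}
    return ''.join(flip[c] for c in s)

def restrict(omega, positions):
    """Outcomes of omega at the given (1-based) positions, in increasing order."""
    return ''.join(omega[p - 1] for p in sorted(positions))

n = 6
Omega_n = [''.join(t) for t in product('HT', repeat=n)]   # all length-n prefixes
D = {2, 5}
comp = set(range(1, n + 1)) - D

random.seed(0)
A = set(random.sample(Omega_n, 20))        # an arbitrary "event" on the prefixes
tau_A = {tau(D, w) for w in A}

for s in (''.join(t) for t in product('HT', repeat=len(D))):
    lhs = {restrict(w, comp) for w in A if restrict(w, D) == s}
    rhs = {restrict(w, comp) for w in tau_A if restrict(w, D) == phi(s)}
    assert lhs == rhs                      # the identity behind measure preservation

# Under the uniform (fair, independent) measure on prefixes, A and tau_D(A)
# then necessarily have the same probability.
assert len(A) / 2**n == len(tau_A) / 2**n
```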
The partition
Now we construct a set V whose images under the different τD will be a partition of Ω. To do so we first define an equivalence relation on elements of Ω. We say that ω1 ∼ ω2 if there is a τD with ω1 = τDω2. We check that this is really an equivalence relation.
- ω ∼ ω [Since τ∅ is the identity.]
- ω1 ∼ ω2 ⇒ ω2 ∼ ω1. [If ω1 = τDω2 then because τD is its own inverse it follows that τDω1 = ω2.]
- ω1 ∼ ω2 and ω2 ∼ ω3 implies ω1 ∼ ω3. [If ω1 = τD1ω2 and ω3 = τD2ω2 then ω3 = τD2 ∘ τD1(ω1) = τD1ΔD2(ω1).]
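The transitivity step leans on the earlier observation that composing two flips is a single flip over the symmetric difference. Here is a quick randomized check of that rule (again my own sketch, using finite prefixes; Python's ^ operator is the symmetric difference of sets).

```python
import random

def tau(D, omega):
    """Flip the outcomes of omega at the (1-based) positions in D."""
    flip = {'H': 'T', 'T': 'H'}
    return ''.join(flip[c] if i + 1 in D else c for i, c in enumerate(omega))

random.seed(1)
n = 12
for _ in range(1000):
    omega = ''.join(random.choice('HT') for _ in range(n))
    D1 = {i for i in range(1, n + 1) if random.random() < 0.5}
    D2 = {i for i in range(1, n + 1) if random.random() < 0.5}
    # Composing two flips equals a single flip over the symmetric difference.
    assert tau(D1, tau(D2, omega)) == tau(D1 ^ D2, omega)
```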
Let V be a set formed by taking one element from each of the equivalence classes of ∼. Since this selection of an element from each class is arbitrary, we have to rely on the Axiom of Choice to ensure that the set V exists.
Now we claim that τD(V) as D ranges over all finite sets of indices is a partition of Ω.
First we see that every ω ∈ Ω belongs to some τD(V). This is because ω belongs to one of the equivalence classes of ∼. If ω′ is the representative of this class that has been included in V then by the definition of ∼ there is some D such that ω = τD(ω′) and hence ω ∈ τD(V).
Next we see that if D1 ≠ D2 then τD1(V) ∩ τD2(V) = ∅. Suppose on the contrary that τD1ω1 = τD2ω2 for some ω1, ω2 ∈ V. Using the properties of τ we have τD1ΔD2(ω1) = ω2. By the definition of ∼ this means ω1 ∼ ω2. But since V contains exactly one element from each equivalence class, this means that ω1 = ω2 = ω (say), so that τD1ΔD2(ω) = ω. But since no sequence can remain unchanged when the outcomes at a nonempty set of indices are flipped, it must be the case that D1ΔD2 = ∅, which implies D1 = D2 and contradicts our assumption that D1 ≠ D2.
Thus we have established that the τD(V) form a partition of Ω.
The contradiction
We have from the previous section Ω = ⋃D τD(V), where the union is disjoint and D ranges over all finite sets of indices. The collection of finite sets of indices is countable (check: see the enumeration sketch below). Hence we are justified in using countable additivity to get

μ(Ω) = ∑D μ(τD(V)),

which, because the τD’s are measure-preserving, gives us

μ(Ω) = ∑D μ(V).

Now there are two possibilities: μ(V) = 0, in which case we get μ(Ω) = 0, or μ(V) > 0, in which case we get μ(Ω) = ∞. Either case contradicts the requirement that a probability measure satisfy μ(Ω) = 1.
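The countability claim can be made concrete: sending each natural number to the set of positions of the 1-bits in its binary expansion enumerates every finite set of indices exactly once. A short Python sketch of this enumeration (my own illustration):

```python
from itertools import islice

def finite_index_sets():
    """Enumerate every finite subset of {1, 2, 3, ...} exactly once.

    The n-th set is the set of (1-based) positions of the 1-bits in the
    binary expansion of n, so the generator witnesses that the collection
    of finite sets of indices is countable.
    """
    n = 0
    while True:
        yield {i + 1 for i in range(n.bit_length()) if n >> i & 1}
        n += 1

print(list(islice(finite_index_sets(), 8)))
# [set(), {1}, {2}, {1, 2}, {3}, {1, 3}, {2, 3}, {1, 2, 3}]
```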
References
Other proofs of this result can be found in Blackwell and Diaconis (1996) and in Chapter 5 of Oxtoby’s Measure and Category.