1. Coin tossing
Suppose we are given a sequence of independent fair bits (meaning take and with probabily ) we want to produce with them a discrete random variable that takes the values with probabilities . The objective, of course, is to do this using the least possible number of bits.
Let us note that if we consider in binary, this variable is uniformly distributed in . The usual way to produce given a number uniform in is to consider the intervals and declare when . Of course, knowing whether or not, takes a few bits from . Indeed, having seen digits, the only thing we know is that . We denote the number of bits needed by .
1.1. Uniform distribution
The case of the uniform distribution is particularly interesting; it is indeed one of the most common distributions one might come across, but we claim that its study shows some interesting patterns too.
Algorithm 1. We begin with a simplified version that is slightly wasteful.
The idea is that we stop only when is completely contained in one of the intervals .
Why is this wasteful? Well, the probability of the resulting being a dyadic number is 0, so we could directly stop when
thus neglecting the borders. This obviously improves the performance in cases like , in which we end up stopping after exactly steps. The analysis of the algorithm also gets more interesting as we shall see afterwards.
See the corresponding code for our first (and wasteful) algorithm. Here .
x = 0 l = 0 while (true) : x += reveal_bit() / 2^k while (I[l+1] <= x) : l += 1 if (x + 1/2^k < I[l+1] or I[l+1] == 1) : return k k += 1
To start let us recall that
which simplifies the computation of , because telling whether we have to continue is easy: if there is a number of the form in , then we must conclude that . This is so because constitutes the border between two intervals and whenever .
For , it is obviously true that contains a number of the form with , that is not . Indeed, notice that . If then we observe that we may only stop if is , hence in this case .
Assume now that , then there is exactly one number of the form in the interval , which is given by
The set of numbers in starting with those exact first digits is
therefore we conclude that
Since each term of the union is disjoint and so
where denotes the fractional part.
In all we deduce that the redundancy satisfies
where , where denotes the fractional part.
Algorithm 2. In our second algorithm we stop as soon as for some .
In this case we observe that the difference occurs when for some . Of course for some integers , with odd.
This never happens when is odd, therefore the expected value remains the same when is odd. Now assume where is odd. Then and since and are odd we must have and . Now given , what is the smallest for which occurs? Let us remark that, since , all of the digits cannot equal , therefore can be reduced to a number of the form with digits. It follows then that and that occurs as a right border for . Observe that , hence, in fact, this happens during the phase in which and the intervals are longer than the intervals , therefore once we get to the phase , all of the possible dyadics are already available.
Fixed there are choices for our odd (if we allow for the possibility of , whihc we have to discount anyway). Hence in all we can produce dyadics as a right-border.
In all we deduce that the redundancy satisfies
where , where denotes the fractional part and is the greatest such that divides .
1.2. Generic distribution
Algorithm 1 for a generic . In this case we have to work with each individual right-border . As long as , all of the points in will produce a truncation that satisfies , thus we will not have stopped. This accounts for a term in , for each (even for ).
In particular this means that
The other cases are counted in (not necessarily disjoint from the previous count!)
which has measure
Strictly speaking we should count the difference
We prove the bound
Indeed, observe that
and reverse the order of summation to get
In general this means that
and in particular
Lemma 1 Fix , then
where of course . Then
and then the inequality , valid for , proves the result.
Comment. To prove that for , observe that the function is concave and .
As a corollary we get the following Theorem
Furthermore, the in (2) is tight by our example with the uniform distribution.
Concluding remarks. I got the inspiration for this post from reading , chapter 5, where the optimal algorithm for producing a random variable from fair coin tosses is described. I wondered, usually when working with probabilities one uses the procedure that I explained in the post, which more or less corresponds to the so called Inversion method, so how far is it from the optimum? In this post I have proved that it is not that far from being optimal actually. In  it is proved that the optimal procedure satisfies the same bounds for the expected value , but it is not shown that the inequality on the RHS is tight for the optimal procedure.
1. Thomas M. Cover, Joy A. Thomas, Elements of Information Theory. Wiley Series in Telecommunications and Signal Processing, Second Edition, 2006.