A few days ago, Numberphile released a video on YouTube about the famous Look-And-Say sequence, presented by John Conway himself. Just as a reminder (if you have not watched the video, which I thoroughly recommend you do), this sequence begins as follows
and is produced by `reading out loud’ each sequence. For instance, reads as `one one’, and so the result is . Then in turn is read as `two ones’, and so we get , then `one two, one one’ thus , etc. Hence the name look-and-say, or, as John Conway named it `audioactive decay’.
If you like, what we are doing is coding `runs’. That is, we look at the first character , and count how many of these appear in a block, until some different character appears, and we encode , where is the length of the run in decimal notation (with no leading s) and is our character.
If you have watched the video, you might have seen that the length of the elements of the sequence satisfies
which is the root of a polynomial of degree 71, a really remarkable property.
When I shared the video above, Matt suggested looking at the binary variant starting with . This eventually led to a nice discussion on the topic, discussing algorithms to estimate the limit for this case, and for another variant. As a result, in the end, I thought of a proof of analogous properties for the binary case, in fact, these are all significantly simpler than in the case of base .
We are going to define our sequence over the alphabet as follows, first let , and then proceed as above but use base to code the length of the runs. Thus
for instance, to code and produce we have three s followed by one , and .
For simiplicity, we will denote the length of the string resulting from applying the rule above times but with the seed at the beginning. For example , the length of , as we have started with the seed . Also, we will denote the rule above by , so that for each .
We will prove that
where satisfies the equation . As a matter of fact, the constant already appears in OEIS, but not for the reason above. Again, observe that satisfying a polynomial of degree instead of a polynomial of degree is a massive difference.
We will begin by making a few observations which are inspired by the ideas given by John Conway in the video above, in particular, we will try breaking the strings into `atoms’.
Observation 1. Suppose we had an string
where the point is meant to be concatenation, and it is there just to draw attention to that point in the string. Then, the result of applying the above rule (in binary) times to is the same as applying the rule times to each of the two parts and separately, and then concatenating the results.
In other words, we have
Proof: Well, certainly and are not in the same run, so .
Afterwards, we observe that necessarily ends with a to indicate that the last character is a , while always starts with a , since the binary representation of a nonzero number (the number of s in the first run of ) will always start with a . Hence the process repeats itself.
So, using this idea, I first implemented a fast algorithm to compute the lengths for quite large (up to 900), and check that the digits indeed coincided with those in A092526.
The idea is that one can always break up the string into independent fragments of the form (plus a little detail with the last run). Then one runs these sequences independently, and memoizes, that is, remembers their results, becuase each string of the form is going to break up again into smaller such strings.
Next I asked the computer to give me the distinct `atoms’ it found, somewhat surprisingly there were quite few of them, and this lead to the following observation. Observe that here we are assuming that the strings always begin with a , as, if this where not already true for , then it is true for (all lengths in binary start with a ).
Observation 2. We need only look at strings of the form with at most five 0s and at most five s, because eventually (after is applied a number of times that is) any string decomposes into a concatentation of elements of the form
where denotes the `end-of-string’ character.
Proof: First note that the images by of the elements of the form are again concatenations of elements of the same form . Then, by Observation 1, it is enough to show the result for strings that are of either one of the following forms: , . We proceed by induction in the length of the strings. When we have length no more than , the result is trivial. Assume, now, the result to be true for all lengthes less than . And assume we are given a string of the form or that has length .
If with we have . Here the length of the binary representation of in base is , and similarly the length of is . If or , we observe that for , and that for all , and so
and the result follows by using Observation 1, combined with the inductive hypothesis. Else, if and the string satisfies already.
The previous two observations are valid regardless of the initial seed. The trick now, is to note that we may produce a system of equations in by considering the generating functions
that is, the coefficient of is the length of .
When a string decays into with each of the form in , we note that this is implies
indeed, because for .
With this tool at hand, we note that we can produce a system of equations for the elements of the form (taking into account each such element also decays into elements of the same form after applying a few times), solve it, and then get the result by looking at
The procedure explained above can be programmed easily. Now, for the case in which we start with , we will produce a somewhat smaller system by looking at the strings of .
Case starting with . Note that produces the equation
and now we need `info’ on . Again , that is
and so, now we need `info’ on both and . Observe while , and so
The strings and already have their own equations, and so we may just proceed with . We have , thus
But we need to go on with , and get , so
Here had already appeared, and so we continue just with . We get
and we stop here, as no new `atomic’ string has appeared.
In all we have
a system of equations over the field .
Solving (using, say, Sage or Maxima) we get
Here we observe that has real root in the interval , as and . The other two roots are not real (observe that for all ), and so they are conjugates , . By looking at the coefficients of , we have , that is , and so implies that we have .
Now, we prepare for its partial fraction decomposition, we have
for some coefficients . Let us denote . Equation implies that
for . As we remarked above, we have , and so . As for all , this means that (else would tend to ). Moreover, as , we have that
as , and note that implies . Thus we have proved the result.
Remark. As a matter of fact, we can also check that .
Matt proposed the following: You have an infinte stream where each is a random variable taking the value with probability and with the same probability, and the are independent. Suppose that you incrementally apply the `binary run encoding’ as above, does
exist? What is its value?
Edit. According to my computations now, what we have is
I may explain this in another post, it actually follows by using ideas from Analytic Combinatorics. Also, I am pretty sure that converges to the same number, but in probability. For this we should have a look at the variance I guess.
One might be tempted to use the strong law of Large Numbers for , unfortunately there are a few technical details that need to be taken care of.
- Numberphile on YouTube: Look-and-Say Numbers (feat John Conway).
- Sequence A092526, Online Encyclopedia of Integer Sequences: A092526.
- Look-and-say sequence, Wikipedia: here.
- Sage, free open-source mathematics software system. See here.