Binary look-and-say sequences


A few days ago, Numberphile released a video on YouTube about the famous Look-And-Say sequence, presented by John Conway himself. Just as a reminder (if you have not watched the video, which I thoroughly recommend you do), this sequence begins as follows

\displaystyle 1\,,\;11\,,\;21\,,\;1211\,,\;111221\,,\;312211\,,\; 13112221\,,\;1113213211\,,\;\ldots

and is produced by `reading out loud’ each sequence. For instance, {1} reads as `one one’, and so the result is {11}. Then {11} in turn is read as `two ones’, and so we get {21}, then `one two, one one’, thus {1211}, etc. Hence the name look-and-say, or, as John Conway named it, `audioactive decay’.

If you like, what we are doing is coding `runs’. That is, we look at the first character {c}, and count how many of these appear in a block, until some different character appears, and we encode {s\, c}, where {s} is the length of the run in decimal notation (with no leading {0}s) and {c} is our character.

If you have watched the video, you might have seen that the length {L_n} of the {n}-th element of the sequence satisfies

\displaystyle \lim_{n\rightarrow \infty} \frac{L_{n+1}}{L_n} = \lambda \doteq 1.303577\ldots\,,

which is a root of a polynomial of degree {71}, a really remarkable property.

When I shared the video above, Matt suggested looking at the binary variant starting with {0}. This eventually led to a nice discussion about algorithms to estimate the limit for this case, and for another variant {(*)}. As a result, I came up with a proof of the analogous properties for the binary case; in fact, these are all significantly simpler than in the case of base {10}.

We are going to define our sequence over the alphabet {{\mathbb B}=\{0,1\}} as follows: first let {a_0 = 0}, and then proceed as above, but using base {2} to code the lengths of the runs. Thus

\displaystyle a_0=0\,,\;a_1=10\,,\;a_2=1110\,,\;a_3=11110\,,\;a_4=100110\,,\;a_5=1110010110\,,\;\ldots\,,

for instance, to code {a_2} and produce {a_3} we have three {1}s followed by one {0}, and {(11)_2=3}.

For simplicity, we will denote by {\ell_n(s)} the length of the string resulting from applying the rule above {n} times to the seed {s}. For example {\ell_n(0) = \ell(a_n)}, the length of {a_n}, as we have started with the seed {0}. Also, we will denote the rule above by {f \colon {\mathbb B}^*\rightarrow {\mathbb B}^*}, so that {a_{n+1} = f(a_n)} for each {n\geq 0}.
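For concreteness, here is a small sketch of the rule {f}: a plain run-length encoder that writes each run length in binary (the function name and the choice of `std::string` are just for illustration).

#include <algorithm>
#include <string>

// One step of the binary look-and-say rule: each run of a repeated character c
// is replaced by (its length written in binary, no leading zeros) followed by c.
std::string f(const std::string &s) {
    std::string out;
    for (size_t i = 0; i < s.size(); ) {
        size_t j = i;
        while (j < s.size() && s[j] == s[i]) ++j;    // the run s[i..j-1]
        size_t len = j - i;
        std::string bin;                             // run length in binary
        for (; len > 0; len /= 2) bin += char('0' + len % 2);
        std::reverse(bin.begin(), bin.end());
        out += bin;
        out += s[i];
        i = j;
    }
    return out;
}
// e.g. f("0") == "10", f("10") == "1110", f("1110") == "11110", ...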

We will prove that

\displaystyle \lim_{n\rightarrow \infty} \frac{\ell(a_{n+1})}{\ell\left(a_n\right)} = \rho \doteq 1.46557123\ldots\,,

where {\rho} satisfies the equation {\rho^3 - \rho^2 = 1}. As a matter of fact, the constant {\rho} already appears in the OEIS, but not for the reason above. Again, observe that being a root of a polynomial of degree {3} instead of one of degree {71} is a massive difference.

We will begin by making a few observations inspired by the ideas given by John Conway in the video above; in particular, we will try breaking the strings into `atoms’.

Observation 1. Suppose we had a string

\displaystyle s = r0\cdot 1t\,,

where the dot denotes concatenation, and is there just to draw attention to that point in the string. Then, the result of applying the above rule (in binary) {n} times to {s} is the same as applying the rule {n} times to each of the two parts {r0} and {1t} separately, and then concatenating the results.

In other words, we have

\displaystyle f^n(s) = f^n(r0)\cdot f^n(1t)\,,

where {f^n = \underbrace{f\circ f\circ \ldots \circ f}_{n\text{ times}}}.

Proof: Well, certainly {0} and {1} are not in the same run, so {f(s) = f(r0)\cdot f(1t)}.

Afterwards, we observe that {f(r0)} necessarily ends with a {0} to indicate that the last character is a {0}, while {f(1t)} always starts with a {1}, since the binary representation of a nonzero number (the number of {1}s in the first run of {1t}) will always start with a {1}. Hence the process repeats itself. \Box

So, using this idea, I first implemented a fast algorithm to compute the lengths for quite large {n} (up to {900}), and checked that the digits of the resulting ratio indeed coincided with those in A092526.

The idea is that one can always break up the string into independent fragments of the form {1^+0^+} (plus a little detail with the last run). Then one runs these sequences independently, and memoizes, that is, remembers, their results, because each string of the form {1^+0^+} is going to break up again into smaller such strings.
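A sketch of what such a memoized computation might look like, assuming the helper {f} from the sketch above and a splitter that cuts at every {0\cdot 1} boundary (Observation 1). For really large {n} one would use big integers, or track the ratios in floating point; the 64-bit counters below are only meant to show the structure.

#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Split a string at every 0 -> 1 boundary (Observation 1): each fragment can
// then be evolved independently of the others.
std::vector<std::string> split(const std::string &s) {
    std::vector<std::string> parts;
    std::string cur;
    for (size_t i = 0; i < s.size(); ++i) {
        cur += s[i];
        if (s[i] == '0' && i + 1 < s.size() && s[i + 1] == '1') {
            parts.push_back(cur);
            cur.clear();
        }
    }
    if (!cur.empty()) parts.push_back(cur);
    return parts;
}

// Length of f^n(s), memoized on the pair (s, n): the same small fragments of
// the form 1^+0^+ show up over and over again, so this is fast in practice.
std::map<std::pair<std::string, int>, std::uint64_t> memo;

std::uint64_t lengthAfter(const std::string &s, int n) {
    if (n == 0) return s.size();
    auto key = std::make_pair(s, n);
    auto it = memo.find(key);
    if (it != memo.end()) return it->second;
    std::uint64_t total = 0;
    for (const std::string &part : split(f(s)))   // f as in the sketch above
        total += lengthAfter(part, n - 1);
    return memo[key] = total;
}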

Next I asked the computer to give me the distinct `atoms’ it found; somewhat surprisingly there were rather few of them, and this led to the following observation. Note that here we are assuming that the strings always begin with a {1}: if this were not already true for {s}, then it is true for {f(s)} (all lengths in binary start with a {1}).

Observation 2. We need only look at strings of the form {1^*0^*} with at most five {0}s and at most five {1}s, because eventually (that is, after {f} is applied a sufficient number of times) any string decomposes into a concatenation of elements of the form

\displaystyle 1^{1\leq \cdot \leq 5}0^{1\leq \cdot \leq 5}\,,\qquad 1^{1\leq \cdot \leq 5}\Omega\,,\qquad (**)

where {\Omega} denotes the `end-of-string’ character.

Proof: First note that the images by {f} of the elements of the form {(**)} are again concatenations of elements of the same form {(**)}. Then, by Observation 1, it is enough to show the result for strings of one of the following forms: {1^+0^+}, {1^+}. We proceed by induction on the length of the strings. When the length is no more than {5}, the result is trivial. Assume now the result to be true for all lengths less than {L > 5}, and assume we are given a string {s} of the form {1^+0^+} or {1^+} that has length {L}.

If {s = 1^n 0^m} with {n+m=L > 5} we have {f(s) = (n)_{\text{rep in base }2} 1 (m)_{\text{rep in base }2} 0}. Here the length of the binary representation of {n} in base {2} is {\lfloor \log_2 \,n\rfloor+1}, and similarly the length of {(m)_{\text{rep in base }2}} is {\lfloor \log_2 \,m\rfloor+ 1}. If {n > 5} or {m> 5}, we observe that {\lfloor \log_2 k\rfloor + 1\leq k - 3} for {k\geq 6}, and that {\lfloor \log_2(k)\rfloor + 1 \leq k} for all {k\geq 1}, and so

\displaystyle \ell(f(s)) \leq (m+1)+(n+1)-3 = L-1 \,,

and the result follows by using Observation 1, combined with the inductive hypothesis. Else, if {n\leq 5} and {m\leq 5} the string {s} satisfies {(**)} already. Finally, the case {s = 1^L} with {L > 5} is handled in the same way: {f(s) = (L)_{\text{rep in base }2}\,1} has length {\lfloor \log_2 L\rfloor + 2 \leq L-2 < L}, and we conclude as before. \Box

The previous two observations are valid regardless of the initial seed. The trick now is to note that we may produce a system of equations in {{\mathbb Q}[[x]]} by considering the generating functions

\displaystyle F_s(x) = \sum_{k\geq 0} \ell_k(s)\,x^k\,,

that is, the coefficient of {x^k} is the length of {f^k(s)}.

When a string {s} decays into {f^n(s)= s_1\cdot s_2\cdot \ldots \cdot s_m} with each {s_i} of the form in {(**)}, we note that this implies

\displaystyle F_s(x) = \ell(s) + x\, \ell(f(s)) + \ldots + x^{n-1}\,\ell(f^{n-1}(s)) + x^n\, F_{s_1}(x) + \ldots + x^n\, F_{s_m}(x)\,,

indeed, because {\ell(f^k(s)) = \sum_{i=1}^m \ell(f^{k-n}(s_i))} for {k \geq n}.

With this tool at hand, we note that we can produce a system of equations for the elements of the form {(**)} (taking into account each such element also decays into elements of the same form after applying {f} a few times), solve it, and then get the result by looking at

\displaystyle F_0(x) = 1 + x\, F_{10}(x)\,.

The procedure explained above can be programmed easily. Now, for the case in which we start with {0}, we will produce a somewhat smaller system by looking at the strings of {(a_k)_k}.

Case starting with {0}. Note that {10\mapsto 1110} produces the equation

\displaystyle F_{10}(x) = 2 + x\,F_{1110}(x)\,,

and now we need `info’ on {1110}. Again {1110 \mapsto 11110}, that is

\displaystyle F_{1110}(x) = 4 + x\,F_{11110}(x)\,,

and in turn {11110 \mapsto 100\cdot 110}, so

\displaystyle F_{11110}(x) = 5 + x\,F_{100}(x)+x\, F_{110}(x)\,,

and so, now we need `info’ on both {100} and {110}. Observe {100\mapsto 11100} while {110\mapsto \underline{10}\cdot \underline{110}}, and so

\displaystyle F_{100}(x) = 3 + x\,F_{11100}(x)\,,\qquad F_{110}(x) = 3 + x\, F_{10}(x) + x\, F_{110}(x)\,.

The strings {10} and {110} already have their own equations, and so we may just proceed with {11100}. We have {11100\mapsto 111100}, thus

\displaystyle F_{11100}(x) = 5 + x\,F_{111100}(x)\,.

But we need to go on with {111100}, and we get { 111100\mapsto \underline{100}\cdot 1100}, so

\displaystyle F_{111100}(x) = 6 + x\,F_{100}(x)+x\, F_{1100}(x)\,.

Here {100} had already appeared, and so we continue just with {1100}. We get {1100\mapsto \underline{10}\cdot \underline{1100}}, and so

\displaystyle F_{1100}(x) = 4 + x\, F_{10}(x) + x\, F_{1100}(x)\,,

and we stop here, as no new `atomic’ string has appeared.

In all we have

\displaystyle \left\{\begin{matrix} F_{10}(x) &=& 2 &+& x\, F_{1110}(x) \\ F_{1110}(x) &=& 4 &+& x\, F_{11110}(x) \\ F_{11110}(x) &=& 5 &+& x\, F_{100}(x) &+& x\, F_{110}(x) \\ F_{100}(x) &=& 3 &+& x\, F_{11100}(x) \\ F_{110}(x) &=& 3 &+& x\, F_{10}(x) &+& x\, F_{110}(x)\\ F_{11100}(x) &=& 5 &+& x\, F_{111100}(x) \\ F_{111100}(x) &=& 6 &+& x\, F_{100}(x) &+& x\, F_{1100}(x)\\ F_{1100}(x) &=& 4 &+& x\, F_{10}(x) &+& x\, F_{1100}(x) \,, \end{matrix}\right.

a system of {8} equations in {8} unknowns over the field {{\mathbb Q}(x)}.

[Figure: the digraph associated to the `decay’ process when we start with {10}; an edge {(v,w)} indicates that {w} is one of the independent factors that make up {f(v)}.]

Solving (using, say, Sage or Maxima) we get

\displaystyle F_{10}(x) = \frac{{x}^{3}-{x}^{2}-2\,x-2}{{x}^{3}+x-1}\,,

and so

\displaystyle F_{0}(x) = 1 + x\,F_{10}(x) = \frac{{x}^{4}-2\,{x}^{2}-x-1}{{x}^{3}+x-1}\,.
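In passing, since the denominator of {F_0(x)} is {x^3+x-1} and its numerator has degree {4}, the lengths satisfy the linear recurrence

\displaystyle \ell(a_n) = \ell(a_{n-1}) + \ell(a_{n-3})\,,\qquad n\geq 5\,;

for instance {\ell(a_5) = \ell(a_4) + \ell(a_2) = 6 + 4 = 10}.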

Here we observe that {p(x) = x^3 + x - 1} has {1} real root {\theta} in the interval {(0,1)}, as {p(0)=-1} and {p(1)=1}. The other two roots are not real (observe that {p^\prime(t)\geq 1} for all {t\in{\mathbb R}}), and so they are conjugates {z}, {\bar{z}}. By looking at the coefficients of {p(x)}, we have {\theta\,z\,\bar{z} = 1}, that is {\theta \, |z|^2 = 1}, and so {\theta \in (0,1)} implies that we have {|z| > 1}.

Now, we prepare {F_0(x)} for its partial fraction decomposition; we have

\displaystyle F_0(x) = \frac{-3\,{x}^{2}-1}{{x}^{3}+x-1}+x\,,

and here

\displaystyle \frac{-3\,{x}^{2}-1}{{x}^{3}+x-1} = \frac{A_z}{1-x/z} + \frac{B_z}{1-x/\bar{z}} + \frac{C_\theta}{1-x/\theta}\,,\qquad (***)

for some coefficients {A_z,B_z,C_\theta \in {\mathbb C}}. Let us denote {\rho := 1/\theta}. Equation {(***)} implies that

\displaystyle \ell(a_n) = A_z\,z^{-n} + B_z\, {\bar{z}}^{-n} + C_\theta \rho^n\,,

for {n>1}. As we remarked above, we have {|z| > 1}, and so {A_z\,z^{-n} + B_z\, {\bar{z}}^{-n} \rightarrow 0}. As {\ell(a_n) \geq 1} for all {n}, this means that {C_\theta \neq 0} (else {\ell(a_n)} would tend to {0}). Moreover, as {\rho > 1}, we have that

\displaystyle \ell(a_n) \sim C_\theta\, \rho^n\,,

as {n\rightarrow \infty}, and note that {(1/\rho)^3+(1/\rho)-1 = 0} implies {\rho^3 - \rho^2 - 1 = 0}. Thus we have proved the result.

Remark. As a matter of fact, we can also check that {C_\theta = \rho}.

{(*)} Matt proposed the following: you have an infinite stream {X_1,X_2,X_3,\ldots} where the {X_i} are independent random variables, each taking the value {0} with probability {1/2} and the value {1} with probability {1/2}. Suppose that you incrementally apply the `binary run encoding’ {f} as above; does

\displaystyle \lim_{t \rightarrow \infty} \frac{1}{t}\,\ell(f(X_1\ldots X_t))

exist? What is its value?

Edit. According to my computations, what we have is

\displaystyle  \lim_{t \rightarrow \infty} \frac{1}{t}\,{\mathbb E}[\ell(f(X_1\ldots X_t))] = \frac{5}{4} + \sum_{j=2}^\infty \frac{1}{2^{2^j}}\doteq 1.31642150902189\ldots\,.

I may explain this in another post; it actually follows by using ideas from Analytic Combinatorics. Also, I am pretty sure that {\lim_{t \rightarrow \infty} \frac{1}{t}\,\ell(f(X_1\ldots X_t))} converges to the same number, but in probability. For this we should have a look at the variance, I guess.

One might be tempted to use the Strong Law of Large Numbers for {(*)}; unfortunately there are a few technical details that need to be taken care of.

References

  1. Numberphile on YouTube: Look-and-Say Numbers (feat. John Conway).
  2. Sequence A092526, The On-Line Encyclopedia of Integer Sequences.
  3. Look-and-say sequence, Wikipedia.
  4. Sage, a free open-source mathematics software system.

Generating a random permutation


It is well-known that any permutation of {[N] := \{1,\ldots,N\}} can be written as a product of transpositions, that is, of permutations of the form {(i\,j)} with {i,j\in [N]}, where we recall that {(i\,j)} is the permutation {\sigma\in S_N} satisfying {\sigma(i) = j}, {\sigma(j)=i} and {\sigma(k)=k} for {k\neq i,j}. In this post, however, we will do the opposite. We will start with the identity permutation {id} and compose transpositions chosen uniformly at random. How long do we have to go before any given permutation is roughly equally likely to appear?

Remark 1 Observe that this method is equivalent to having all of the elements from {1} through {N} laid out in an array, and then repeatedly swapping a randomly chosen pair of them. It is natural to think that, if we keep on going long enough, the resulting permutation is going to be a (uniformly) `random’ one.

// initialize the array, but we will work with 0...N-1 instead.
for (int i = 0; i < N; i++)
    a[i] = i;
// for a number of iterations k that should depend on N
for (int i = 0; i < k; i++)
{
    int x = random(0,N-1); // pick an element uniformly at random from {0,...,N-1}
    int y = random(0,N-1); // independently
    swap(a,x,y); // swap positions x and y in a.
}
return a;

However, this raises a few questions: Is it truly `random’? For how long do we have to keep on swapping elements?

This process can be modeled as follows

\displaystyle X_0 := id,\, X_1 := (A_1\;B_1) \circ X_0,\,\ldots,\,X_{k+1}:= (A_{k+1}\;B_{k+1})\circ X_k\,,

and so on, where all the pairs {(A_k,B_k)}, {k\geq 1}, are independent random variables, such that

\displaystyle  \Pr \{ (A_k,B_k) = (i,j)\} = \frac{1}{N^2}\,,\qquad (i,j)\in [N]\times [N]\,.

What we want to know is

  • Given {\sigma \in S_N}, does {\Pr\{ X_k = \sigma\} \rightarrow 1 / N!} as {k\rightarrow\infty}?
  • If that were indeed the case, how fast is the convergence?

We will start with the first question. For this, we will model the problem with a Markov chain. The state space will be {S_N}; now observe that, given {\pi\in S_N}, the probability of moving to {\pi^\prime \in S_N} in the following iteration is

  • {p_{\pi,\pi^\prime} = \frac{1}{N}} if {\pi = \pi^\prime}. This follows from the fact that we have {N} ways to generate the identity, by choosing a transposition of the form {(i\, i)}.
  • {p_{\pi,\pi^\prime} = \frac{2}{N^2}} if {\pi^\prime = (i\; j)\circ \pi} for some transposition {(i\; j)\neq id}. Notice that this transposition is uniquely determined as {\pi^\prime \circ \pi^{-1}}, which gives the previous probability, since {(i\; j)} corresponds to the two pairs {(i,j)} and {(j,i)}.
  • {p_{\pi,\pi^\prime} = 0} otherwise.

So, the transition probability matrix {A} is constant, i.e. does not depend on the number of iterations we have already gone through.

Next, since we can always go from {\pi} to {\pi}, we observe that the Markov chain is aperiodic, and, since any permutation can be written as a product of transpositions, we get that the Markov chain is also irreducible. It follows then (a classic result on Markov chains) that this Markov chain has a unique stationary distribution, and that from any given initial distribution (starting at {id} with probability {1}, for instance) we end up converging to the limit distribution {\mu}. Now, what is this probability distribution {\mu}?

To check that {\mu} is indeed the uniform distribution, we check that the uniform distribution is indeed stationary. Given the distribution {\mu}, the probability of being on a particular state {\pi\in S_N} for the next iteration is

\displaystyle  \tfrac{1}{N} \, \mu_\pi + \sum_{i < j} \tfrac{2}{N^2}\, \mu_{(i\, j) \circ \pi }\,. \quad (*)

Indeed, the first term corresponds to the transpositions {(i\, i)}, while the second sum ranges over {(i\, j)} with {i < j}. If by applying {(i\,j)} we end up having {\pi}, the previous permutation was necessarily {(i\,j)\circ \pi}.

By letting {\mu_\pi = 1/N!} for all {\pi \in S_N}, it follows from {(*)} that the probability of having {\pi} in the next iteration is

\displaystyle  \frac{1}{N} \, \frac{1}{N!} + \frac{2}{N^2}\, \frac{1}{N!}\, \binom{N}{2} = \frac{1}{N!}\,,

showing that this {\mu} is indeed the stationary distribution.

Great, now we know that the distribution of {X_k} tends to {\mu} as {k\rightarrow \infty}, but how fast does it converge? With {k < N/2} we are far from having a uniform distribution since the resulting permutation must have at least one fixed point. Indeed there is at least one element {i\in [N]} that does not turn up in any of the transpositions we have applied so far. So for instance

\displaystyle  \Pr\{ X_k = (1\,2\,3\,\ldots\,N)\} = 0\,,

whenever {k < N/2}, where we have used cycle notation.

So, actually, what we have seen is that the diameter of the graph is at least {N/2}, but that still leaves out the question of the speed of convergence.

An easier question is the following: how close are we to having

\displaystyle  \Pr\{ X_k(i) = j\} = 1 / N\,.

In this case it is not difficult to compute { \Pr\{ X_k(i) = j\}} explicitly when {i\neq j}. Observe that { \Pr\{ X_{k+1}(i) = j\}} equals

\displaystyle  \Pr\{ j\not\in \{A_{k+1},B_{k+1}\}\text{ or } A_{k+1}=B_{k+1}=j\} \, \Pr\{ X_{k}(i) = j\} + \sum_{j^\prime \neq j}\Pr\{\{A_{k+1},B_{k+1}\} = \{j^\prime,j\}\}\, \Pr\{ X_{k}(i) = j^\prime\} \,.

Note that

\displaystyle \Pr\{\{A_{k+1},B_{k+1}\} = \{j^\prime,j\}\} = \frac{2}{N^2} =: \alpha

for all {j^\prime \neq j}, and

\displaystyle \Pr\{ j\not\in \{A_{k+1},B_{k+1}\}\text{ or } A_{k+1}=B_{k+1}=j\} = \frac{(N-1)^2}{N^2} + \frac{1}{N^2} =: \beta\,,

hence, if we denote, by symmetry, {p_{k} := \Pr\{ X_k(i) = j\}} when {j\neq i}, we get

\displaystyle  p_{k+1} = \beta \, p_k + \alpha\,(1-p_k) = (\beta-\alpha) p_k + \alpha.

Here, since {p_0 = 0}, it follows that

\displaystyle  p_k = \alpha \times \left( (\beta-\alpha)^{k-1} + \ldots + (\beta-\alpha)^0\right) = \alpha \times \frac{1 - (\beta-\alpha)^k }{1 - (\beta-\alpha)}\,.

Then

\displaystyle p_k = \frac{1}{N} \times \left(1 - \left(1-\frac{2}{N}\right)^k\right)\,,

and by letting {k = \lambda\, N} we get

\displaystyle p_k \approx \frac{1}{N} \times \left(1 - e^{-2\,\lambda}\right)\,,

as {N\rightarrow \infty}.

So, for instance, this tells us that, when {\lambda = 1/2}, we get

\displaystyle  \Pr\{X_k(i) = j\} \approx \frac{1}{N} \times \left(1 - e^{-1}\right) \approx 0.63 / N\,,

for {i\neq j}, meaning that {i} is, with probability roughly {e^{-1}}, a fixed point. Of course, this is an unreasonably large number, as the probability of {i} being a fixed point if {X_k} were uniformly distributed would have to be {1/N}.
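As a sanity check, here is a quick Monte Carlo experiment (just a sketch; the seed, {N}, {k} and the number of trials are arbitrary choices) comparing the empirical value of {\Pr\{X_k(i)=j\}} for {i\neq j} with the closed formula above.

#include <cmath>
#include <cstdio>
#include <random>
#include <utility>
#include <vector>

int main() {
    const int N = 100, k = 50;             // k = N/2, i.e. lambda = 1/2
    const int trials = 200000;
    std::mt19937 gen(12345);
    std::uniform_int_distribution<int> pick(0, N - 1);

    int hits = 0;                           // counts how often X_k(0) == 1
    for (int t = 0; t < trials; ++t) {
        std::vector<int> a(N);
        for (int v = 0; v < N; ++v) a[v] = v;
        for (int step = 0; step < k; ++step)
            std::swap(a[pick(gen)], a[pick(gen)]);   // one random transposition
        if (a[0] == 1) ++hits;
    }
    double empirical = double(hits) / trials;
    double predicted = (1.0 - std::pow(1.0 - 2.0 / N, k)) / N;  // p_k from the formula
    std::printf("empirical %.5f vs predicted %.5f (roughly 0.63/N)\n", empirical, predicted);
    return 0;
}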

Looking at these probabilities for {i \neq j}, that is

\displaystyle p_k \approx \frac{1}{N} \times \left(1 - e^{-2\,\lambda}\right)\,,

a choice of {\lambda \geq 3} would seem reasonable. However, looking at those for {i= j}, which are given by

\displaystyle  \Pr\{X_k(i) = i\} = \frac{1}{N} + \frac{N-1}{N}\, \left(1-\frac{2}{N}\right)^k\,,

we see that something more interesting is going on.

If the maximum relative error is {\epsilon_{\max} := \max\Big\{\frac{\big|\Pr\{X_k=\pi\}-1/N!\big|}{1/N!} : \pi\in S_N\Big\}}, then

\displaystyle  \Pr\{X_k(i)=i\} = \sum_{\pi\in S_N\,,\;\pi(i)=i}\Pr\{X_k=\pi\} \leq \sum_{\pi\in S_N\,,\;\pi(i)=i} \frac{1+\epsilon_{\max}}{N!} = \frac{1+\epsilon_{\max}}{N}\,,

and, since we know that

\displaystyle  \Pr\{X_k(i) = i\} = \frac{1}{N} + \frac{N-1}{N}\, \left(1-\frac{2}{N}\right)^k\,,

we get

\displaystyle  \boxed{(N-1)\,\left(1-\frac{2}{N}\right)^k \leq \epsilon_{\max}\,.}

Thus, in order to have {\epsilon_{\max} \leq \delta}, that is, to have the relative error bounded by a constant {\delta}, it is necessary to have at least

\displaystyle  k \geq \frac{\log(N-1) + \log\left(\delta^{-1}\right)}{\log\left(\frac{1}{1-2/N}\right)} = \frac{1}{2}\,N\,\log\,N + \frac{N}{2}\,\log\left(\delta^{-1}\right) - \frac{1}{2}\,\log\,N +O(1)

iterations (and maybe not sufficient; this is a lower bound), where the constant in the {O(1)} term actually depends on {\log \delta}.

Some concluding remarks:

Remark 2 As we have mentioned, the method to generate permutations that we discuss above is actually a bit crude. In fact, we can do better as follows:

// initialize the array, but we will work with 0...N-1 instead.
for (int i = 0; i < N; i++)
    a[i] = i;
// for each position choose one of the not-chosen elements
for (int i = 0; i < N; i++)
{
    int pos = random(i,N-1); // pick an element uniformly at random from {i,...,N-1}
    swap(a,i,pos); // swap positions i and pos in a.
}
return a;
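Incidentally, this second method is the classical Fisher–Yates (or Knuth) shuffle: assuming {random(i,N-1)} is uniform and the calls are independent, a single pass over the array already produces each of the {N!} permutations with probability exactly {1/N!}, so no question of `how long to iterate’ arises.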

Remark 3 The random-composition construction above is quite general. If we consider a generating set {S \subseteq S_N}, the subgroup {G = \langle S \rangle} can be sampled uniformly in the limit by taking

\displaystyle  X_0 := id\,,\qquad X_{k+1} := A_{k+1} \circ X_k\,,

where the random variables {A_1,A_2,\ldots} are independent and identically distributed, with distribution

\displaystyle  \Pr\{A_i = id\} = \tau\,,\qquad \Pr \{ A_i = s\} = \frac{1-\tau}{|S|}\text{ for each }s\in S\,,

where {\tau \in (0,1)} is a free (fixed) parameter, which actually affects the rate of convergence.

Again, given a distribution {\mu} over {G}, the distribution for the next iteration {\nu} is

\displaystyle \nu_\pi := \tau\,\mu_\pi + \sum_{s\in S}\frac{1-\tau}{|S|}\,\mu_{s^{-1}\circ \pi}\,.

It is clear again that taking {\mu = \nu = \text{ `uniform on }G\text{'}}, the previous equation is satisfied. As the resulting Markov chain is irreducible (when considering {G}) and aperiodic (as we have {\Pr\{X_i = X_{i+1}\} \geq \tau > 0}), this is the limiting distribution.

Matters of convergence become even more difficult however, as we do not even know how large {G} is going to be.


Finite fields and trinomials


In this post we shall prove that {f(X) = X^{2 \times 3^k} + X^{3^k} + 1 \in {\mathbb F}_2[X]} is irreducible in {{\mathbb F}_2[X]}, thus proving there are infinitely many irreducible trinomials over {{\mathbb F}_2}. Maybe my proof is overcomplicated, so if you have a simpler one, go ahead and comment.

First remember that finite fields are of the form {F={\mathbb F}_n} with {n=p^k} for some prime {p} and {k\in{\mathbb Z}^+}, and that {F/K} is Galois, where {K={\mathbb F}_p}.

Proposition 1 Let {F = {\mathbb F}_n} be a finite field with {n} elements and characteristic {p}. Then {n = p^k} for some {k\in{\mathbb Z}^+}.

Proof: If {n} had a prime divisor {q\neq p = {\text char}\left( F\right)}, then, seen as an additive group, {F} would have a subgroup of order {q} by Cauchy’s Theorem, and thus we would have an element {g\in F} of additive order {q}. However, {p\cdot g = 0_F} too, since {p = {\text char}\left( F\right)}, so the order {q} of {g} would divide {p}, which is nonsense since {p} and {q} are distinct primes. \Box

Remark 1 In fact, we will actually have {[{\mathbb F}_{p^k} : {\mathbb F}_p] = k}. This follows from the fact that if we have an extension {F/K}, then {F} is a vector space over {K}.

Proposition 2 Let {F = {\mathbb F}_{p^k}} be a finite field with {p^k} elements and characteristic {p}. Then {F} is unique up to isomorphism (thus the notation{\ldots}), and {F/K} is Galois where {K={\mathbb F}_p}.

Proof: Note that {F={\mathbb F}_{p^k}} must be exactly the splitting field (and splitting fields are isomorphic) of {X^{p^k}-X\in {\mathbb F}_p[X]}; in fact, the set of roots of this polynomial is precisely the field {F} {(*)}. Since that polynomial only has simple roots, {F} is the splitting field of a separable polynomial over {K}, and it follows that {F/K} is Galois. \Box

For {(*)}, note that {(x+y)^p = x^p+y^p} when we are working over a field of characteristic {p}. This follows from the Binomial Theorem, and the fact that {p} is prime.

Finally, we are going to characterize the group {{\text{Gal}}(F/K)} where {F={\mathbb F}_{p^k}} and {K={\mathbb F}_p}.

Proposition 3 Let {F = {\mathbb F}_{p^n}}, where {p} is a prime, and {K={\mathbb F}_p}. Let us define {\Gamma : F\rightarrow F} by

\displaystyle \Gamma (a) := a^p

then we have

\displaystyle {\text{Gal}}(F/K) = \left\langle \Gamma \right\rangle

Proof: From {(x+y)^p=x^p+y^p} for all {x,y\in F}, and the fact that {\Gamma} is clearly multiplicative with {\Gamma(1) = 1}, we have that {\Gamma} is a field morphism. By Fermat’s Little Theorem, {\Gamma} fixes {K} pointwise, thus {\Gamma \in {\text{Gal}}(F/K)}.

Since {F} is finite, it follows that {F^{\times}} is cyclic, thus it contains an element {\alpha} of order {p^n - 1}. Note then that {\alpha\neq\Gamma^{k}(\alpha) = \alpha^{p^k}} for all {0<k<n}, while {\Gamma^n = id}, since every element of {F} is a root of {X^{p^n}-X}. Thus we find that {\left|\left\langle \Gamma \right\rangle\right| = n}.

Finally, since {F/K} is Galois, it follows from the Fundamental Theorem that {\left|{\text{Gal}}(F/K)\right|= [F:K] = n}. \Box

Remark 2 Clearly the previous proposition generalizes to the case where {F = {\mathbb F}_{p^n}} and {K = {\mathbb F}_{p^m}} is a subfield of {F} (we must have then {m|n}, since {{\mathbb F}_{p^m}^\times \leq {\mathbb F}_{p^n}^\times}). In this case {{\text{Gal}}(F/K) = \left\langle \Gamma^m \right\rangle}, since {K} is the subfield characterized by the fact that its elements are exactly the roots of {X^{p^m} - X}.

Remark 3 The map {\Gamma} is usually called the Frobenius map.

As a reminder, we have the following useful proposition:

Proposition 4 Let {F/K} be a finite Galois extension. Let {\alpha\in F}, then we have

\displaystyle irred_K (\alpha) = \prod_{s\in S} (X-s)

where {S = \{\sigma(\alpha) : \sigma \in {\text{Gal}}(F/K)\}} is the set of Galois conjugates of {\alpha}.

Proof: Since {F/K} is finite, {\alpha} must be algebraic over {K}. Let {irred_K(\alpha) = m(X)\in K[X]} be its minimal polynomial over {K}. Since {{\text{Gal}}(F/K)} permutes the roots of {m(X)}, the set {S} consists of distinct roots of {m(X)}, and thus {g(X) = \prod_{s\in S} (X-s)} must divide {m(X)} over {F[X]}. So we will be done if we can prove that {g(X)\in K[X]}: then, since {g(\alpha)=0} and {g(X) | m(X)}, minimality forces {m(X) = g(X)}.

Let {\pi\in {\text{Gal}}(F/K)}; note then that {\pi g ( X) = \prod_{s\in S} (X-\pi(s))}. But {\pi(\sigma(\alpha)) = (\pi\circ\sigma)(\alpha)} and {\{\pi\circ\sigma : \sigma\in{\text{Gal}}(F/K)\} = {\text{Gal}}(F/K)}, thus {\pi(S) \subseteq S}. Since {\pi} is injective and {|S|<\infty}, we get {\pi(S) = S} and so

\displaystyle \pi g ( X) = g(X)

and so its coefficients must be fixed by every {\pi\in {\text{Gal}}(F/K)}; since {F/K} is Galois, it follows that {g(X)\in K[X]}. \Box

Now we are almost done, we will move on to proving that {f(X) = X^{2 \cdot 3^k}+X^{3^k} + 1\in {\mathbb F}_2[X]} is irreducible.

Proof: Note that {f(X) = \frac{X^{3^{k+1}} - 1}{X^{3^k} - 1}}, and, since {X^{3^r}-1} has only simple roots for each {r} (its formal derivative is {X^{3^{r}-1}\in {\mathbb F}_2[X]}, which shares no roots with it), it follows that the roots of {f(X)} in a splitting field {F} must be exactly the elements satisfying {\alpha^{3^{k+1}} = 1} but {\alpha^{3^k}\neq 1}.

Now let {\alpha \in F} be an element such that {\alpha^{3^{k+1}} = 1} but {\alpha^{3^k}\neq 1}, out of the {3^{k+1}-3^k = 2\times 3^k} possible ones. We will be done if we can show its irreducible polynomial over {K = {\mathbb F}_2} has degree {N = 2\times 3^k}.

Remember now that {{\text{Gal}}(F/K) = \left\langle \Gamma\right\rangle}, and so by the previous proposition we find that we will be done if we can show that {\alpha, \alpha^2, \alpha^4,\ldots,\alpha^{2^{N-1}}} are all distinct (note that {\alpha^{2^N} = \alpha}, since {2^N = 2^{\phi(3^{k+1})}\equiv 1 (\bmod. 3^{k+1})} and {\alpha\in F} has order {3^{k+1}}). Since {\alpha} has order {3^{k+1}}, this is the same as showing that the order of {2} modulo {3^{k+1}} must be {N = 2\times 3^k}.

To finish off the problem then, we shall prove a Lemma. \Box

Lemma 5 Let {k\in\{1,2,\ldots\}}, then we have {2^{2\times 3^{k-1}} = 3^k \times q + 1} where {3\not | q}.

Proof: By induction, the result is obvious for {k = 1} by picking {q=1}.

Now let {n \geq 2} and assume the result to be true for {k < n}. For {n} we have, by the inductive hypothesis

\displaystyle 2^{2\times 3^{n-1}} = \left(2^{2\times 3^{n-2}}\right)^3 = \left(3^{n-1}\times q + 1\right)^3

where {3\not | q}. Thus, expanding we get

\displaystyle 2^{2\times 3^{n-1}} = \left(3^{n-1}\times q\right)^3 + 3 \left(3^{n-1}\times q\right)^2 + 3 \left(3^{n-1}\times q\right) + 1

and here note that

\displaystyle \left(3^{n-1}\times q\right)^3 + 3 \left(3^{n-1}\times q\right)^2 + 3 \left(3^{n-1}\times q\right) = 3^n\times q^\prime

where {q^\prime = 3^{2n-3} q^3 + 3^{n-1} q^2 + q}, thus since {3\not | q} we get that {3\not | q^\prime}. \Box

Now this proves that the order of {2} modulo {3^{k+1}} must be exactly {2\times 3^k}.

Proof: Since {2} is coprime to {3^{k+1}}, we get {2^{\phi(3^{k+1})} = 2^{2\times 3^k}\equiv 1 (\bmod. 3^{k+1})}. Thus {n = {\text ord}_{3^{k+1}}(2)} must divide {2\times 3^k}. If {n < 2\times 3^k}, then we must have that either {2^{3^k}\equiv 1 (\bmod. 3^{k+1})} or {2^{2\times 3^{k-1}} \equiv 1 (\bmod. 3^{k+1})}.

The former option is impossible since {2^{3^k} \equiv (-1)^{3^k} = -1 (\bmod. 3)}. For the latter, we note that we have {2^{2\times 3^{k-1}} = 3^k q + 1} where {3 \not | q} , by the Lemma, so {3^k q + 1 = 3^{k+1}\times m + 1} is impossible since {3 \not | q}. \Box
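As a quick sanity check, take {k=1}: then {N = 2\times 3 = 6}, and the successive powers of {2} modulo {9} are

\displaystyle 2,\;4,\;8,\;7,\;5,\;1\,,

so the order of {2} modulo {9} is indeed {6}, and {X^{6}+X^{3}+1} is irreducible over {{\mathbb F}_2}.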

Comment: The previous exercise appears as part of a note about Golomb’s conjecture in this book. Golomb’s conjecture states that there are infinitely many irreducible trinomials over {{\mathbb F}_2} that are primitive (which is a lot stronger than what we proved). I haven’t looked much into it, but as of 2005 the conjecture was still open.


Looking for the largest palindrome



Exercise 1 Let a string of characters {s} be given; find the length of the longest palindrome contained in it (as a contiguous substring).

Let us start with the naive way of doing it; what would that be? Well, for each position and each possible length, we check whether we have a palindrome of that length starting at that position. The naive algorithm then runs in {\Theta\left(N^3\right)} time, where {N} is the length of the string.

But the truth is, we can do much better. Think about it for a minute: palindromes always contain smaller palindromes (unless they consist of {0} or {1} characters), and as such, we can re-use information about palindromes of smaller length than the one we are currently checking. Thus, in all, we can use the following dynamic programming algorithm:

int ans = 1;
// length 1 or less are immediately palindromes
memset(dp,true,sizeof(dp));
// for each length in increasing order
for (int l = 2; l <= N; l++)
    // check for palindromes of that length
    for (int i = 0; i <= N - l; i++)
        if (s[i] == s[i+l-1] && dp[i+1][i+l-2] ) 
        {
            // s[i],...,s[i+l-1] is a palindrome
            dp[i][i+l-1] = true;
            ans = l;
        }
        else dp[i][i+l-1] = false;
return ans;

Note that {dp(i,j)} remembers whether {s(i\ldots j)} is a palindrome or not. Note too that the fact that we go over the substrings in increasing order of length is crucial. Finally, it is clear that we have found all of the palindromes within {s}, not only the largest one.

In all, this algorithm runs in {\Theta\left(N^2\right)} time, and {\Theta\left(N^2\right)} memory as well. A vast improvement over the naive method. However, it is interesting to see that we can still do better (probably)!

Exercise: Try thinking how this algorithm can be modified so that it consumes only {\Theta\left(N\right)} extra memory.

Next I am going to present the solution I first thought about when I saw the problem.

Suppose that the maximum length we are seeking is {M}. If this meant that for any {k\leq M} there is a palindrome of length {k}, we would be able to apply binary search. This is not quite true, but it is close! Suppose {M} were even, i.e. {M = 2 n} for some {n}; then there is a palindrome of length {2\cdot k} for any {k \leq n}, and similarly in the odd case. Indeed, if {s(i)\ldots s(i+2n-1)} were our palindrome of length {M=2\cdot n}, then {s(i+1)\ldots s(i+2n-2)}, which has length {2n-2}, is also a palindrome, and so on.

So the core of the idea is to do {2} binary searches, one for even-length palindromes and one for odd, and then pick the largest. Once we have fixed a length {l}, and we want to know if there is a palindrome of that length, this can be checked in roughly {l \times (N-l+1)} operations by doing a naive check for a palindrome of that length (we short-circuit once we find one).

Since {l \times (N-l+1) \leq \left(\frac{N+1}{2}\right)^2 }, it follows that this algorithm does, roughly, no more than {\log ( N) \times N^2 / 2} operations (counting both binary searches). Thus we have an {O\left( N^2 \times \log (N) \right)} algorithm, slightly worse than before, but one that consumes no extra memory.
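Here is a sketch of this double binary search; the helper and the function names are just for illustration.

#include <algorithm>
#include <string>

// Naive check: does s contain a palindrome of length exactly l?
// Short-circuits as soon as one is found.
bool hasPalindromeOfLength(const std::string &s, int l) {
    int N = s.size();
    for (int i = 0; i + l <= N; i++) {
        bool ok = true;
        for (int t = 0; t < l / 2 && ok; t++)
            if (s[i + t] != s[i + l - 1 - t]) ok = false;
        if (ok) return true;
    }
    return false;
}

// Binary search for the largest feasible length among l0, l0+2, l0+4, ...
// (l0 = 1 for odd lengths, l0 = 2 for even lengths).
int largestWithParity(const std::string &s, int l0) {
    if (!hasPalindromeOfLength(s, l0)) return 0;   // no palindrome of this parity at all
    int lo = 0, hi = ((int)s.size() - l0) / 2;     // search over l = l0 + 2*k
    while (lo < hi) {
        int mid = (lo + hi + 1) / 2;
        if (hasPalindromeOfLength(s, l0 + 2 * mid)) lo = mid; else hi = mid - 1;
    }
    return l0 + 2 * lo;
}

int largestPalindrome(const std::string &s) {
    return std::max(largestWithParity(s, 1), largestWithParity(s, 2));
}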

However, there is a trick we can use to speed up the naive check for the substrings! Indeed, we can do some hashing along with a window-sliding trick. The idea is that we may, given a base {b} and some number {n>256} (larger than the number of possible characters) coprime to {b}, map say {s(i)\ldots s(i+2l)} into

\displaystyle -b^{l} \cdot s(i) - \ldots - b^1 \cdot s(i+l-1) + b^1 \cdot s(i+l+1) + \ldots + b^{l}\, s(i+2l) \;(\bmod. n)

and check whether this is {0} (the length here is odd; you may want to think about what happens when it is even). Let L_{2l}(i) := b^{l} \cdot s(i) + \ldots + b^1 \cdot s(i+l-1)\;(\bmod. n) and U_{2l}(i) := b^1 \cdot s(i+l+1) + \ldots + b^{l}\, s(i+2l)\;(\bmod. n).

Of course, checking this for the first time takes O(l) time, but note that we have

U_{2l}(i+1) = b^{-1}\cdot U_{2l}(i) - s(i+l+1) + b^{l}\cdot s(i+1+2l)  (\bmod. n)

and

L_{2l}(i+1) = b\cdot L_{2l}(i) - b^{l+1} \cdot s(i) + b\cdot s(i+l)  (\bmod. n)

and so the transitions take {O(1)} (we compute the powers of {b} only once, so this is not counted in the transitions). Only if {L_{2l}(i) = U_{2l}(i)} do we check in the naive way whether {s(i\ldots i+2l)} is a palindrome. The point is that, if the hashing is good, we can potentially have an {O(N\times \log N)} algorithm (of course, the actual worst case might be an order of {N} more, but we expect it to be uncommon), consuming no extra memory.


Cyclotomic Polynomials


Okay, here is a proof of a particular case of Dirichlet’s Theorem on arithmetic progressions. As a preliminary, we will begin by looking at the cyclotomic polynomials.

Definition 1 Given {n\in{\mathbb Z}^+} we define {\Phi_n(X) \in {\mathbb{C}}[X]}, the {n}-th cyclotomic polynomial, to be the product of the monomials of the form {X-\zeta}, where {\zeta} ranges over the primitive {n}-th roots of unity in {\mathbb{C}}.
An element {\zeta\in {\mathbb{C}}} is called a primitive {n}-th root of unity if and only if {\zeta^n = 1}, but {\zeta^r \neq 1} for {r\in \{1,\ldots,n-1\}}.

Remark 1 Note that we may write

\displaystyle \Phi_n( X) = \prod_{(k,n)=1;1 \leq k \leq n} \left( X - \zeta_n^k \right)

where {\zeta_n = \exp \left( 2\pi i / n\right)}. Then this implies that {\deg \Phi_n (X) = \varphi ( n)}.

Proposition 2 Given a positive integer {n\in {\mathbb{Z}^+}} we have

\displaystyle \prod_{d|n} \Phi_d(X) = X^n - 1

Proof: Note that the roots of the LHS all have multiplicity one, and so do the roots on the RHS, since {\gcd (X^n-1,(X^n-1)^\prime) = 1}. Now the rest follows from the fact that {\zeta^n = 1} implies that the minimal {r\in {\mathbb{Z}}^+} such that {\zeta^r = 1} satisfies {r|n} (use this to show that the two sets of roots are the same). \Box

Corollary 3 We have the following formula for {\Phi_n(X)}

\displaystyle \Phi_n(X) = \prod_{d|n} \left(X^d-1\right)^{\mu(n/d)}

where {\mu} is the Moebius function.

Proof: This follows at once from Moebius inversion formula. \Box
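For example, for {n=12} only the divisors {d} with {12/d} squarefree contribute, and we get

\displaystyle \Phi_{12}(X) = \frac{\left(X^{12}-1\right)\left(X^{2}-1\right)}{\left(X^{6}-1\right)\left(X^{4}-1\right)} = X^4 - X^2 + 1\,.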

Corollary 4 We have {\Phi_n(X) \in {\mathbb Z}[X]}.

Proof: Note that {\mu(n/d)} only takes the values {-1,0,1}. And remember that {(X^d - 1)^{-1} = - 1 - X^d - X^{2d} - \ldots} in {\mathbb{Z}[[X]]}. Hence, by the previous corollary, {\Phi_n(X)} is a product of elements of {\mathbb{Z}[[X]]}, and hence lies in {\mathbb{Z}[[X]]}; but we already know it is a polynomial, hence the result follows. \Box

Remark 2 Note that in particular {\Phi_n(0) = 1} if {n > 1}. Try computing {\Phi_n^\prime (0) \in {\mathbb Z}} and the second leading coefficient of {\Phi_n}.

Here, as a comment, it is nontrivial and interesting that {\Phi_n(X)} is irreducible for each {n\in{\mathbb Z}^+}. This result is easy to prove when {n=p} prime and follows from Eisenstein’s criterion (see the previous post): indeed {\Phi_p(X) = \frac{X^p-1}{X-1}} (by the previous formula), and apply Eisenstein to {\Phi_p(X+1) = \frac{(X+1)^p-1}{X} = X^{p-1} + \binom{p}{p-1} X^{p-2} + \ldots + \binom{p}{2} X + \binom{p}{1}}.

Proposition 5 Given a positive integer {n\in{\mathbb Z}^+}, there are infinitely many primes {p} satisfying {p\equiv 1 (\bmod. n)}.

Proof: Let {p} be a prime with {p | \Phi_n(a)} for some {a\in{\mathbb Z}}. Since we saw that {\Phi_n(0) = 1} for {n>1}, and {\Phi_1 (0) = -1}, it follows that {p \not | a} (indeed, if {p | a} then {\Phi_n(a)\equiv \Phi_n(0) = \pm 1 \;(\bmod. p)}). Next we see that {p | (a^n - 1)}, since { \Phi_n (a) | (a^n - 1)}. If {{\text{ord}}_p ( a) = n}, we have that {n | (p-1)} by Fermat’s Little Theorem, and so {p \equiv 1 (\bmod. n)}. Assume then that {{\text{ord}}_p (a) = d < n}, where we must have {d|n}.

But {d|n} implies

\displaystyle 1+X^d + X^{2d}+ \ldots + X^{n-d} = \frac{X^n - 1}{X^d - 1} = \prod_{d^\prime \not | d ; d^\prime | n} \Phi_{d^\prime}(X)

and so, since {a^d \equiv 1 (\bmod. p)} and {d<n} we have

\displaystyle n/d \equiv \prod_{d^\prime \not | d ; d^\prime | n} \Phi_{d^\prime}(a) = \Phi_n (a) \cdot \ldots \quad (\bmod. p)

but we had {p | \Phi_n ( a)}, so this means that {p | n}.

Hence, if {p \not | n} we must have {{\text{ord}}_p ( a) = n} and so {p\equiv 1 (\bmod. n)}. The former condition is easy to ensure by choosing say {a = n\cdot k }, because {\Phi_n( 0 ) = \pm 1} then implies that if {p|n} and {p | \Phi_n ( a)}, we must have {p | 1}, which is nonsense.

Thus we conclude that the prime divisors of {\Phi_n (n \cdot k)} are congruent to {1} modulo {n}, for each {k\in {\mathbb Z}}.

Now we are ready to prove the claim. Suppose there were finitely many primes {p_1,\ldots,p_m \equiv 1 (\bmod. n)}. Then consider {f(k) = \Phi_n \left(n\cdot k \cdot p_1 \ldots p_m\right)}; clearly any prime divisor of {f(k)} can’t be any of the {p_i}, because {\Phi_n(0) = \pm 1}. So, since the prime divisors must be {\equiv 1 (\bmod. n)}, we are forced to conclude that {f(k) = \pm 1} for each {k\in {\mathbb Z}}, and this is nonsense, since it would imply that the polynomial {\Phi_n \left( n \cdot p_1 \ldots p_m \cdot X\right)} is constant. \Box


Eisenstein’s criterion


In this post we will see a couple of criteria to prove that a given polynomial in {{\mathbb Z}[X]} is irreducible over {\mathbb Q}. As a reminder, a polynomial over {\mathbb Z} is irreducible over {\mathbb Q} if and only if it cannot be written as the product of two non-constant polynomials over {\mathbb Z} (this is known as Gauss’ Lemma).

Theorem 1 (Eisenstein’s criterion) Let {f(X)=c_0+c_1\cdot X^1 + \ldots + c_n \cdot X^n} be a polynomial over {\mathbb Z}, and {p\in \{2,3,\ldots\}} be a prime such that

  • {p | c_0}, {\ldots}, {p|c_{n-1}}
  • {p^2 \not | c_0}
  • {p\not | c_n}

then {f(X)} is irreducible over {\mathbb Q}.

Proof: Suppose otherwise; by Gauss’ lemma {f(X) = g(X)\cdot h(X)} with {g(X),h(X)\in {\mathbb Z}[X]} both non-constant. Write {g(X) = g_0 + g_1 \cdot X + \ldots + g_k\cdot X^k} and {h(X) = h_0 + h_1 \cdot X^1 + \ldots + h_l \cdot X^l}, so that

\displaystyle c_0 = g_0 \cdot h_0

and here {p | c_0} implies {p | g_0} or {p| h_0} (since {p} is prime), but not both, since {p^2 \not | c_0}. Without loss of generality (renaming otherwise) we may assume that {p | g_0} and {p\not | h_0}. Now

\displaystyle c_1 = g_0 h_1 + h_0 g_1

and {p| c_1} implies {p| (h_0 \cdot g_1)} since {p | g_0}, but we know that {p \not | h_0}, hence {p | g_1}. This argument gets repeated until {p | g_k} (note {k < n}, and so we never hit {c_n}), and so

\displaystyle  f(X) = (g_0 + \ldots + g_k X^k)\cdot h(X)

now implies, since {p | g_i} for {i\in \{0,\ldots,k\}}, that all of the coefficients of {f} are multiples of {p}, which is absurd since {p\not | c_n}. \Box
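For example, {X^4 + 10\,X + 5} is irreducible over {\mathbb Q}: take {p=5}; then {5} divides all of {5, 10, 0, 0} (the coefficients {c_0,\ldots,c_3}), while {25 \not | c_0 = 5} and {5 \not | c_4 = 1}.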

Okay, the previous criterion is very well-known and useful, but now we are going to see a different one. I find it very beautiful, even more so because it is not obvious that one can recycle part of the ideas above for another case. The following criterion appears as an exercise (which I solved yesterday 🙂 ) in this book; according to it, the result is due to Eugen Netto and appears in Mathematische Annalen, vol. 48 (1897).

Theorem 2 Let {f(X)=c_0+c_1\cdot X^1 + \ldots + c_{2m+1} \cdot X^{2m+1}} be a polynomial over {\mathbb Z}, and {p\in \{2,3,\ldots\}} be a prime such that

  • {p^2 | c_0}, {\ldots}, {p^2|c_{m}}
  • {p | c_{m+1}}, {\ldots}, {p | c_{2m}}
  • {p^3 \not | c_0}
  • {p\not | c_{2m+1}}

then {f(X)} is irreducible over {\mathbb Q}.

Proof: Again, suppose otherwise. By Gauss’ lemma {f(X) = g(X)\cdot h(X)} with {g(X),h(X)\in {\mathbb Z}[X]} both non-constant. Write {g(X) = g_0 + g_1 \cdot X + \ldots + g_k\cdot X^k} and {h(X) = h_0 + h_1 \cdot X^1 + \ldots + h_l \cdot X^l}. Since {k + l = 2m+1}, one of {\{k,l\}} is at most {m} and the other one is greater than {m}. Without loss of generality we assume that {\deg g = k \leq m} and that {\deg h = l \geq m + 1}.

Now, notice that {p^2 | c_0} implies {p^2 | (g_0 h_0)}, but, since {p^3 \not | c_0} we have either {p^2 | g_0 \land p \not | h_0}, or {p \not | g_0 \land p^2 | h_0} or {p \| g_0 \land p \| h_0} (here {p \| n} is a shorthand for {p | n} but {p^2 \not | n}). In the first two of these cases, the proof proceeds exactly like in the proof of Eisenstein’s criterion, since we have that {p} divides exactly one of {\{h_0,g_0\}}.

Thus we assume {p \| g_0} and {p \| h_0}. We are going to show that {p | g_i} and {p | h_i} for each {i \in \{0,\ldots m\}}, from which we get that {p} divides every coefficient of {g(X)} (since {\deg g \leq m}), and so we get our desired contradiction like in Eisenstein’s criterion.

We already know that {p | g_0} and {p | h_0}. Assume that { p | g_{j}} and {p | h_{j}} for each {j} with {j < i\leq m}; we are going to show that {p | g_i} and {p | h_i}. First observe that {p^2 | c_i}, since {i \leq m}, and so

\displaystyle p^2 | \left( g_0 \cdot h_i + \ldots + g_i \cdot h_0 \right)

but {p | g_j, p | h_{i-j}} when {j\neq 0,i}, and so {p^2 | (g_j h_{i-j})} for {j\neq 0,i}, thus

\displaystyle p^2 | \left(g_0\cdot h_i + g_i \cdot h_0\right)

Next, note that {p | c_{2\cdot i}} since {2i \leq 2m}, hence

\displaystyle p | \left(g_0 h_{2i} + \ldots + g_i h_i + \ldots + g_{2i} h_0\right)

but {p | (g_j h_{2i-j})} when {j \neq i}, since either {j} or {2i-j} is strictly less than {i}, and so {p | g_j} or {p | h_{2i-j}} in this case. Thus

\displaystyle p | (g_i\cdot h_i)

and so either {p | g_i} or {p| h_i}, since {p} is prime. But, in either case, since {p | g_0} and {p|h_0}, this implies that {p^2 | (g_0h_i)} or {p^2 | (g_ih_0)}. But we showed that {p^2 | \left(g_0\cdot h_i + g_i \cdot h_0\right)}, and hence this implies:

\displaystyle  p^2 | (g_0\cdot h_i) \text{ and } p^2 | (g_i \cdot h_0)

and this, along with {p \| g_0}, {p \| h_0} implies that {p | h_i} and {p | g_i} as desired. \Box
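For example, with {m=1} and {p=2} the criterion shows that {X^3 + 2\,X^2+4\,X+4} is irreducible over {\mathbb Q}: here {4 | c_0 = 4} and {4 | c_1 = 4}, {2 | c_2 = 2}, {8 \not | c_0} and {2 \not | c_3 = 1}. Note that Eisenstein’s criterion itself does not apply directly to this polynomial, since the only prime dividing all of {c_0,c_1,c_2} is {2}, and {4 | c_0}.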


Symmetric random walks


Definition 1 A symmetric random walk is a homogeneous Markov chain {\left(X_0,X_1,\ldots\right)} on the set of states {{\mathbb Z}} such that {X_0=0} with probability {1}, and

\displaystyle \Pr\left(X_{k+1}=n+1 | X_k = n\right) = p_{n,n+1}=\tfrac{1}{2}=p_{n+1,n} = \Pr\left(X_{k+1}=n | X_k = n+1\right)

for all {n\in {\mathbb Z}} .

In plain terms, this means that we start off at {0}, and at each second we decide whether to move a step left or right with equal probability.

Now, given two integers {a,b\geq 0}, an interesting question is: what is the probability {p} of reaching {b} before reaching {-a}?

The answer should clearly be {p=\tfrac{1}{2}} if {a=b}, since, by an argument like the one in this post, we will reach {-a} or {b} with probability one. What if they are unequal?

Proposition 2 Let {a,b} be nonnegative integers such that {a+b>0}. The probability {p} of hitting {b\geq 0} before hitting {-a \leq 0} is

\displaystyle p = \frac{a}{a+b}

Proof: Let {p_k} be the probability of hitting {b} before {-a} if we start off at the integer {k}. Clearly {p_b = 1} and {p_{-a} = 0}. Now, for {k\in \{-a+1,\ldots,b-1\}} we have

\displaystyle p_k = \tfrac{1}{2}\cdot p_{k-1} + \tfrac{1}{2}\cdot p_{k+1}

hence, this means {p_{k+1} - p_k = p_k - p_{k-1}}; writing {\Delta p_k = p_{k+1}-p_k}, we have

\displaystyle \Delta p _k = \Delta p_{k-1}

thus this is a constant {C = \Delta p_k} for {k\in \{-a,\ldots,b-1\}} (note that it is valid for {k=-a}) and we have

\displaystyle C\cdot (b+a) = p_{-a} + C\cdot (b+a) = p_{-a} + \Delta p _{-a} + \ldots + \Delta p_{b-1} = p_b = 1

which means that {C = \tfrac{1}{a+b}} and hence

\displaystyle \tfrac{a}{a+b} = p_{-a} + C\cdot a = p_{-a} + \Delta p_{-a} + \ldots + \Delta p_{-1} = p_0 = p

concluding the proof. \Box

Note that the previous proposition allows us to simulate any finite discrete random variable having rational probabilities in a very silly way. For instance, if we had {3} possible events {\{e_1,e_2,e_3\}}, with probabilities {\{\tfrac{1}{3},\tfrac{1}{3},\tfrac{1}{3}\}} for the events respectively, then we may for example first take {a=1,b=2}. The probability of hitting {b} before {-a} is {\tfrac{1}{3}}, and so we choose {e_1} if that were to happen. Else, we hit {-a} with probability {2/3}, and here we decide between {e_2} and {e_3}, each of which has probability {1/2} given that {e_1} does not occur. We simulate this decision by taking {a=b=1}.
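A small sketch of this `silly’ sampler (the names are just for illustration, and a fair coin from a standard generator is assumed): the walk below returns true when it hits {b} before {-a}, which by the proposition happens with probability {a/(a+b)}.

#include <random>

// Run a symmetric random walk started at 0 until it hits b (return true)
// or -a (return false); by Proposition 2 this happens with probability a/(a+b).
bool hitsBFirst(int a, int b, std::mt19937 &gen) {
    std::bernoulli_distribution coin(0.5);
    int pos = 0;
    while (pos > -a && pos < b)
        pos += coin(gen) ? 1 : -1;
    return pos == b;
}

// The three-event example with probabilities 1/3, 1/3, 1/3:
int sampleUniform3(std::mt19937 &gen) {
    if (hitsBFirst(1, 2, gen)) return 1;      // e_1, with probability 1/3
    return hitsBFirst(1, 1, gen) ? 2 : 3;     // e_2 or e_3, each with probability 1/2 of the remainder
}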

If we had more elements, one would be tempted to say that it is better to break the set of possible events into two that have roughly the same number of elements each, in each step. However, this may not be optimal. I have not looked too much into this matter, but note that we should also take into account the expected time before we hit either barrier ({-a} or {b}), and for this we would like them to be small.
However, {\tfrac{a}{a+b} = \tfrac{k\cdot a}{k\cdot a + k\cdot b} } for {k\neq 0}, and this may complicate matters (why?).
