More on the Monkey typing problem


To compute the expected number of letters the monkey must type before our word turns up, we used \sum_{n\geq 0} {n\cdot A^n} = A \cdot \left( I - A\right)^{-2}  . It is clear that when the series on the left hand side converges, that is the result of the series, in this post we will see a fun little proof of how this series will converge (under some matrix norm) that I came up with just yesterday.

Just as a reminder, remember that A is almost the matrix of a Markov chain, except that the last row is \bold 0 , since once we reach the last (ending) state we are done. Also, in the case of the monkey typing, there is always a path from any given state to the last state which has non-zero probability of being traversed (since there is always a way of completing the word).

Proof

The given series converges if and only if \rho \left( A\right) < 1 exactly like in the case of I + A + A^2 + ... .

So let \lambda \in{ \mathbb C } be an eigenvalue of A , and let v \neq {\bold 0} be an associated eigenvector. Then we have A \cdot v = \lambda \cdot v and iterating we get A^k \cdot v = \lambda ^ k v for all k\in {\mathbb N}. Note that such v must have 0 in the last entry (unless \lambda = 0 in which case we already have | \lambda | < 1 ), because the last row of A consists of 0′s.

Since v \neq {\bold 0} the entry of v = (v_1,...,v_m,0) for which |v_i| is maximum, must be nonzero. So let v_i be that entry.  Now note that, since from each prefix state we have a path to the ending state with non-zero probability, \left [ A^k \right ]_{i,m+1} > 0 for some k \in {\mathbb Z}^+ , consider the smallest such k.

Then A^k \cdot v = \lambda ^ k v implies \lambda ^ k v_i = \sum_{j=1}^{m+1} \left [ A^k \right ]_{i,j} \cdot v_j ,  take modules on both sides and use the triangle inequality: | \lambda |^ k | v_i | \leq \sum_{j=1}^{m+1} \left [ A^k \right ]_{i,j} \cdot |v_j| .

At this point, remember that v_{m+1} = 0  hence | \lambda |^ k | v_i | \leq \sum_{j=1}^{m} \left [ A^k \right ]_{i,j} \cdot |v_j| and since v_i is the entry of maximum module we get | \lambda |^ k | v_i | \leq \sum_{j=1}^{m} \left [ A^k \right ]_{i,j} \cdot |v_j| \leq \left( \sum_{j=1}^{m} \left [ A^k \right ]_{i,j} \right)\cdot |v_i|

But we have \sum_{j=1}^{m+1} \left [ A^k \right ]_{i,j} = 1  (*) , hence \sum_{j=1}^{m} \left [ A^k \right ]_{i,j} < 1 because we know that \left [ A^k \right ]_{i,m+1} > 0.

Thus | \lambda |^ k | v_i | \leq \left( \sum_{j=1}^{m} \left [ A^k \right ]_{i,j} \right)\cdot |v_i| < |v_i| and this implies | \lambda | < 1.

Since \lambda was an arbitrary eigenvalue of A, it follows that \rho \left( A\right) < 1 and the proof is complete. \square

(*) This is 1 because we are considering the smallest k such that \left [ A^k \right ]_{i,m+1} > 0. In k steps we must be somewhere, there is still 0 probability of having disappeared (gone through m + 1) .

Monkey typing!


I find this an amusing idea, you have a monkey typing on a keyboard and what we want to know is roughly how long we will have to wait for a certain word – a name for instance- to be written by the monkey if it types randomly.

The idea is simple, given a word w \in \Sigma^*   where \Sigma is a finite alphabet, we can build a Markov Chain for the problem as follows:

For each prefix of the word w  – including the empty word- we associate a state, upon being given a character we look for the longest prefix in w we can match – considering suffixes of the current stream of characters given by the monkey. As for the probabilities, remember that the probability of receiving a particular character is \frac{1}{|\Sigma|} (each character is equally likely) and so we can build the transition probabilities .

So for instance if we had the alphabet \{ H, T \}   – heads and tails- and the word we are looking for is HH then we have states corresponding to \epsilon – the empty word, H and HH, call these S_1, S_2, S_3 respectively.

Now suppose the monkey types: T, T, T, H, T, H, T, H, H ( I really did this with a coin :D )

Upon receiving the first T we go from S_1 to S_1 -since we have nothing of the word we are looking for- and so on until we get to the first H, then we move from S_1 to S_2. But unfortunately the next character is a T and so we go back to state S_1 since we now have the empty piece of the actual word we are looking for. But then at the end we get H, H and we go from S_1 to S_2 and finally to S_3 and we stop! :)

What we are going to calculate now is the expected number of characters we have to receive from the monkey in order to stop (get our word).

We are going to build a matrix as follows:  Let [A]_{i,j} be the probability of going from state i to state j unless i = |w| + 1 in which case we let [A]_{|w|+1,j} = 0 for all j \in \{1,...,|w|+1\} since once we get the word, we don’t care to look anymore – so this is not actually the transition matrix of the Markov Chain, because the last row was changed for  \bold 0 .

Note that the entries of  A^k   then correspond to the probabilities of going from a state to another in exactly k steps if we have not gone throughS_{|w|+1}   -because we never leave that state!. What we are interested in mainly is the entry 1, |w|+1 which corresponds to the probability of getting the word for the first time after typing exactly k characters!

This means that the expected number of characters that need to be typed before we get the complete word is the 1, |w|+1 entry of \displaystyle\sum_{n\geq 0} { n \cdot A^n} = A\cdot \left( \text{Id} - A \right)^{-2}

As an example application, suppose we had the alphabet \Sigma = \{ a,b, .. , z \}   -  26 letters – and we want to get paul what would the expected time be?

We have the matrix A = \begin{pmatrix}  \frac{25}{26} & \frac{1}{26} & 0 & 0 & 0\\  \frac{24}{26} & \frac{1}{26} & \frac{1}{26} & 0 & 0\\  \frac{24}{26} & \frac{1}{26} & 0 & \frac{1}{26} & 0 \\  \frac{24}{26} & \frac{1}{26} & 0 & 0 & \frac{1}{26}\\  0 & 0 & 0 & 0 & 0  \end{pmatrix}  .

Finally the expected value is roughly \approx 4.57 \times 10^5 .

As for the example with the coin above, in which we were waiting for HH you can work it out by hand, you should get that the expected time to see the word HH for the first time is 6.

To show a more interesting example, consider the word pope. Note that once we have pop:

  • Upon receiving an e we are done.
  • Upon receiving a p we go back to the state in which we only have the prefix p
  • Upon receiving an o we go back to the state in which we have the prefix po
  • Upon receiving any other character, we go back to the state in which we have no character \epsilon.

A cute trick


This post is about a recurrent trick in Number Theory problems. I will present it with the following:

Problem: Find all n \in \mathbb{Z}^+ such that n | (2^n - 1)

Solution

It is trivial to see that it works for n = 1 and that n should always be odd.

Now suppose n > 1 then it is divisible by at least one prime, let p be the smallest prime dividing n.

We should have p|(2^n - 1) so that \text{ord}_p(2) | n . Why did we choose the smallest prime?

Well, it turns out that \text{ord}_p ( 2) also divides p- 1 but since p was the smallest prime dividing n we must have (p-1,n) = 1  aha! :D

Then it follows immediately \text{ord}_p(2) = 1 so that 2 \equiv 1 (\bmod. p) which is absurd!

Thus the only solution is n = 1.

Here you have a couple of problems that can be solved with this trick:

  • The second problem here
  • And this post  here

 

Independent Sets


This is going to be a short post, related to the last post of October ;)   (I seem to be really enjoying this mixing of inclusion-exclusion and generating functions :D )

Definition: Given a simple graph G(V,E) we say S\subseteq V is an independent set if \{u,v\} \not \in E for all u,v \in S , that is, no 2 vertices of S are joined by an edge.

Now note that in a way this is similar to graph colouring, we may for instance assign 2 colours – 1 or 0- to each node, but instead of requiring that each pair of adjacent vertices have distinct colours, we will require that no 2 adjacent vertices have colour 1. Here note that 1 means that the vertex belongs to our independent set, and 0 that it doesn’t belong to it.

So here is the generating function:

Result: Given a simple graph G(V,E) let a_k be the number of independet sets with exactly k vertices. Then we have \displaystyle\sum_{k\geq 0} { a_k \cdot x^k} = \sum_{S\subseteq E} {(-1)^{|S|}\cdot (1+x)^{r(V,S)}\cdot x^{|V|-r(V,S)} } where r(V,S) is the number of vertices in the graph G'(V,S) that have no incident edges.

Proof

Again, each edge may fail or not in the sense that both of its endpoints have the colour 1.

Now we apply the inclusion-exclusion principle, given a set of edges S that are to fail, each of the vertices with some edge incident to them must be coloured 1 and be counted as part of the set of vertices we are working with (thus the x^{|V|-r(V,S)} ) and then the rest of the vertices may or may not belong to the set of vertices we are working with ( 1 \to \text{not in it}  , x\to \text{in it} ).

If it is not clear yet why inclusion-exclusion will work with polynomials, expand each term and note that we are doing an inclusion-exclusion for each number of nodes possible \square

Note however that this result, in practice, is not really useful because its running time would be roughly \Theta\left(|E|\cdot 2^{|E|}\right) when we could for instance go over every set of vertices and check in about \Theta\left(|E|\cdot 2^{|V|}\right) .

Exercise: Consider a simple graph G(V,E) and a set of colours \left\{ 1,...,n \right\}  can you find a formula for the generating function f(x_1,...,x_n)   where the coefficient of x_1^{c_1}...x_n^{c_n} is the number of colourings with c_i vertices coloured i for i \in \{1,...,n\} ?

Trees!


We begin this post by refreshing the definition of the term Tree. A tree is a simple (labeled) graph T(V,E) such that there’s exactly one path from each node to every other node.

Some elementary results about them are the following:

Result 1: A simple graph G(V,E)   is a tree if and only if it has no cycles.

Result 2: Every finite tree T(V,E) with |V|\geq 2 has at least 2 vertices v and w such that \text{deg}(v) =  \text{deg}(w)=1 ( a vertex with this degree is called a leaf)

Proof of Result 2

Let x_1,...,x_r be the longest simple path in T, we will show that both x_1 and x_r are leaves. (since we have that x_1 \neq x_r )

Suppose one of them were not, then it’d have a neighbour other than x_2 in the case of x_1 and x_{r-1} in the case of x_r. This neighbour can’t be a vertex that is not in the path because this would contradict the maximality of the path -by adding this vertex to the path we would get a longer one-, but if this neighbour were in the path this would generate a cycle! \square

Result 3: For any finite tree  T(V,E) we have |E| = |V| - 1

Proof of Result3

Apply induction and Result2. \square

The aim of this post is to see some “quantitative” properties about them.

First we have the following :

Result 4: The number of labeled trees with n vertices labeled 1 through n, such that \text{deg}(v_i) = d_i for i = 1,...,n is: f_n\left(d_1,...,d_n\right)=\displaystyle\frac{(n-2)!}{(d_1-1)!\cdot (d_2-1)!\cdot ... \cdot (d_n-1)!}

( we should have d_i \geq 1 and \sum d_i = 2n-2, otherwise  f_n is 0 )

We will induct on the number of vertices, for 2 vertices it is clear that the result holds.

Now assume the result holds for all trees with n vertices, what happens now for n+1 ?

First among the d_i‘s there must be some d_j such that d_j = 1, otherwise there’s no possible tree -since there must be at least 2 leaves.

Without loss of generality assume d_{n+1} = 1 , node n+1 must have exactly one neighbour, if the  label of the neihbour were n ( if it weren’t just rename it) then there are exactly \displaystyle\frac{(n-2)!}{(d_1-1)!\cdot (d_2-1)!\cdot ... \cdot (d_n-2)!} trees in which n and n+1 are neighbours and n+1 is a leaf.

So summing over all the possible neighbours for n+1 we have \displaystyle\sum_{k=1}^n{\displaystyle\frac{(n-2)!}{(d_k-2)!\cdot \prod_{j\neq k}{(d_j-1)!}}} which can be written as \displaystyle\sum_{k=1}^n{\displaystyle\frac{(n-2)!}{\prod_{j = 1}^n{(d_j-1)!}}\cdot (d_k-1)} = \displaystyle\frac{(n-2)!}{\prod_{j = 1}^{n+1}{(d_j-1)!}} \cdot \displaystyle\sum_{k=1}^n{(d_k-1)}

( note the (d_{n+1} - 1)! = 1 in the product )

But \displaystyle\sum_{k=1}^n{(d_k-1)} = \displaystyle\sum_{k=1}^{n+1}{(d_k-1)} = n-1 by result 3 and the fact that in any graph \displaystyle\sum_{v\in V}{\text{deg}(v)} = 2\cdot |E| ! and we are done \square

Now a corollary is the following:

Corollary 1: We have the generating function x_1\cdot ...\cdot x_n \cdot \left(x_1+...+x_n\right)^{n-2} = \sum f_n\left(d_1,..,d_n\right)\cdot x_1^{d_1}\cdot ...\cdot x_n^{d_n}

Proof outline: mix Result4 with the multinomial theorem.

Corollary 2: The number of labeled trees on n vertices is n^{n-2} ( this is known as Cayley’s formula )

The last result I want to prove in this post is about the average number of leaves of a labeled tree, but in order to do this we’ll have to revisit the inclusion-exclusion principle.

Here I will follow the approach given in generatingfunctionology by Herbert Wilf.

Suppose we were working with a collection of objects and we  had a set of properties P = \left\{P_1,...,P_n \right\} that they may satisfy, we define r_k = \displaystyle\sum_{|S|=k}{\left\| R \supseteq S \right\|} where \left\| R \supseteq S \right\| is the number of object that satisfy at least all of the properties given in S \subseteq P .

Define e_k to be the number of  objects of our collection that satisfy exactly k properties then we have: r_k = \displaystyle\sum_{t\geq 0} \binom{t}{k}\cdot e_t because for each object that satisfies exactly t properties, it satisfies all of the properties of exactly \binom{t}{k} subsets of P of cardinal k

So we have \displaystyle\sum_{k\geq 0} r_k \cdot x^k = \displaystyle\sum_{k\geq 0} \displaystyle\sum_{t\geq 0} \binom{t}{k}\cdot e_t\cdot x^k thus \displaystyle\sum_{k\geq 0} r_k \cdot x^k = \displaystyle\sum_{t\geq 0} \left( \displaystyle\sum_{k\geq 0} \binom{t}{k}\cdot x^k \right)\cdot e_t= \displaystyle\sum_{t\geq 0} e_t \cdot (x+1)^t

 

Result 5: Thus we find: E(x) = \displaystyle\sum_{t\geq 0} e_t  \cdot x^t = \displaystyle\sum_{k\geq 0} r_k \cdot (x-1)^k

So here come the leaves:

Result 6: The average number of leaves in the labeled trees with n vertices is \sim \frac{n}{e} as n\to +\infty

So to apply inclusion-exclusion consider the collection of labeled trees with n vertices  and the properties P_i being that node i is a leaf.

For r_k we have first to choose a set of k nodes (for the leaves) and then complete the tree.

The number of trees such that x_1,...,x_k are leaves is the coefficient of the term x_1...x_k (since we want all of this nodes to be of deg 1) in  x_1\cdot ...\cdot x_n\cdot\left(x_1+...+x_n\right)^{n-2} evaluated at x_1=...=x_n=1 – by Corollary 1.

Now this is: x_{k+1}\cdot ... \cdot x_n\cdot (x_{k+1}+...+x_n)^{n-2} evaluated at x_{k+1} = ...=x_n = 1 ( can you see why ;) )

Thus: r_k = \binom{n}{k} \cdot (n-k)^{n-2}

By result5 what we are looking for is \displaystyle\frac{E'(1)}{E(1)} = \displaystyle\frac{\sum_{t\geq 0} e_t  \cdot t}{\sum_{t\geq 0} e_t } = \displaystyle\frac{r_1}{r_0}

Now this is n\cdot \displaystyle\frac{ (n-1)^{n-2} }{ n^{n-2}} = n\cdot \left(1 - \frac{1}{n}\right)^{n-2} and indeed \left(1 - \frac{1}{n}\right)^{n-2} \to e^{-1}   \square

Graph Colouring


This post will be dedicated to graph-colouring, inspired by a recent problem from TopCoder ( SRM 484 DIV2 problem 3).

A proper colouring of a labeled graph G(V,E) is a function f:V\to C  , where C is the set of colours, such that if  \left\{u,v\right\}\in E then f(u) \neq f(v) , that is, adjacent vertices have different colours.

The chromatic number of a graph, is the least possible number of elements C must have so that a proper colouring exists.

Proposition: Given a labeled graph G(V,E) with chromatic number m ,

we have m \leq \tfrac {1}{2} + \sqrt{2\cdot |E| + \tfrac{1}{4}}

Proof :

Consider a proper colouring with m colours, then there must be an edge joining a node with one colour to a node of each of the other colours – otherwise we could do with one less colour, which would contradict the minimality of m-, thus \tfrac{m\cdot (m-1)}{2} \leq |E| and the rest follows from there. \square

We also define the chromatic-polynomial p_{G}(\lambda) to be the function defined on the natural numbers such that given a natural number it returns the number of proper colourings when we consider C = \left\{1,...,\lambda\right\}. We shall see it’s indeed a polynomial.

Proposition: Given a labeled graph G(V, E) we have p_G(\lambda) = \displaystyle\sum_{S\subseteq E }{(-1)^{|S|}\cdot \lambda^{\text{Comp}(V, S)}} where \text{Comp}(V,S) is the number of connected components in the graph G'(V, S)

Proof

Fix a number of colours \lambda   .

For each e = \left\{u,v\right\}\in E consider S_e = \left\{ \text{colourings } f \text{ of } G \text{ such that } f(u)=f(v) \right\}

Note then that we want the cardinal of the complement of : \bigcup_{e\in E }S_e

By inclusion-exclusion this turns out to be : |U| - \sum_{e\in E} {| S_e |} + \sum_{e, e'\in E , e \neq e'} {| S_e \cap S_{e'} |} ... where U is the set of all possible colourings.

Now note that a colouring belongs to S_{e_1}\cap S_{e_2}\cap ... \cap S_{e_k} if and only if each connected component in G'\left(V, \left\{e_1,e_2,...,e_k\right\} \right) has a unique colour. Why? Well, if 2 vertices are joined by an edge in G’ then they must be assigned to the same colour.

Thus | S_{e_1} \cap ... \cap S_{e_k} | = \lambda^{\text{Comp}\left(V, \left\{e_1,...,e_k\right\}\right)} and the assertion follows. \square

So indeed p_G(\lambda) is a polynomial!

That is the idea of my solution to the TopCoder problem mentioned above. It runs pretty fast if the graph few edges – sparse graph-.

However, if the number of edges is large we will need a different approach.

 

A trick for dense graphs: If G = K_n the complete graph on n vertices, then evidently p_G(\lambda) = \lambda \cdot (\lambda - 1) \cdot ...\cdot (\lambda - n + 1) since each vertex must have a different colour.

If G is not a complete graph, then there’s a pair of vertices (u,v) such that \left\{u, v\right\} \not\in E, these 2 vertices may or may not have the same colour. If the have the same colour, it’s as if we merged the 2 vertices into 1 that has as neighbours all the neighbours of both u and v, if they had different colour the colouring corresponds to a colouring of the graph in which u and v are linked by an edge. We proceed like this until the graph is complete.

Note that this algorithm for computing the chromatic polynomial is quite fast if the graph is very dense (that is, there are lots of edges), since we are close to a complete graph.

However, if the the number of edges is neither too big nor too small and the graph has lots of vertices, we are still in trouble. As it tunrs out, if this problem had a nice solution – in polynomial time- then so would the graph colouring problem which is NP-complete, so there’s little hope of finding one such algorithm.

 

An Algorithm for Bipartite Graphs: This algorithm is a generalisation of the official solution to the TopCoder problem, it was suggested to me by Cem – wingless in MHF -.

Suppose we can partition the vertex set V in 2 sets V_1 and V_2 such that V_1 \cap V_2 = \o and V_1 \cup V_2 = V and for each \left\{u, v\right\}\in E either u \in V_1 and v \in V_2 or viceversa.

Then note that the vertices in V_1 do not influence one another, so that we can first fix the colours for each of the vertices of V_1 , now each vertex in V_2 is independent, so once the colours for V_1 are fixed, we can fix choose the colours for each of the vertices of V_2 separatedly.

So we first fix the colours for the vertices in V_1 and then multiply the ways in which we can colour each of the vertices of V_2 -once the colours for V_1 are fixed. That is p_G(\lambda) = \displaystyle\sum_{\text{colouring of }V_1} \prod_{v\in V_2}{(\text{Number of colours } v \text{ may have})}

To do this more intelligently, choose V_1 to have cardinal less or equal to that of V_2 so that |V_1| \leq \tfrac{|V|}{2}.

Thus if we wanted to compute p_{G}(\lambda) for a fixed \lambda,  the running time would be O( \lambda\cdot N \cdot \lambda^{N/2} )  where |V| = N

Note that this algorithm is generalizable for tripartite graphs and so on… it all depends on the chromatic number of the graph, but the greater the latter is, the slower it will be.

Factorials


Factorials and the infinitude of prime numbers

Polignac’s Formula: Given a prime p let v_p(n) be the greatest integer k such that p^k|n!.We have v_p(n)=\displaystyle\sum_{k=1}^{\infty}{\left \lfloor \displaystyle\frac{n}{p^k} \right \rfloor}

Outline

In \left\{1,...,n\right\} there are exactly \left \lfloor \displaystyle\frac{n}{p} \right \rfloor multiples of p, of these \left \lfloor \displaystyle\frac{n}{p^2} \right \rfloor are also multiples of p^2 and so on…

Theorem: There are infinitely many prime numbers.

Proof

By the fundamental theorem of arithmetic n! = \displaystyle\prod_{p\leq n}{p^{v_p(n)} } .

Note that v_p(n)=\displaystyle\sum_{k=1}^{\infty}{\left \lfloor \displaystyle\frac{n}{p^k} \right \rfloor} \leq \displaystyle\sum_{k=1}^{\infty}{\displaystyle\frac{n}{p^k}} = \displaystyle\frac{n}{p-1}. Thus: \sqrt[n]{n!} \leq \displaystyle\prod_{p\leq n}{p^{\tfrac{1}{p-1}}}    (*)

If there were finitely many primes, then the right hand side of (*) should be convergent, however e^n = 1 + \frac{n^1}{1!} + ... + \frac{n^n}{n!}+...>\frac{n^n}{n!}  thus \sqrt[n]{n!} > \displaystyle\frac{n}{e} and so the product on the RHS tends to infinity as n \to +\infty

Estimating n!

A not very precise estimate : It’s easy to see that we have \log(n!) = \sum_{k=1}^n\log(k)\approx \displaystyle\int_0^n\log(x)dx \approx n\cdot \log( n) but this is a rough approximation though, and taking the exponentials on both sides would lead to a very large error which is not good enough.

Stirling’s Approximation :

Instead note that  \log( n) = \displaystyle\sum_{k=1}^{n-1}{\log\left(1+\displaystyle\frac{1}{k}\right)} so we have \log( n!) = \displaystyle\sum_{r=1}^{n}\displaystyle\sum_{k=1}^{r-1}{\log\left(1+\displaystyle\frac{1}{k}\right)}

Thus \log( n!) =\displaystyle\sum_{k=1}^{n-1}{\left(n-k\right)\cdot \log\left(1+\displaystyle\frac{1}{k}\right)} = n\cdot \log(n) - \displaystyle\sum_{k=1}^{n-1}{k\cdot \log\left(1+\displaystyle\frac{1}{k}\right)} Aha! now this looks more interesting.

We now write : \log(1+x) = \frac{x^1}{1}-\frac{x^2}{2}...+(-1)^{m+1}\cdot \frac{x^m}{m} + (-1)^m\cdot \int_0^x{\frac{t^m}{1+t}dt}

Since I don’t want to spoil the simplicity of the proof by writing how the error will vanish -which is simple enough, but long and dull-,  here we’ll omit it.

Hence \log( n!) =n\cdot \log(n) - \displaystyle\sum_{k=1}^{n-1}{k\cdot \left(\frac{1}{k}-\frac{1}{2k^2}...\right)}  then \log( n!) =n\cdot \log(n) - (n-1) + \tfrac{1}{2}\cdot H_{n-1} ... +\tfrac{ (-1)^{m+1}}{m}\cdot \sum_{k=1}^{n-1}{\frac{1}{k^{m-1}}} +...

As n \to +\infty we have  :  H_{n-1} = \log( n) + \gamma + o(1)  and so:

\log( n!) =\left(n+\tfrac{1}{2}\right)\cdot \log(n) - n + 1+\tfrac{\gamma}{2} - \tfrac{\zeta(2)}{3} + \tfrac{\zeta(3)}{4} ...+o(1)

Now what is left is proving :  1+\tfrac{\gamma}{2} - \tfrac{\zeta(2)}{3} + \tfrac{\zeta(3)}{4} ... = \log(\sqrt{2\pi})  But at least one can easily see the series on the LHS converges. To prove this, we will use some properties of the gamma function.

We have : \log\Gamma(1+x) = -\gamma\cdot x + \zeta(2)\cdot\displaystyle\frac{x^2}{2}-\zeta(3)\cdot\displaystyle\frac{x^3}{3}\pm... for -1 < x \leq 1

In particular it’s important to note that \gamma = \tfrac{\zeta(2)}{2}-\tfrac{\zeta(3)}{3}\pm...  (*)

And so \displaystyle\int_0^1 \log\Gamma(1+x)dx = -\tfrac{\gamma}{2} + \tfrac{\zeta(2)}{2\cdot 3} - \tfrac{\zeta(3)}{3\cdot 4} \pm...= -\tfrac{\gamma}{2} + \zeta(2)\cdot \left(\tfrac{1}{2} - \tfrac{1}{3} \right)-\zeta(3)\cdot \left(\tfrac{1}{3} - \tfrac{1}{4} \right)\pm...

By (*) we then have: \displaystyle\int_0^1 \log\Gamma(1+x)dx = \tfrac{\gamma}{2} - \tfrac{\zeta(2)}{3} \pm...

Now: \Gamma(1+x) = x\cdot \Gamma(x) hence \displaystyle\int_0^1 \log\Gamma(1+x)dx =\displaystyle\int_0^1\log( x)dx + \displaystyle\int_0^1\log\Gamma(x)dx = -1 + \displaystyle\int_0^1\log\Gamma(x)dx

A little trick : I = \displaystyle\int_0^1\log\Gamma(x)dx = \displaystyle\int_0^1\log\Gamma(1-x)dx thus 2\cdot I = \displaystyle\int_0^1\log\Gamma(x)dx + \displaystyle\int_0^1\log\Gamma(1-x)dx = \displaystyle\int_0^1\log\left(\Gamma(x)\cdot \Gamma(1-x)\right)dx

By the reflection formula : \Gamma(x)\cdot \Gamma(1-x) = \displaystyle\frac{\pi}{\sin(\pi\cdot x)}  for  x \in (0, 1)

Thus: 2\cdot I =\log(\pi) - \displaystyle\int_0^1\log\sin(\pi\cdot x)dx but \displaystyle\int_0^1\log\sin(\pi\cdot x)dx = \tfrac{1}{\pi}\cdot \displaystyle\int_0^{\pi}\log\sin(x)dx = \tfrac{2}{\pi}\cdot \displaystyle\int_0^{\tfrac{\pi}{2}}\log\sin(x)dx = -\log( 2) which is a well-known integral. So I = \log\sqrt{2\pi}

Hence we have \tfrac{\gamma}{2} - \tfrac{\zeta(2)}{3} \pm... = \log\sqrt{2\pi} - 1 and finally:

\log( n!) =\left(n+\tfrac{1}{2}\right)\cdot \log(n) - n +\log\sqrt{2\pi}+o(1)

n! = \displaystyle\frac{\sqrt{2\pi n}\cdot n^n}{e^n}\cdot e^{o(1)} as n\to +\infty

This approximation is known as Stirling’s formula

Generating Functions


In this post we shall talk about Generating Functions, firstly about the Fundamental Theorem of Exponential Generating Functions -giving a different approach than that given in Generatingfunctionology -, and then we will talk about some sample problems that can be solved by the use of Generating Functions.

Given a sequence: it’s corresponding egf (exponential generating function) is:

Exponential generating functions have some interesting properties:

1. If and are the e.g.f. of and respectively, then is the e.g.f. of

2. If is the e.g.f. of then is the e.g.f. of , where

The proofs are left to the reader.

The second property clearly relates differential equations with difference equations, for instance, if a sequence satisfies for all natural numbers n, then its e.g.f. , , satisfies :

As a prelude to the Fundamental Theorem of E.G.F.’s, we will find a recurrence relation to the number of labeled connected graphs of n vertices (the vertices are labeled 1 through n ):

Property: If is the number of connected labeled graphs on n vertices, then

Proof

Imagine we had n vertices labeled 1 through n+1, how many graphs can we create then ?

Clearly the answer is , since for every pair of vertices, the edge may or may not exist.

Now let’s count it in a different way. Consider the vertex labeled n+1, this vertex must belong to some connected component. How many graphs are there such that the connected component to which n+1 belongs has k+1 vertices? – Well, the answer goes as follows, first we have to choose k numbers from {1..n} to be the labels of the rest of the vertices of the connected component, that can be done in: ways. Now, given our k+1 vertices, we have possible connected graphs :o and further, there are possible labeled graphs for the vertices outside the connected component of n+1. Hence we have graphs in which the component in which the vertex labeled n+1 contains k+1 vertices! now sum over all the possible values of k and we are done!

Now, let’s dig deeper and generalise the previous proof. Suppose we have a set of connected structures that have an associated label set, let be the number of labeled connected structures with label set {1,…,n} ( say they have to be of size n ) and define to be the number of possible combinations of the previous structures such that their label sets for a partition of {1,…,n} , then we have:

Theorem:

The proof is left to the reader since it turns out to be analogous to that of the connected labeled graphs.

Corollary: Let and be the exponential generating functions of and respectively, then

Proof

The theorem above translates into: by the properties about the generating functions we saw above. Solving this differential equation yields the result

Now let’s see an application of this result:

Application: Every permutation can be written uniquely as a composition of disjoint cycles .

Proof

First, it’s clear that if a permutation can be written as a composition of disjoint cycles, this must be unique up to the order of the compositions. So it remains to prove that every permutation can be seen as a composition of disjoint cycles.

The number labeled cyclic permutations ( with label set {1, …, n} ) turns out to be (for n>0). This corresponds to in the theorem above. Hence

Thus: that is

Now what is ? Yes!, it’s the number of permutations of {1, …, n} that can be written as a composition of disjoint cycles.

Let’s generalise even further what we have done so far, let be the number of combinations of k connected structures such that there label sets form a partition of {1,…,n}. Then we have:

Fundamental Theorem of E.G.F.’s :

Proof

Note that: ( the coefficient of in ).

We shall see that: , the rest of the proof after that is fairly simple.

Think it this way, suppose we had k boxes, first we distribute our n labels -or numbered balls- into those k boxes in such a way that we have balls in the first box and so on… This can be done in ways

Now in box number i we have possible connected structures to build. Hence, after distributing the balls, we can build the connected structures in the boxes in ways.

Hence we have ways of doing that with the balls, and considering all the possible distributions of the balls we have: ways.

But note that the difference between this and is that now we do care about the order of the k structures!

Practice: Define to be the number of permutations of {1,…,n} made out of exactly k dijsoint cycles , then show that and conclude:

Up to this point, all you’ve read in this entry was written in 2009.  One year later – almost- I am resuming writing it! :)

Problem (IMO 2008 ~ problem 5) :

Let  and  be positive integers with and  an even number. Let  lamps labelled  be given, each of which can be either on or off. Initially all the lamps are off.

We consider sequences of steps: at each step one of the lamps is switched (from on to off or from off to on). Let  be the number of such sequences consisting of k steps and resulting in the state where lamps 1 through are all on, and lamps  through are all off.

Let  be the number of such sequences consisting of  steps, resulting in the state where lamps 1 through are all on, and lamps through are all off, but where none of the lamps through is ever switched on.
Determine the ratio  .

Solution

First observe that a lamp will end up on if and only if it is switched an odd number of times. So the lamps that we want to be off will be switched an even number of times while the others, an odd number of times.

Now, in our sequence we care about the order but not about the relative order of 2 switches of the same lamp -because they are indistinguishible, this immediately implies that we will use exponential generating functions.

Let be the e.g.f of the number of sequences in which we may switch the lamps from n+1 to 2n. (labeled by number of switches)

Then:  here each factor correspons to a lamp, note that in the first n lamps we only consider odd powers (hence sinh( x) ) while in the rest we consider even powers (hence cosh(x )).

Thus:

Now consider  to be the e.g.f. for the case in which we never switch on a lamp from n+1 through 2n, then clearly  .

Hence and so thus we have: when is even – because we necessarily have greater than 0.

A formula for the primes


First we define the Möbius function:

Lemma 1: Let n be a positive integer, then

Let be the primes dividing n, then, since we only care about the square free divisors, these divisors are all the possible products that one can do using these primes (including when you choose none of them, in which case you get 1), then: and the rest follows by the Binomial Theorem

Lemma 2: We have: for

Note that our sum is: and that then goes to the coefficient of if and only if and that is: and the rest follows

Corollary 1: for ( mix Lemma 1 and Lemma 2! )

Gandhi’s Formula: Let be the nth prime and then:

Proof:

Taking: in Corollary 1 we have: thus:.

Set: . Then: . However, note that the smallest positive integer (>1) that is coprime to is .

Now: since

From here it follows that: and then

Fermat’s Little Theorem


Fermat’s Little Theorem : Let be a prime and an integer, then

Proof

First, note that if we prove it for the positive integers we are done, for the other case follows immediately.

So define ; and  .

Consider the operation in T defined by note that if and that if

Now we define the following equivalence relation in :

Note that the equivalence class of an element x in is: each of these elements being distinct . Thus each equivalence class has p elements and so, since these sets form a partition it follows that . But and the proof is complete

This procedure generalises (using the ideas here) to :

Follow

Get every new post delivered to your Inbox.