Imagine we had a class with students and each week they hand in a sheet of homework. Now, you choose a set of of them at random and correct only those . At the end, the grade assigned to a student is the average among all of his homework sheets that you corrected. How many weeks does it take for every student to have a grade?
Comment: I came up with this question because I have implemented such a system and I wanted to choose wisely as to have a grade for each student every weeks.
Coupons collector problem.
It is easy to see that this problem is related to the classical coupon collector problem. The problem reads as follows: you have coupons and each time you buy one (picked uniformly at random from the possible ones) and the question is how long it takes you on average to get the full collection. The answer here is
where is the random variable corresponding to the number of coupons it takes you to get the full collection, and .
The proof follows easily upon noticing that, once you have different coupons, you have to wait on average to get a new one.
It is also not difficult to prove that, even though what we have above is the expected value, the actual value of is usually not much bigger than (remember that we may estimate ). Indeed,
because is the probability of never getting (up to time ) a given coupon. The event is the union of the events “not getting a given coupon”, thus we have bounded the probability of the union with the sum .
Thus, if we get
Teacher’s problem. Back to our problem, if represents the number of weeks it takes us to have a grade for every single student, intuitively we should have ()
If we look at the sequence of sets of elements we get, and we write them all in a line in some order, we get a possible outcome for the coupons collector problem… but it is a bit better than in the coupons collector’s case, because we are guaranteed that the first coupons are distinct, the second group of coupons are distinct, and so on. What is better, these cases have higher probability than before, so one expects to have such an inequality (I am not claiming this constitutes a proof!). This establishes, though, the parallel between the two problems, now we expect to have nearly the same results, but with a factor of involved.
We note that the probability of never getting a given student in iterations is
Thus we also get
Considering we get
Proposition 1 We have the bound
This means that we never wait much more than weeks. However, is the expected value as in the case of the coupons collector problem? The answer is negative, but I claim that asymptotically this is not far from the truth.
Proposition 2 We have
Proof: Indeed, we note that bounding the probability of the union by the sum of the probabilities is just the first term of the inclusion-exclusion process, if we continue we get the so-called Bonferroni’s inequalities
whenever (else it is ). So fixing any and , letting
for , which already suggests the claimed result. To actually get it, we must make a better estimate of the remainder term. Note that, in (2) we have
which equals, taking logs and using
Here we may rewrite this as
and the second term is quadratic
Hence if we get
because . Thus, from (1) we get
but the two extremes converge to as , thus the result.
Applying this to my class. In my case I usually have students and I would like to be so I will want to have
in order to apply our inequality and have at most % of probability leaving out some student… thus and (which is, of course, nonsense… because then ). The problem here is that we are making a gross estimation for such a small , thus, in fact, looking more directly at
and setting , we get , i.e. . The actual minimal value of is indeed … by inspection, bad news for me (I was using ). In fact, to have probability roughly 1/2-1/2, we must pick .
This proof is essentially that of  for the Coupon’s Collector problem.
- Erdős, Paul; Rényi, Alfréd (1961), On a classical problem of probability theory (PDF), Magyar Tudományos Akadémia Matematikai Kutató Intézetének Közleményei 6: 215–220.