Season 4 Episode 2
What's the probability that two people from a group of $n$ people have the same birthday? In this episode, we find out!
Further Reading
Birthday Paradox
On the livestream, we tried an experiment to see if we could find a pair of people with the same birthday. There were about 35 people in the live chat, which I think gave us about an 80% chance of having a shared birthday. And then that did happen; we found two people with the same birthday.
What I didn’t see coming though, was that we had three people claim to have the same birthday. I’ve run the numbers over the weekend, and I think that there was only about a 5% chance for that to happen, so maybe I was right to be slightly sceptical during the live show.
We talked about the fact that if you have 23 people in a room, there’s a 50% chance that two people have the same birthday. Because someone asked for it, here’s a fact with 99% in it! If there are 57 people in a room, then there’s a 99% probability that two have the same birthday. \to get to that fact, I calculated some values of the formula from the livestream.
Curve Sketching
On the livestream we came up with the “homework” problem to sketch $y=e^{-x(x-1)}$. I should probably have recognised this curve-sketching problem when I set it, because it’s quite a normal graph. If you found that too easy, try sketching $y=e^{-x(x-1)(x-2)}$. Check your answer with desmos.
For more curve sketching challenges, check the further reading for OOMC season 2 – we had a regular feature for a while with a curve to sketch each week at the end of the further reading.
Hash functions
Two weeks ago the further reading talked about having a binary check digit for error detection. This week we mentioned hash functions on the livestream, and it’s a related idea; calculate something based on the data, and then later someone else can come and check whether the data’s been altered by checking that the calculation still gives the same answer.
The birthday attack that I mentioned is to do with the way that it’s surprisingly easy to find a pair within random data. So if you’re calculating these hash functions for lots of different data sets then it’s not too hard to find a pair that give the same output. This is a bit of a problem because then I can switch between the pair, and that switching is undetectable.
This blog post on hash functions gives an introduction to the ideas.
Different distributions
Something we ignored for the calculation in the livestream is that some days of the year are more likely to be a birthday than others. I said in passing that this probably makes it more likely for there to be clashes. To build intuition for why this might be, we could think about a simpler problem where there are just two days in the year, and just two people in the group. If birthdays are on each day with equal probability, then the chance of a matching pair is $\frac{1}{2}$. Let’s think about what happens if birthdays are more likely to be on the first day of the year than on the second day of the year, with probability $p$ for a birthday to be the first day of the year. Then the probability that there’s a matching pair is $p^2+(1-p)^2$ (either both birthdays are the first day of the year, or both birthdays are the second day of the year). I claim that this expression is always bigger than or equal to $\frac{1}{2}$.
Exercise: Prove that $p^2+(1-p)^2\geq \frac{1}{2}$ for $0\leq p \leq 1$.
Exercise: Investigate what happens if there are three days in the year instead of two, but still just two people in the group. What’s the chance that those two people share a birthday?
Exercise: Investigate what happens if there are three people in the group.
If you want to get in touch with us about any of the mathematics in the video or the further reading, feel free to email us on oomc [at] maths.ox.ac.uk.