Imagine that you come back home from San Francisco, having just arrived from the RSA Conference. You are unpacking your suitcase, you open the drawer where you store your underwear and… what do you discover? A piece of underwear that is not yours! Naturally, you ask yourself: how probable is it that my partner is cheating on me? Bayes’ theorem to the rescue!
The concept behind Bayes’ theorem is surprisingly straightforward:
By updating our previous beliefs with objective new information, we get a new, improved belief.
We could formulate this almost philosophical concept with simple math as follows:
New improved belief = Previous beliefs × New objective data
Bayesian inference reminds you that new evidence forces you to revise your previous beliefs. Mathematicians quickly coined a term for each element of this method of reasoning:
- “Prior” is the probability of one’s previous beliefs.
- “Likelihood” is the probability of observing the new objective data if the hypothesis is true.
- “Posterior” is the probability of the revised belief once the new data has been taken into account.
Naturally, if you apply the inference several times in a row, the old posterior probability becomes the new prior probability. Let’s see how Bayesian inference works through a simple example from the book Investing: The Last Liberal Art.
Bayesian inference in action
We have just finished several dice board games. While we are putting all the pieces back in their box, I roll a die and cover it with my hand. “What is the probability that it is a 6?” I ask you. “Easy”, you answer, “1/6”.
I carefully check the number under my hand and say: “It is an even number. What is the probability that it is a 6 now?” You update your previous hypothesis with the new information and answer that the new probability is 1/3. It has increased.
But I give you more information: “It is not a 4”. What is the probability now that it is a 6? Once again, you update your last hypothesis with the new information and conclude that the new probability is 1/2. It has increased again. Congrats! You have just carried out a Bayesian inference analysis! Each new piece of objective data has forced you to revise your original probability.
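The dice game can be replayed in a few lines of Python. This is only a minimal sketch: the `update` helper and the variable names are illustrative assumptions, not something taken from the book. It assumes a fair six-sided die and treats each hint as evidence that rules out some faces, after which the remaining probabilities are renormalized.

```python
from fractions import Fraction


def update(beliefs, is_consistent):
    """Drop the faces ruled out by the evidence and renormalize the rest."""
    kept = {face: p for face, p in beliefs.items() if is_consistent(face)}
    total = sum(kept.values())
    return {face: p / total for face, p in kept.items()}


# Prior: a fair die, so every face has probability 1/6 (assumption).
prior = {face: Fraction(1, 6) for face in range(1, 7)}

# First hint: "It is an even number."
after_even = update(prior, lambda face: face % 2 == 0)

# Second hint: "It is not a 4." The old posterior is now the prior.
after_not_four = update(after_even, lambda face: face != 4)

print(prior[6], after_even[6], after_not_four[6])  # 1/6 1/3 1/2
```

Using `Fraction` rather than floats keeps the results exact, so they can be compared directly with the 1/6, 1/3 and 1/2 in the text.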
Armed with this way of reasoning, let’s analyze your partner’s supposed unfaithfulness.
How to apply Bayesian inference to discover if your partner is unfaithful to you
Returning to the initial question: is my partner cheating on me? The evidence is that you have found strange underwear in your drawer (RI); the hypothesis whose probability you want to evaluate is that your partner is unfaithful to you (E). Bayes’ theorem can clarify this issue, provided that you know (or are ready to estimate) three quantities:
- What is the probability that strange underwear turns up in your drawer if your partner is cheating on you, Pr(RI|E)? If your partner is indeed cheating on you, it is quite easy to imagine how such underwear ended up in your drawer. However, even (or maybe particularly) if your partner is unfaithful, you would expect your partner to be more careful. We can put the probability that such a piece of underwear appears in your drawer if your partner is being unfaithful at 50%, that is, Pr(RI|E) = 0.50.
- What is the probability that strange underwear turns up in your drawer if your partner is faithful to you, Pr(RI|¬E)? Maybe your partner secretly buys opposite-sex underwear and wears it when you are out (stranger things have happened). Maybe a platonic friend of your partner, whom you fully trust, has spent a night at your home. Maybe it was a present for you and your partner forgot to wrap it. None of these theories is intrinsically untenable, although they recall that old excuse: “the dog ate my homework”. Taken together, they can be given a 5% probability, that is, Pr(RI|¬E) = 0.05.
- Finally, and most importantly, you need the prior probability. How strongly did you believe in your partner’s unfaithfulness before finding that strange underwear in your drawer, Pr(E)? Of course, it will be difficult for you to be fully objective now that you have discovered that mysterious clothing. Ideally, the prior probability is set before starting to examine the evidence. Fortunately, it is sometimes possible to estimate it empirically. Specifically, according to statistics, roughly 4% of married people are unfaithful to their partners in a given year. This is the base rate, so you set it as your prior probability: Pr(E) = 0.04. Obviously, the probability that your partner has not been cheating on you is Pr(¬E) = 1 − Pr(E) = 0.96.
Assuming these values have been estimated well, you now only need to apply Bayes’ theorem to obtain the posterior probability. To make the calculations easier, let’s take a group of 1,000 couples, represented by the big green rectangle in the following image. It is easy to see that, if 40 out of 1,000 people cheat on their partner, and half of them forget their lover’s underwear in their partner’s drawer, 20 people have forgotten underwear (group 4). Furthermore, of the 960 out of 1,000 people who don’t cheat on their partners, 5% have left underwear in their partner’s drawer by mistake, that is, 48 people (group 2). Adding both figures gives 68 mysterious pieces of underwear spread over the couples’ drawers (group 2 + group 4).
Consequently, if you find a strange piece of underwear in your drawer, what is the probability that your partner is cheating on you? It is the number of pieces of underwear found when partners are unfaithful (group 4) divided by the total number of pieces of underwear found, belonging to both unfaithful and faithful partners (group 2 + group 4). There is no need to calculate anything: it jumps out at you that a strange piece of underwear is more likely to come from a faithful partner than from an unfaithful one. In fact, the exact value of the posterior probability is Pr(E|RI) = 20/68 ≈ 29%.
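The head count over 1,000 couples can be reproduced with a short Python sketch. The variable names are illustrative assumptions; the three input values are the estimates given above.

```python
from fractions import Fraction

couples = 1000
p_cheating = Fraction(4, 100)                # Pr(E), the base rate
p_underwear_if_cheating = Fraction(50, 100)  # Pr(RI|E)
p_underwear_if_faithful = Fraction(5, 100)   # Pr(RI|¬E)

cheating = couples * p_cheating              # 40 unfaithful partners
faithful = couples - cheating                # 960 faithful partners

# Group 4: unfaithful partners whose lover's underwear ends up in the drawer.
underwear_and_cheating = cheating * p_underwear_if_cheating   # 20
# Group 2: faithful partners with an innocent explanation.
underwear_and_faithful = faithful * p_underwear_if_faithful   # 48

total_underwear = underwear_and_cheating + underwear_and_faithful  # 68
posterior = underwear_and_cheating / total_underwear               # 20/68

print(f"{float(posterior):.1%}")  # 29.4%
```

Counting people instead of multiplying probabilities gives the same answer as Bayes' theorem, and it makes visible why the 48 innocent cases outweigh the 20 guilty ones.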
We can express the quantities from the previous image mathematically with the so-called Bayes’ equation:
Pr(E|RI) = Pr(RI|E) · Pr(E) / [Pr(RI|E) · Pr(E) + Pr(RI|¬E) · Pr(¬E)]
By substituting the appropriate numerical values, we obtain once again the probability that your partner is cheating on you: only 29%! Why is this result so unexpectedly low? Because you started from a low prior probability (base rate) of unfaithfulness. Although your partner’s explanations of how that underwear got into your drawer are rather unlikely, you started from the premise that your partner was faithful to you, and this weighed heavily in the equation. This is counterintuitive, because… isn’t that piece of underwear in your drawer evidence of your partner’s guilt?
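For readers who want to check the arithmetic, the substitution can be written out step by step. This is a worked sketch using the three estimates from the text and the document's own symbols (E for unfaithfulness, RI for the underwear found):

```latex
\[
\Pr(E \mid RI)
  = \frac{\Pr(RI \mid E)\,\Pr(E)}
         {\Pr(RI \mid E)\,\Pr(E) + \Pr(RI \mid \neg E)\,\Pr(\neg E)}
  = \frac{0.50 \times 0.04}{0.50 \times 0.04 + 0.05 \times 0.96}
  = \frac{0.020}{0.068}
  \approx 0.29
\]
```

The second term of the denominator, 0.05 × 0.96 = 0.048, is more than twice the numerator, which is why the posterior stays below 50%.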
Our System 1 heuristics, adapted to quick and intuitive judgements, prevent us from drawing better probabilistic conclusions from the available evidence. In this example, we pay excessive attention to the evidence (strange piece of underwear!) and forget the base rate (only 4% of unfaithfulness). When we let ourselves be dazzled by new objective data at the expense of previous knowledge, our decisions will be consistently suboptimal.
But you are a Bayesian professional, right? So, you will give your partner the benefit of the doubt. Mind you, you could ask your partner not to buy opposite-sex underwear, not to give you underwear as a present, and not to invite a platonic friend to spend the night. Under these conditions, the probability that a strange piece of underwear appears in your drawer in the future if your partner is faithful to you will be at most 1%, that is, Pr(RI|¬E) = 0.01.
What would happen if a few months later you found strange underwear in your drawer again? How would your belief in your partner’s guilt change now? As new evidence appears, a Bayesian practitioner updates their initial probability estimate. The posterior probability that your partner was cheating on you the first time (29%) becomes the prior probability that your partner is cheating on you this second time. Bayesian practitioners revise their probability estimates according to the new evidence. If you plug the new values into the previous formula, Pr(E) = 0.29 and Pr(RI|¬E) = 0.01, the new posterior probability that your partner is being unfaithful is 95%. Now you have grounds for divorce!
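The two discoveries can be chained in a small Python sketch, where the posterior of the first update is fed in as the prior of the second. The `posterior` function and its parameter names are illustrative assumptions; the numbers are the estimates used in the text.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' theorem for a hypothesis that is either true or false."""
    numerator = p_evidence_if_true * prior
    return numerator / (numerator + p_evidence_if_false * (1 - prior))


# First discovery: Pr(E) = 0.04, Pr(RI|E) = 0.50, Pr(RI|¬E) = 0.05.
first = posterior(0.04, 0.50, 0.05)

# Second discovery: the old posterior becomes the new prior, and the
# innocent explanations are now assumed to be rarer, Pr(RI|¬E) = 0.01.
second = posterior(first, 0.50, 0.01)

print(f"{first:.0%} {second:.0%}")  # 29% 95%
```

Most of the jump from 29% to 95% comes from the much higher prior; lowering Pr(RI|¬E) from 0.05 to 0.01 pushes the result further in the same direction.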
This illustrative example, from The Signal and the Noise: The Art and Science of Prediction, shows that:
- We let ourselves be dazzled by evidence when it is striking, vivid, and emotional.
- When our initial beliefs are very robust, they may be surprisingly impenetrable to new evidence against them.
In the following part of this article, we will go over several case studies where Bayesian inference is successfully applied to cybersecurity.
Second part of this article:
» How to forecast the future and reduce uncertainty thanks to Bayesian inference (II).
Gonzalo Álvarez Marañón
Innovation and Labs (ElevenPaths)