Commit 0aeaf2f3 authored by Lyjeeq

Translated "Several counter-intuitive Probability Problems" and modified README.md

Parent 388a6af3
......@@ -62,7 +62,7 @@ This command specifies the `english` branch and limit the depth of clone, get ri
* [Russian Doll Envelopes Problem](think_like_computer/RussianDollEnvelopes.md)
* [Recursion In Detail](data_structure/RecursionInDetail.md)
* [Backtracking Solve Subset/Permutation/Combination](interview/Subset_Permutation_Combination.md)
* [几个反直觉的概率问题](think_like_computer/几个反直觉的概率问题.md)
* [Several counter-intuitive Probability Problems](think_like_computer/several_counter_intuitive_probability_problems.md)
* [洗牌算法](think_like_computer/洗牌算法.md)
* IV. High Frequency Interview Problem
......
# Several counter-intuitive probability problems
**Translator: [Lyjeeq](https://github.com/Lyjeeq)**
**Author: [labuladong](https://github.com/labuladong)**
The previous article, [Shuffle Algorithm](./Shuffle_Algorithm.md), covered the Monte Carlo method for verifying probabilistic algorithms. Today let's talk about something lighter: several interesting probability questions.
There are two basic principles for calculating probability:
Principle 1: Calculating a probability requires a frame of reference, called the "sample space", which is the set of all possible outcomes of a random event. The probability of event A = the number of sample points contained in A / the total number of sample points in the sample space.
Principle 2: Probability must be treated as a continuous whole; a chain of connected events cannot be split apart and evaluated piece by piece. This is what conditional probability is about.
We all learned these two principles in high school, yet we still make mistakes easily, and the mistakes tend to follow the same pattern:
First we ignore Principle 2 and get the sample space wrong, and then we apply Principle 1 to arrive at a wrong answer.
Here are a few simple but deceptive problems: the boy and girl problem, the birthday paradox, and the three-door problem. The three-door problem is probably the most familiar to everyone, so I will share a few more interesting thoughts on it.
### 1. Boy and girl problem
Suppose a family has two children, and I tell you that one of them is a boy. What is the probability that the other is also a boy?
Many people, including me, answer without thinking: 1/2, because the other child is either a boy or a girl, with equal probability. But the answer is actually 1/3.
Why is this reasoning wrong? Because the sample space was not computed correctly, so the calculation under Principle 1 goes wrong. With two children, the sample space has 4 outcomes: older brother and younger sister, older brother and younger brother, older sister and younger sister, older sister and younger brother. Knowing that there is a boy rules out the two-sisters case, so the sample space shrinks to 3 outcomes. Only 1 of them, the two-brothers case, has the other child also being a boy, so the probability is 1/3.
Why do we get the sample space wrong? Because we ignore conditional probability and confuse the following two questions:
This family has only one child. What is the probability that this child is a boy?
This family has two children, and one of them is a boy. What is the probability that the other child is a boy?
According to Principle 2, probability is a continuous whole, so these two questions must not be confused. The second question calls for conditional probability: the probability that the other child is a boy, given that one child is a boy. It is easy to compute with the conditional probability formula, so I won't go into it.
Through this problem, the reader should now see how the two principles relate. The most deceptive trap is overlooking conditional probability. The easiest way to avoid it is to enumerate all possible outcomes.
Finally, I have come across a rather odd objection to this problem: what if the two children are twins, so there is no age difference?
I admit it almost sounds reasonable! But in fact, the age difference is only a way to express that the two children are independent, which means that the mixed case, one boy and one girl, still counts as two outcomes. So please don't quibble with twins.
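As a minimal sketch of "exhausting all possible results", the enumeration for this problem can be written in a few lines of Python. The function name `prob_other_is_boy` is made up for illustration, and the code assumes each child is a boy or a girl with equal probability, independently of the other.

```python
from fractions import Fraction
from itertools import product


def prob_other_is_boy():
    # Sample space: (older child, younger child), each "B" or "G",
    # all four outcomes assumed equally likely.
    families = list(product("BG", repeat=2))
    # Condition: at least one child is a boy, which rules out ("G", "G").
    with_boy = [f for f in families if "B" in f]
    # Favourable outcomes: both children are boys.
    both_boys = [f for f in with_boy if f == ("B", "B")]
    return Fraction(len(both_boys), len(with_boy))


print(prob_other_is_boy())  # 1/3
```

Counting over the reduced sample space is exactly Principle 1 applied after the condition from Principle 2 has been taken into account.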
### 2. Birthday paradox
The birthday paradox arises from this question: how many people must be in a room for the probability that at least two of them share a birthday to reach 50%?
The answer is 23. In other words, with 23 people in the room there is a 50% chance that two of them share a birthday. This conclusion looks incredible, which is why it is called a paradox. Intuitively, reaching 50% should take at least 183 people, since a year has 365 days, right? Actually, no. The conclusion seems incredible mainly because of two misconceptions:
**The first is misunderstanding what "there exist" means.**
Readers may think that if the probability of a shared birthday among 23 people reaches 50%, then it must mean:
Suppose 22 people are sitting in the room and I walk in; then there is a 50% chance that I find someone with the same birthday as mine. How is that possible?
No. That line of thinking is self-centered, whereas the probability in the question describes the group as a whole. "There exist" refers to any two of the 23 people, which involves permutations and combinations, and most likely the matching pair has nothing to do with you.
If you insist on calculating the probability that someone shares your birthday, you can do it like this:
1 - P(all 22 people have a birthday different from mine) = 1 - (364/365)^22 ≈ 0.06
Doesn't this result look much more reasonable? The birthday paradox is not about one particular person but about the whole group, covering every pairing of people, so the total probability is naturally much larger.
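The calculation above can be checked with a short sketch. The helper name `prob_someone_shares_my_birthday` is hypothetical, and the code assumes 365 equally likely birthdays.

```python
def prob_someone_shares_my_birthday(others, days=365):
    # Each other person misses my birthday with probability (days - 1) / days,
    # independently; the complement is "at least one person matches me".
    return 1 - ((days - 1) / days) ** others


# With 22 other people the chance is only about 6%, far from 50%.
print(round(prob_someone_shares_my_birthday(22), 2))  # 0.06
```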
**The second misconception is believing that probability changes linearly.**
Readers may think that if the probability of a shared birthday among 23 people reaches 50%, then the probability among 46 people must reach 100%.
No. Take a game with a 50% chance of winning: is your chance of winning 100% if you play twice? Obviously not. Your chance of winning within two plays is 75%:
$ P(\text{win within two plays}) = P(\text{win the first time}) + P(\text{lose the first time but win the second}) = 1/2 + 1/2 \times 1/2 = 75\% $
The same reasoning applies to the birthday paradox. Probabilities do not simply add up; the whole process has to be considered, so the conclusion is not unreasonable after all.
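To see that the chance of winning does not grow linearly with the number of plays, here is a small sketch using exact fractions. The function name `prob_win_at_least_once` is made up for illustration, and the plays are assumed to be independent.

```python
from fractions import Fraction


def prob_win_at_least_once(p, plays):
    # Complement of losing every single play (plays assumed independent).
    return 1 - (1 - p) ** plays


half = Fraction(1, 2)
print(prob_win_at_least_once(half, 1))  # 1/2
print(prob_win_at_least_once(half, 2))  # 3/4
print(prob_win_at_least_once(half, 3))  # 7/8
```

The probability approaches 1 as the number of plays grows but never reaches it, which is why doubling the number of people does not double the probability either.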
So why does it take only 23 people for the probability of a shared birthday to exceed 50%? We first calculate the probability that all 23 birthdays are unique (no duplicates). With only 1 person, the probability that all birthdays are unique is $ 365/365 $; with 2 people, it is $ 365/365 × 364/365 $; and so on. Continuing this way gives the probability that all 23 birthdays are unique:
![](../pictures/probability_problem/p.png)
This comes out to about 0.493, so the probability that a shared birthday exists is 0.507, just over 50%. In fact, by the same calculation, once the group reaches 70 people the probability that two of them share a birthday rises to 99.9%, which is practically 100%. So in terms of probability, it is not at all unusual for a small group of a few dozen people to contain two with the same birthday.
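The product described above is easy to evaluate directly. The following is a minimal sketch; `prob_shared_birthday` is a hypothetical helper name, and it assumes 365 equally likely birthdays with no leap years.

```python
def prob_shared_birthday(people, days=365):
    # Probability that all birthdays are unique:
    # 365/365 * 364/365 * ... * (365 - people + 1)/365
    p_unique = 1.0
    for i in range(people):
        p_unique *= (days - i) / days
    # Complement: at least two people share a birthday.
    return 1 - p_unique


# Smallest group size whose probability reaches 50%.
smallest = next(n for n in range(1, 366) if prob_shared_birthday(n) >= 0.5)
print(smallest)                            # 23
print(round(prob_shared_birthday(23), 3))  # 0.507
print(prob_shared_birthday(70) > 0.999)    # True
```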
### 3. Three-door problem
This game is a classic: the participant faces three doors. Behind two of them are goats, and behind one is a sports car. The participant picks a door and gets whatever is behind it (the sports car is of course worth more). But the host decides to help: after the participant chooses, that door is not opened right away. Instead, the host opens one of the other two doors to reveal a goat (the host knows what is behind every door) and then offers the participant a chance to switch doors. Should the participant switch or not?
To avoid confusing readers who are seeing this problem for the first time, let's describe it concretely:
You are the participant, facing doors 1, 2, and 3. Suppose you randomly choose door 1, and then the host opens door 3 and shows you that there is a goat behind it. Now, do you stick with your initial choice of door 1, or do you switch to door 2?
![](../pictures/probability_problem/sanmen.png)
The answer is that you should switch. The probability of getting the sports car after switching is 2/3; if you stay, it is 1/3. This is counter-intuitive yet again. It feels as if switching or not should give the same chance of winning, because in the end there are always two doors left, one with a goat and one with the sports car. That is a fact, so isn't the probability 1/2 whichever one you choose?
Similar to the boy and girl problem mentioned earlier, the simplest and safest method is to exhaust all possible results:
![Exhaustive tree](../pictures/probability_problem/tree.png)
It is easy to see that the probability of winning if you switch is 2/3, and if you stay it is 1/3.
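The same exhaustive enumeration can be sketched in code. The function name `monty_hall_exact` is made up for illustration; the sketch assumes the car position and the initial pick are each uniform over the three doors, and that the host always opens a goat door other than your pick.

```python
from fractions import Fraction
from itertools import product


def monty_hall_exact():
    doors = [1, 2, 3]
    stay_wins = switch_wins = total = 0
    # Enumerate every (car position, initial pick) pair: 9 equally likely cases.
    for car, pick in product(doors, repeat=2):
        # The host opens a goat door that is not the pick. When pick == car
        # there are two such doors, but switching loses either way, so
        # taking the first one is enough for counting.
        opened = next(d for d in doors if d != pick and d != car)
        # Switching means taking the one door that is neither picked nor opened.
        switched = next(d for d in doors if d != pick and d != opened)
        total += 1
        stay_wins += (pick == car)
        switch_wins += (switched == car)
    return Fraction(stay_wins, total), Fraction(switch_wins, total)


stay, switch = monty_hall_exact()
print(stay, switch)  # 1/3 2/3
```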
There is an even simpler way to look at it: by opening a door, the host is actually "concentrating" the probability. The probability that you picked the sports car at the start is of course 1/3, and the probability that the car is behind one of the other two doors is of course 2/3; there is nothing to argue about there. But the host rules out one door hiding a goat for you, which amounts to concentrating that 2/3 probability onto the one remaining door. So tell me, would you hold on to your original 1/3 door, or swap it for the door carrying the "concentrated" 2/3 probability?
To make it more intuitive, suppose that after you pick one of the three doors, the 2 remaining doors are joined by 98 more doors hiding goats, and these 100 doors are shuffled randomly. Would you switch? Surely not, since this plainly dilutes the probability, and the door you originally held is the most likely to hide the sports car. Now suppose instead there are 100 doors to begin with. You pick one, and the host then eliminates 98 goats from the remaining 99 doors. Would you switch? Surely yes. The door in your hand has a 1% chance and the other door has 99%. Or look at it this way: staying means choosing just 1 door, while switching is equivalent to choosing 99 doors. The outcome is obvious, right?
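The 100-door variant can also be checked with a quick Monte Carlo sketch. The function `simulate_switch_win_rate` and its parameters are hypothetical; the simulation assumes the host knowingly opens every other door except one and never reveals the car.

```python
import random


def simulate_switch_win_rate(num_doors, trials=20_000, seed=0):
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    wins = 0
    for _ in range(trials):
        car = rng.randrange(num_doors)
        pick = rng.randrange(num_doors)
        if pick == car:
            # The host leaves an arbitrary goat door closed.
            other_closed = rng.choice([d for d in range(num_doors) if d != pick])
        else:
            # The host must leave the car door closed.
            other_closed = car
        # Switching means taking the only other closed door.
        wins += (other_closed == car)
    return wins / trials


print(simulate_switch_win_rate(3))    # close to 2/3
print(simulate_switch_win_rate(100))  # close to 0.99
```

Switching wins exactly when the initial pick was wrong, which is why the rate tends toward (n - 1)/n for n doors.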
Some readers may have already thought all of this through, so let's consider the following question: suppose that just as you are deciding whether to switch, Xiao Ming bursts in and insists on making the choice for you. He has no idea what happened earlier; he only knows there are two doors in front of him, one hiding a sports car and one hiding a goat. What is the probability that he picks the sports car?
It is 1/2, of course, and this is the root cause of many people getting the three-door problem wrong. As with the birthday paradox, people tend to be self-centered; deciding whether to switch from Xiao Ming's point of view inevitably leads astray.
It is like having two boxes: box 1 holds 4 black balls and 2 red balls, and box 2 holds 2 black balls and 4 red balls. You pick a box, draw a ball, and are asked for the probability of drawing a red ball.
The uninformed Xiao Ming picks a box at random and draws at random, so his probability of drawing a red ball is: 1/2 × 2/6 + 1/2 × 4/6 = 1/2
You, being informed, know that box 2 gives the better odds, so you draw only from box 2, and your probability of drawing a red ball is: 0 × 2/6 + 1 × 4/6 = 2/3
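The two calculations above are instances of the law of total probability, with different weights on the boxes. A minimal sketch follows; the names `red_fraction` and `prob_red` are made up for illustration.

```python
from fractions import Fraction

# Fraction of red balls in each box, as described in the text.
red_fraction = {"box1": Fraction(2, 6), "box2": Fraction(4, 6)}


def prob_red(box_weights):
    # Law of total probability: sum over boxes of
    # P(choose box) * P(red | box).
    return sum(w * red_fraction[b] for b, w in box_weights.items())


# Xiao Ming knows nothing, so he picks either box with probability 1/2.
uninformed = prob_red({"box1": Fraction(1, 2), "box2": Fraction(1, 2)})
# You know box 2 is better, so you always pick it.
informed = prob_red({"box1": Fraction(0), "box2": Fraction(1)})
print(uninformed, informed)  # 1/2 2/3
```

The boxes are the same in both cases; only the information available to the chooser changes the result.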
The three-door problem is instructive. For example, on a multiple-choice question you first guess A, and then in a flash of insight you rule out B and C. Should you change A to D? The answer is: switch!
Perhaps the reader will ask: if only one answer is ruled out, say B, should I change A to C or D? The answer is: switch!
Because by the idea of "concentrating" probability described above, any elimination concentrates the probability, so on average each remaining option is certainly more likely than the 1/4 of your initial guess. In the example just given, C and D each have a 3/8 probability of being correct, while the A you started with has only 1/4.
Of course, this strategy only applies when you are truly stumped and really are picking answers at random, so that probability serves as your last resort.
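The 3/8 figure can be reproduced with a small Bayes calculation, but only under a strong assumption that mirrors the host in the three-door problem: the elimination never removes your own guess and never removes the correct answer, and it picks uniformly among the options it is allowed to remove. This assumption and the function name `posterior_after_elimination` are illustrative choices, not something stated in the article.

```python
from fractions import Fraction


def posterior_after_elimination(options="ABCD", guess="A", eliminated="B"):
    joint = {}
    for correct in options:
        prior = Fraction(1, len(options))  # blind guess: uniform prior
        # Assumption: only options that are neither the guess nor the
        # correct answer can be eliminated, each equally likely.
        candidates = [o for o in options if o != guess and o != correct]
        if eliminated in candidates:
            joint[correct] = prior * Fraction(1, len(candidates))
        else:
            joint[correct] = Fraction(0)
    total = sum(joint.values())
    # Bayes' rule: normalise the joint probabilities.
    return {o: p / total for o, p in joint.items()}


print(posterior_after_elimination())
# A stays at 1/4, B drops to 0, C and D each rise to 3/8
```

If the elimination instead came from independent knowledge that could just as well have ruled out A, this assumption would not hold and the numbers would differ.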
\ No newline at end of file
# 几个反直觉的概率问题
上篇文章 [洗牌算法详解](./洗牌算法.md) 讲到了验证概率算法的蒙特卡罗方法,今天聊点轻松的内容:几个和概率相关的有趣问题。
计算概率有下面两个最简单的原则:
原则一、计算概率一定要有一个参照系,称作「样本空间」,即随机事件可能出现的所有结果。事件 A 发生的概率 = A 包含的样本点 / 样本空间的样本总数。
原则二、计算概率一定要明白,概率是一个连续的整体,不可以把连续的概率分割开,也就是所谓的条件概率。
上述两个原则高中就学过,但是我们还是很容易犯错,而且犯错的流程也有异曲同工之妙:
先是忽略了原则二,错误地计算了样本空间,然后通过原则一算出了错误的答案。
下面介绍几个简单却具有迷惑性的问题,分别是男孩女孩问题、生日悖论、三门问题。当然,三门问题可能是大家最耳熟的,所以就多说一些有趣的思考。
### 一、男孩女孩问题
假设有一个家庭,有两个孩子,现在告诉你其中有一个男孩,请问另一个也是男孩的概率是多少?
很多人,包括我在内,不假思索地回答:1/2 啊,因为另一个孩子要么是男孩,要么是女孩,而且概率相等呀。但是实际上,答案是 1/3。
上述思想为什么错误呢?因为没有正确计算样本空间,导致原则一计算错误。有两个孩子,那么样本空间为 4,即哥哥妹妹,哥哥弟弟,姐姐妹妹,姐姐弟弟这四种情况。已知有一个男孩,那么排除姐姐妹妹这种情况,所以样本空间变成 3。另一个孩子也是男孩只有哥哥弟弟这 1 种情况,所以概率为 1/3。
为什么计算样本空间会出错呢?因为我们忽略了条件概率,即混淆了下面两个问题:
这个家庭只有一个孩子,这个孩子是男孩的概率是多少?
这个家庭有两个孩子,其中一个是男孩,另一个孩子是男孩的概率是多少?
根据原则二,概率问题是连续的,不可以把上述两个问题混淆。第二个问题需要用条件概率,即求一个孩子是男孩的条件下,另一个也是男孩的概率。运用条件概率的公式也很好算,就不多说了。
通过这个问题,读者应该理解两个概率计算原则的关系了,最具有迷惑性的就是条件概率的忽视。为了不要被迷惑,最简单的办法就是把所有可能结果穷举出来。
最后,对于此问题我见过一个很奇葩的质疑:如果这两个孩子是双胞胎,不存在年龄上的差异怎么办?
我竟然觉得有那么一丝道理!但其实,我们只是通过年龄差异来表示两个孩子的独立性,也就是说即便两个孩子同性,也有两种可能。所以不要用双胞胎抬杠了。
### 二、生日悖论
生日悖论是由这样一个问题引出的:一个屋子里需要有多少人,才能使得存在至少两个人生日是同一天的概率达到 50%?
答案是 23 个人,也就是说房子里如果有 23 个人,那么就有 50% 的概率会存在两个人生日相同。这个结论看起来不可思议,所以被称为悖论。按照直觉,要得到 50% 的概率,起码得有 183 个人吧,因为一年有 365 天呀?其实不是的,觉得这个结论不可思议主要有两个思维误区:
**第一个误区是误解「存在」这个词的含义。**
读者可能认为,如果 23 个人中出现相同生日的概率就能达到 50%,是不是意味着:
假设现在屋子里坐着 22 个人,然后我走进去,那么有 50% 的概率我可以找到一个人和我生日相同。但这怎么可能呢?
并不是的,你这种想法是以自我为中心,而题目的概率是在描述整体。也就是说「存在」的含义是指 23 人中的任意两个人,涉及排列组合,大概率和你没啥关系。
如果你非要计算存在和自己生日相同的人的概率是多少,可以这样计算:
1 - P(22 个人都和我的生日不同) = 1 -(364/365)^22 = 0.06
这样计算得到的结果是不是看起来合理多了?生日悖论计算对象的不是某一个人,而是一个整体,其中包含了所有人的排列组合,它们的概率之和当然会大得多。
**第二个误区是认为概率是线性变化的。**
读者可能认为,如果 23 个人中出现相同生日的概率就能达到 50%,是不是意味着 46 个人的概率就能达到 100%?
不是的,就像中奖率 50% 的游戏,你玩两次的中奖率就是 100% 吗?显然不是,你玩两次的中奖率是 75%:
$P(两次能中奖) = P(第一次就中了) + P(第一次没中但第二次中了) = 1/2 + 1/2*1/2 = 75\%$
那么换到生日悖论也是一个道理,概率不是简单叠加,而要考虑一个连续的过程,所以这个结论并没有什么不合常理之处。
那为什么只要 23 个人出现相同生日的概率就能大于 50% 了呢?我们先计算 23 个人生日都唯一(不重复)的概率。只有 1 个人的时候,生日唯一的概率是 $365/365$,2 个人时,生日唯一的概率是 $365/365 × 364/365$,以此类推可知 23 人的生日都唯一的概率:
![](../pictures/概率问题/p.png)
算出来大约是 0.493,所以存在相同生日的概率就是 0.507,差不多就是 50% 了。实际上,按照这个算法,当人数达到 70 时,存在两个人生日相同的概率就上升到了 99.9%,基本可以认为是 100% 了。所以从概率上说,一个几十人的小团体中存在生日相同的人真没啥稀奇的。
### 三、三门问题
这个游戏很经典了:游戏参与者面对三扇门,其中两扇门后面是山羊,一扇门后面是跑车。参与者只要随便选一扇门,门后面的东西就归他(跑车的价值当然更大)。但是主持人决定帮一下参与者:在他选择之后,先不急着打开这扇门,而是由主持人打开剩下两扇门中的一扇,展示其中的山羊(主持人知道每扇门后面是什么),然后给参与者一次换门的机会,此时参与者应该换门还是不换门呢?
为了防止第一次看到这个问题的读者迷惑,再具体描述一下这个问题:
你是游戏参与者,现在有门 1,2,3,假设你随机选择了门 1,然后主持人打开了门 3 告诉你那后面是山羊。现在,你是坚持你最初的选择门 1,还是选择换成门 2 呢?
![](../pictures/概率问题/sanmen.png)
答案是应该换门,换门之后抽到跑车的概率是 2/3,不换的话是 1/3。又一次反直觉,感觉换不换的中奖概率应该都一样啊,因为最后肯定就剩两个门,一个是羊,一个是跑车,这是事实,所以不管选哪个的概率不都是 1/2 吗?
类似前面说的男孩女孩问题,最简单稳妥的方法就是把所有可能结果穷举出来:
![穷举树](../pictures/概率问题/tree.png)
很容易看到选择换门中奖的概率是 2/3,不换的话是 1/3。
关于这个问题还有更简单的方法:主持人开门实际上在「浓缩」概率。一开始你选择到跑车的概率当然是 1/3,剩下两个门中包含跑车的概率当然是 2/3,这没啥可说的。但是主持人帮你排除了一个含有山羊的门,相当于把那 2/3 的概率浓缩到了剩下的这一扇门上。那么,你说你是抱着原来那扇 1/3 的门,还是换成那扇经过「浓缩」的 2/3 概率的门呢?
再直观一点,假设你三选一,剩下 2 扇门,再给你加入 98 扇装山羊的门,把这 100 扇门随机打乱,问你换不换?肯定不换对吧,这明摆着把概率稀释了,肯定抱着原来的那扇门是最可能中跑车的。再假设,初始有 100 扇门,你选了一扇,然后主持人在剩下 99 扇门中帮你排除 98 个山羊,问你换不换一扇门?肯定换对吧,你手上那扇门是 1%,另一扇门是 99%,或者也可以这样理解,不换只是选择了 1 扇门,换门相当于选择了 99 扇门,这样结果很明显了吧?
以上思想,也许有的读者都思考过,下面我们思考这样一个问题:假设你在决定是否换门的时候,小明破门而入,要求帮你做出选择。他完全不知道之前发生的事,他只知道面前有两扇门,一扇是跑车一扇是山羊,那么他抽中跑车的概率是多大?
当然是 1/2,这也是很多人做错三门问题的根本原因。类似生日悖论,人们总是容易以自我为中心,通过这个小明的视角来计算是否换门,这显然会进入误区。
就好比有两个箱子,一号箱子有 4 个黑球 2 个红球,二号箱子有 2 个黑球 4 个红球,随便选一个箱子,随便摸一个球,问你摸出红球的概率。
对于不知情的小明,他会随机选择一个箱子,随机摸球,摸到红球的概率是:1/2 × 2/6 + 1/2 × 4/6 = 1/2
对于知情的你,你知道在二号箱子摸球概率大,所以只在二号箱摸,摸到红球的概率是:0 × 2/6 + 1 × 4/6 = 2/3
三门问题是有指导意义的。比如你蒙选择题,先蒙了 A,后来灵机一动排除了 B 和 C,请问你是否要把 A 换成 D?答案是,换!
也许读者会问,如果只排除了一个答案,比如说 B,那么我是否应该把 A 换成 C 或者 D 呢?答案是,换!
因为按照刚才「浓缩」概率这个思想,只要进行了排除,都是在进行「浓缩」,均摊下来肯定比你一开始蒙的那个答案概率 1/4 高。比如刚才的例子,C 和 D 的正确概率都是 3/8,而你开始蒙的 A 只有 1/4。
当然,运用此策略蒙题的前提是你真的抓瞎,真的随机乱选答案,这样概率才能作为最后的杀手锏。
坚持原创高质量文章,致力于把算法问题讲清楚,欢迎关注我的公众号 labuladong 获取最新文章:
![labuladong](../pictures/labuladong.jpg)