Differences

This shows you the differences between two versions of the page.

--- b:head_first_statistics:geometric_binomial_and_poisson_distributions [2023/10/17 17:54] – [What does the Poisson distribution look like?] hkimscil
+++ b:head_first_statistics:geometric_binomial_and_poisson_distributions [2024/10/28 08:37] (current) – [Broken Cookies case] hkimscil
@@ Line 1: / Line 1: @@
 ====== Geometric Binomial and Poisson Distributions ======
+Summary
+\begin{align*}
+\text{Geometric Distribution:  } \;\;\; \text{X} & \thicksim Geo(p) \\
+P(X = r) & = q^{r-1} \cdot p \\
+E\left[ X \right] & = \frac{1}{p} \\
+V\left[ X \right] & = \frac{q}{p^2} \\
+\\
+\text{Binomial Distribution:  } \;\;\; \text{X} & \thicksim B(n, p) \\
+P(X = r) & = \binom{n}{r} \cdot p^{r} \cdot q^{n-r} \\
+E\left[ X \right] & = np \\
+V\left[ X \right] & = npq \\
+\\
+\text{Poisson Distribution:  } \;\;\; \text{X} & \thicksim Po( \lambda ) \\
+P(X=r) & = e^{- \lambda} \cdot \dfrac{\lambda^{r}} {r!} \\
+E\left[ X \right] & = \lambda \\
+V\left[ X \right] & = \lambda \\
+\end{align*}
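As a numerical sanity check of the summary, the R sketch below sums x·P(X=x) and (x - μ)²·P(X=x) over each distribution's support and prints the results next to the closed-form E[X] and V[X]. The parameter values (p = 0.2, n = 10, lambda = 3) and the truncation points for the infinite geometric and Poisson supports are arbitrary assumptions for illustration. Note that R's `dgeom()` counts failures before the first success, so it is shifted by one relative to X here.

```r
# Parameter values are arbitrary, chosen only for illustration
p <- 0.2
q <- 1 - p
n <- 10
lambda <- 3

# Geometric: X = number of the trial with the first success (1, 2, 3, ...).
# dgeom() counts failures before the first success, so P(X = r) = dgeom(r - 1, p).
r <- 1:2000                     # truncated support; the omitted tail is negligible
geo_p    <- dgeom(r - 1, prob = p)
geo_mean <- sum(r * geo_p)
geo_var  <- sum((r - geo_mean)^2 * geo_p)

# Binomial: finite support 0..n
x <- 0:n
bin_p    <- dbinom(x, size = n, prob = p)
bin_mean <- sum(x * bin_p)
bin_var  <- sum((x - bin_mean)^2 * bin_p)

# Poisson: truncated support 0..200
k <- 0:200
poi_p    <- dpois(k, lambda)
poi_mean <- sum(k * poi_p)
poi_var  <- sum((k - poi_mean)^2 * poi_p)

# Numerical mean/variance next to the closed-form formulas
rbind(
  geometric = c(mean = geo_mean, E_formula = 1 / p,  var = geo_var, V_formula = q / p^2),
  binomial  = c(mean = bin_mean, E_formula = n * p,  var = bin_var, V_formula = n * p * q),
  poisson   = c(mean = poi_mean, E_formula = lambda, var = poi_var, V_formula = lambda)
)
```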
 ===== Geometric Distributions =====
@@ Line 27: / Line 45: @@
 | X  | P(X=x)  | Power of 0.8  | Power of 0.2  |
-| 1  | 0.2  | 0  | 1  |
+| 1  | 0.8<sup>0</sup> * 0.2   | 0  | 1  |
-| 2  | 0.8 * 0.2   | 1  | 1  |
+| 2  | 0.8<sup>1</sup> * 0.2   | 1  | 1  |
 | 3  | 0.8<sup>2</sup> * 0.2   | 2  | 1  |
 | 4  | 0.8<sup>3</sup> * 0.2   | 3  | 1  |
@@ Line 39: / Line 57: @@
 This formula is called the **geometric distribution**.
-  - You run a series of independent trials.
+  * You run a series of independent trials. (Each trial is independent: the outcome of one trial does not depend on the previous trials.)
-  - There can be either a success or failure for each trial, and the probability of success is the same for each trial.
+  * There can be either a success or failure for each trial, and the probability of success is the same for each trial. (Only success or failure can occur, and the probability of success, p, stays the same across all trials.)
-  - The main thing you’re interested in is how many trials are needed in
+  * The main thing you’re interested in is how many trials are needed in order to get the first successful outcome. (You stop at the first success and look at the distribution of the number of trials needed to reach it.)
-order to get the first successful outcome.
 $ P(X=r) = {p \cdot q^{r-1}} $
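A minimal R sketch of this formula, assuming p = 0.2 as in the table above. Since `dgeom(x, prob)` gives the probability of x failures before the first success, the trial number r corresponds to `x = r - 1`.

```r
# Geometric distribution: probability that the first success happens on trial r
p <- 0.2          # success probability, as in the table above
q <- 1 - p
r <- 1:4

by_formula <- q^(r - 1) * p             # P(X = r) = q^(r-1) * p
by_dgeom   <- dgeom(r - 1, prob = p)    # dgeom counts the r - 1 failures before the success

cbind(r, by_formula, by_dgeom)
```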
@@ Line 74: / Line 91: @@
 {{:b:head_first_statistics:pasted:20191030-023820.png}}
-Probability of getting a success after r trials
+Probability that the first success occurs somewhere after the r-th trial, i.e. that the first r trials are all failures
-Probability that more than r trials are needed to get the first success
 $$ P(X > r) = q^{r} $$
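The tail formula can be checked three ways in R: directly as q^r, as 1 minus the probability of a first success within the first r trials, and with `pgeom(..., lower.tail = FALSE)`. This sketch assumes p = 0.2 and r = 20; the value of p is an assumption, chosen because it reproduces the `0.01152922` printed in the R output in this section.

```r
# Geometric tail probability: P(X > r) = q^r (the first r trials are all failures)
p <- 0.2   # assumed; reproduces the 0.01152922 shown in this section's R output
q <- 1 - p
r <- 20

tail_formula    <- q^r
# 1 minus P(first success on trial 1, 2, ..., r)
tail_complement <- 1 - sum(dgeom(0:(r - 1), prob = p))
# P(more than r - 1 failures before the first success) = P(X > r)
tail_pgeom      <- pgeom(r - 1, prob = p, lower.tail = FALSE)

c(formula = tail_formula, complement = tail_complement, pgeom = tail_pgeom)
```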
-What is the probability of a success somewhere after   trials?
+Example: what is the probability that the first success occurs somewhere after the 20th trial?
 Solution.
@@ Line 84: / Line 100: @@
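   * This cannot be computed directly (it would be an infinite sum over trials 21, 22, 23, . . .)
   * Therefore,
+  * since the total probability is 1, subtracting the probability of a first success within the first 20 trials from 1 gives the desired probability:
   * 1 - (success on trial 1 + success on trial 2 + . . . + success on trial 20)
   * and this, in turn, is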
@@ Line 92: / Line 109: @@
 n <- 19
 s <- dgeom(x = 0:n, prob = p)
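-# probability of a first success within the first 20 trials
+# sum of the probabilities of a first success on trials 1 through 20
 sum(s)
-# so the line below is the probability of a success after the 20th trial
+# so the line below is the probability of a first success anywhere after the 20th trial
 1-sum(s)
 ## or (as the textbook puts it) the probability of failing all of the first 20 trials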
@@ Line 108: / Line 125: @@
 > sum(s)
 [1] 0.9884708
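-> # so the line below is the probability of a success after the 20th trial
+> # so the line below is the probability of a first success anywhere after the 20th trial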
 > 1-sum(s)
 [1] 0.01152922
@@ Line 629: / Line 646: @@
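   - the probability distribution of the number of times event A occurs in n (independent) trials
   - is called the **binomial probability distribution**.
+In the example below:
+  * the probability of answering each question correctly is 1/4, and of answering it incorrectly is 3/4
+  * we look at the probability distribution of the number of correct answers when solving 3 questions (3 trials).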
 {{:b:head_first_statistics:pasted:20191030-035316.png}}
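For the quiz example (3 questions, each answered correctly with probability 1/4), the binomial formula nCr · p^r · q^(n-r) can be evaluated for every r in R and compared with the built-in `dbinom()`. This is a sketch for illustration; multiplying by 64 shows the exact fractions 27/64, 27/64, 9/64 and 1/64.

```r
# Binomial distribution for the quiz example: 3 questions, each correct with probability 1/4
n <- 3
p <- 1/4
q <- 1 - p
r <- 0:n                                       # number of correct answers

by_formula <- choose(n, r) * p^r * q^(n - r)   # nCr * p^r * q^(n-r)
by_dbinom  <- dbinom(r, size = n, prob = p)

cbind(r, by_formula, by_dbinom, out_of_64 = by_formula * 64)
```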
@@ Line 659: / Line 679: @@
 $$P(X = r) = _{n}C_{r} \cdot p^{r} \cdot q^{n-r}$$
-  - You’re running a series of independent trials.
+  - You’re running a series of independent trials. (You perform n trials.)
-  - There can be either a success or failure for each trial, and the probability of success is the same for each trial.
+  - There can be either a success or failure for each trial, and the probability of success is the same for each trial. (Each trial is classified as a success or a failure, and the probability of success (and therefore of failure) is the same on every trial.)
-  - There are a finite number of trials. (note that this is different from that of geometric distribution)
+  - There is a finite number of trials. Note that this is different from the geometric distribution. (The number of trials is fixed at n; it is not unbounded.)
 If X is the number of successful outcomes in n trials, use the formula below to find the probability of exactly r successes.
@@ Line 683: / Line 703: @@
 \begin{eqnarray*}
-E(X) & = & \sum{n*p(x)} \\
+E(X) & = & \sum{x * p(x)} \\
-& = & (1*p)+(0*q) \\
+& = & (0*q) + (1*p) \\
 & = & p
 \end{eqnarray*}
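The same definition E(X) = Σ x·p(x) can be evaluated numerically in R: for a single trial it returns p, and summing over the binomial distribution for n trials returns np, the value quoted in the summary. The values p = 0.25 and n = 3 are illustrative assumptions.

```r
# E(X) = sum over x of x * p(x)
p <- 0.25   # illustrative success probability
q <- 1 - p

# A single trial: X = 0 with probability q, X = 1 with probability p
e_single <- sum(c(0, 1) * c(q, p))

# n independent trials: X ~ B(n, p)
n <- 3
x <- 0:n
e_binom <- sum(x * dbinom(x, size = n, prob = p))

c(single_trial = e_single, binomial = e_binom, n_times_p = n * p)
```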
@@ Line 735: / Line 755: @@
 n <-5
 # combinations of 5,2
-c <- choose(5,2)
+c <- choose(n,r)
 ans1 <- c*(p^r)*(q^(n-r))
 ans1
@@ Line 745: / Line 765: @@
 > n <-5
 > # combinations of 5,2
-> c <- choose(5,2)
+> c <- choose(n,r)
 > ans <- c*(p^r)*(q^(n-r))
 > ans
@@ Line 760: / Line 780: @@
 n <-5
 # combinations of 5,3
-c <- choose(5,3)
+c <- choose(n,r)
 ans2 <- c*(p^r)*(q^(n-r))
 ans2
@@ Line 770: / Line 790: @@
 > n <-5
 > # combinations of 5,3
-> c <- choose(5,3)
+> c <- choose(n,r)
 > ans2 <- c*(p^r)*(q^(n-r))
 > ans2
@@ Line 924: / Line 944: @@
 ==== From Scratch (Proof of Binomial Expected Value) ====
-see [[:The Binomial Theorem]]
+[[:Mean and Variance of Binomial Distribution|Mathematical proof of the expected value and variance of the binomial distribution]]
-\begin{eqnarray*}
-\text{The binomial theorem} &  & \\
-(a + b)^{m} & = & \sum^{m}_{y=0}{{m}\choose{y}} a^{y} b^{m-y} \\
-\end{eqnarray*}
-The formula above looks complicated, but here is the binomial theorem for the case m = 3:
-\begin{align*}
-\sum^{m}_{y=0}{{m}\choose{y}} a^{y} b^{m-y} \text{, m = 3} \\
-\end{align*}
-\begin{align*}
-\sum^{3}_{y=0}{{3}\choose{y}} a^{y} b^{3-y}
-& = {{3}\choose{0}} a^{0} b^{3-0}
-+ {{3}\choose{1}} a^{1} b^{3-1}
-+ {{3}\choose{2}} a^{2} b^{3-2}
-+ {{3}\choose{3}} a^{3} b^{3-3}  \\
-& = 1*a^0*b^3
-+ 3*a^1*b^2
-+ 3*a^2*b^1
-+ 1*a^3*b^0  \\
-& = a^3
-+ 3 a^2 b^1
-+ 3 a^1 b^2
-+ b^3  \\
-\end{align*}
-==== For Mean ====
-\begin{eqnarray*}
-E(X) & = & \sum_{x}x p(x) \\
-& = & \sum_{x=0}^{n} x {{n}\choose {x}} p^x(1-p)^{n-x}  \\
-& = & \sum_{x=0}^{n} x \frac{n!}{x!(n-x)!} p^x(1-p)^{n-x}  \\
-\text{note that   } x! = x(x-1)! \\
-& = & \sum_{x=1}^{n} \frac{n!}{(x-1)!(n-x)!} p^x(1-p)^{n-x}  \\
-\text{since we expect E(X) = np,} \\
-\text{we pull np outside the summation} \\
-\text{note that  } p^x = p * p^{x-1} \\
-\text{and  } n! = n * (n-1)! \\
-& = & \sum_{x=1}^{n} \frac{\underline{n}*(n-1)!}{(x-1)!(n-x)!} (\underline{p}*p^{x-1})(1-p)^{n-x}  \\
-\text{we move the underlined factors} \\
-\text{(that is, np) out of the summation} \\
-& = & np \sum_{x=1}^{n} \frac{(n-1)!}{(x-1)!(n-x)!} p^{x-1}(1-p)^{n-x}  \\
-& = & np \underline{ \sum_{x=1}^{n} {\frac{(n-1)!}{(x-1)!(n-x)!} p^{x-1}(1-p)^{n-x}}}  \\
-\text{we want to show that the underlined} \\
-\text{part equals one, so that only np remains} \\
-n-x = (n-1)-(x-1) \\
-& = & np \sum_{x=1}^{n} \frac{(n-1)!}{(x-1)!((n-1)-(x-1))!} p^{x-1}(1-p)^{(n-1)-(x-1)}  \\
-m = n - 1 \\
-y = x - 1 \\
-& = & np \sum_{y=0}^{m} \frac{m!}{y!(m-y)!} p^{y}(1-p)^{m-y}  \\
-& = & np \sum_{y=0}^{m} {{m}\choose {y}} p^{y}(1-p)^{(m-y)}  \\
-\text{Recall that} \\
-\sum^{m}_{y=0}{{m}\choose{y}} a^{y} b^{m-y} = (a + b)^{m} \\
-& = & np (p + (1-p))^m  \\
-& = & np (1)^m  \\
-& = & np  \\
-\end{eqnarray*}
-==== For variance ====
-\begin{align*}
-Var(X) & = E[(X-\mu)^2] \\
-& = \sum_{x}(x-\mu)^2p(x) \\
-\text{We also know that: } \\
-E[(X-\mu)^2] & = E(X^2) - [E(X)]^2 \\
-\end{align*}
-\begin{align*}
-E(X^2) & = \sum_{x=0}^{n} x^2 {{n}\choose{x}} p^x (1-p)^{n-x} \\
-& = \sum_{x=0}^{n} x^2 \frac{n!}{x!(n-x)!} p^x (1-p)^{n-x} \\
-\end{align*}
-Instead of $E(X^2)$, consider $E[X(X-1)]$:
-\begin{align*}
-E[X(X-1)] & = \sum_{x=0}^{n} x(x-1) \frac{n!}{x!(n-x)!} p^x (1-p)^{n-x} \\
-& = \sum_{x=2}^{n} \frac{n!}{(x-2)!(n-x)!} p^x (1-p)^{n-x} \\
-& = n(n-1)p^2 \sum_{x=2}^{n} \frac{(n-2)!}{(x-2)!(n-x)!} p^{x-2} (1-p)^{n-x} \\
-\text{since} \\
-n - x = (n - 2)-(x - 2) \\
-& = n(n-1)p^2 \sum_{x=2}^{n} \frac{(n-2)!}{(x-2)!((n-2)-(x-2))!} p^{x-2} (1-p)^{(n-2)-(x-2)} \\
-m = n - 2 \\
-y = x - 2 \\
-& = n(n-1)p^2 \sum_{y=0}^{m} \frac{m!}{y!(m-y)!} p^{y} (1-p)^{m-y} \\
-& = n(n-1)p^2 \underline {\sum_{y=0}^{m} \frac{m!}{y!(m-y)!} p^{y} (1-p)^{m-y} } \\
-& = n(n-1)p^2 \underline {\sum_{y=0}^{m} {{m}\choose{y}} p^{y} (1-p)^{m-y} } \\
-\text{we know that the underlined part is} \\
-(p+(1-p))^m = 1^m \\
-& = n(n-1)p^2 (p + (1-p))^m \\
-& = n(n-1)p^2 \\
-\end{align*}
-. . . .
-\begin{align*}
-E[X(X - 1)] & = n(n-1)p^2 \\
-E[X^2 - X] & = n(n-1)p^2 \\
-E[X^2]- E[X] & = n(n-1)p^2 \\
-E[X^2]- np & = n(n-1)p^2 \\
-E[X^2]& = n(n-1)p^2 + np \\
-\\
-\end{align*}
-\begin{align*}
-Var(X) & = E[X^2] - [E(X)]^2  \\
-& = \left[n(n-1)p^2 + np \right] - \left[np \right]^2  \\
-& = np[(n-1)p + 1 - np]  \\
-& = np[np - p + 1 - np]  \\
-& = np(1 - p)  \\
-& = npq  \\
-\\
-\end{align*}
 ====== Poisson Distribution ======
 $$X \sim Po(\lambda)$$
@@ Line 1071: / Line 976: @@
 \end{eqnarray*}
 For why $e^{\lambda} = \left(1 + \lambda + \dfrac{\lambda^{2}}{2!} + \dfrac{\lambda^{3}}{3!} + . . . \right)$ holds, see the [[:Taylor series]] page.
+This means that the probabilities summed over r from 0 to infinity make up the whole distribution, so the total is 1. See "What does the Poisson distribution look like?" below.
 <code>
@@ Line 1082: / Line 988: @@
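 The figure above poses this problem: lambda is 2, i.e. on average there are 2 traffic accidents per month on the road around the crosswalk in front of Ajou University; since X=3, it asks for the probability that exactly 3 accidents occur (P(X=3)).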
 \begin{eqnarray*}
-P(X = 3) & = & \frac {e^{-2} * 2^{3}}{3!} \\
+P(X = 3) & = & e^{-2} * \frac {2^{3}}{3!} \\
 & = & 0.180
 \end{eqnarray*}
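A quick R check of this calculation, computing the Poisson formula by hand and with the built-in `dpois()`:

```r
lambda <- 2   # average number of accidents per month
r <- 3
by_formula <- exp(-lambda) * lambda^r / factorial(r)   # e^(-lambda) * lambda^r / r!
by_dpois   <- dpois(r, lambda)
round(c(formula = by_formula, dpois = by_dpois), 3)
```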
@@ Line 1112: / Line 1018: @@
 Consider the result mentioned above:
+\begin{eqnarray*}
+\sum_{r=0}^{\infty} e^{- \lambda} \dfrac{\lambda^{r}} {r!}
+& = & e^{- \lambda} \sum_{r=0}^{\infty} \dfrac{\lambda^{r}} {r!}  \\
+& = & e^{- \lambda} \left(1 + \lambda + \dfrac{\lambda^{2}}{2!} + \dfrac{\lambda^{3}}{3!} + . . . \right) \\
+& = & e^{- \lambda}e^{\lambda} \\
+& = & 1
+\end{eqnarray*}
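+The 1 on the right-hand side means that the probabilities in the graph below add up to 1 in total. The plot above only goes from 1 to 60, but taking r from 0 all the way to infinity gives the complete distribution, and that total is 1 (numerically, a case such as dpois(x=0:1000, lambda=30)).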
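The sum-to-one property can also be checked numerically in R. Summing `dpois()` from 0 up to a large cutoff stands in for the infinite sum; the cutoff 1000 follows the example in the text and is otherwise arbitrary.

```r
lambda <- 30
# Range shown in the plot: already holds almost all of the probability
plotted <- sum(dpois(1:60, lambda))
# From r = 0 up to a large cutoff, standing in for r = 0, 1, 2, ... to infinity
total <- sum(dpois(0:1000, lambda))
c(plotted = plotted, total = total)
```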
 [{{:b:head_first_statistics:pasted:20191107-095627.png|Figure 1. lambda=30}}]
-The larger lambda is, the closer the distribution is to a symmetric, bell-shaped curve ((Figure 1)); the smaller lambda is, the more it is skewed to the right (positively skewed) ((Figure 2)).
+The larger lambda is, the closer the distribution is to a symmetric, bell-shaped curve ((Figure 1)); the smaller lambda is, the more it is skewed to the right (positively skewed) ((Figure 2)).
 <code>
@@ Line 1152: / Line 1070: @@
 \begin{eqnarray*}
-P(X=0) & = & \frac{e^{-3.4}*3.4^{0}} {0!}  \\
+P(X=0) & = & e^{-3.4} * \frac{3.4^{0}} {0!}  \\
 & = & e^{-3.4} \\
 & = & 0.03337327
@@ Line 1158: / Line 1076: @@
 <code>
+# computed in R
 > e^(-3.4)
+[1] 0.03337327
+>
+# or
+> dpois(0, 3.4)
 [1] 0.03337327
 >
 </code>
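+For a Poisson distribution, the probability that nothing happens at all is e<sup>-lambda</sup>. For example, if calls to 119 (the Korean emergency number) arrive at an average rate of 5 per hour, what is the probability that not a single call came in during the past hour?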
+\begin{eqnarray*}
+P(X=0) & = & e^{-5} * \frac{5^{0}} {0!}  \\
+& = & e^{-5} \\
+& = & 0.006737947
+\end{eqnarray*}
+<code>
+> lambda <- 5
+> e <- exp(1)
+> px.0 <- e^(-lambda)
+>
+> px.0
+[1] 0.006737947
+>
+# or
+> dpois(0,5)
+[1] 0.006737947
+</code>
 __2. What’s the probability of the machine malfunctioning three times next week?__
@@ Line 1184: / Line 1128: @@
 > dpois(x=3, lambda=3.4)
 [1] 0.2186172
+</code>
+Similarly, malfunctioning at most three times covers 0, 1, 2, and 3 malfunctions, so
+<code>
+> sum(dpois(0:3, lambda=3.4))
+[1] 0.5583571
+>
 </code>
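R also provides the cumulative Poisson probability P(X ≤ q) directly through `ppois()`, so the sum above can be cross-checked as follows (lambda = 3.4 as in the malfunction example):

```r
lambda <- 3.4   # average number of malfunctions per week
by_sum   <- sum(dpois(0:3, lambda))   # P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)
by_ppois <- ppois(3, lambda)          # cumulative probability P(X <= 3)
c(sum = by_sum, ppois = by_ppois)
```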
@@ Line 1239: / Line 1190: @@
 **How did Kate find the probability so quickly, and avoid the error on her calculator?**
 </WRAP>
+First, if we treat the problem above as a binomial distribution problem, the answer is
+\begin{eqnarray*}
+P(X = 15) & = & _{100}C_{15} * 0.1^{15} * 0.9^{85}\\
+\end{eqnarray*}
 \begin{eqnarray}
@@ Line 1270: / Line 1226: @@
 > a*b*c
 [1] 0.03268244
+>
+</code>
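+The above is the exact answer, but with a limited calculator we can use the Poisson approximation instead. Here
+X ~ B(n, p) with
+B(100, 0.1), so
+n*p = 10 = lambda
+Since n is large and p is small, B(n, p) is well approximated by Po(np), so the Poisson answer is
+the value of P(X = 15) when lambda = 10:
+\begin{eqnarray*}
+P(X = 15) & = & e^{-10} * \frac {10^{15}}{15!} \\
+& = & 0.0347181
+\end{eqnarray*}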
+<code>
+> dpois(x=15, lambda=10)
+[1] 0.03471807
 >
 </code>
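To see how good the Poisson approximation is here, this R sketch puts the exact binomial probability from `dbinom()` next to the Poisson value from `dpois()`, using n = 100 and p = 0.1 from the problem:

```r
n <- 100
p <- 0.1
lambda <- n * p                              # 10

exact  <- dbinom(15, size = n, prob = p)     # C(100, 15) * 0.1^15 * 0.9^85
approx <- dpois(15, lambda)                  # e^(-10) * 10^15 / 15!

c(binomial = exact, poisson = approx, difference = approx - exact)
```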
@@ Line 1386: / Line 1358: @@
 <WRAP box>
-. On average, 1 bus stops at a certain point every 15 minutes. What’s the probability that no buses will turn up in a single 15 minute interval?
+. On average, 1 bus stops at a certain point every 15 minutes. What’s the probability that __<fc #ff0000>no buses</fc>__ will turn up in a single 15 minute interval?
 This is a Poisson distribution problem, so the expected value and the variance are both equal to lambda, which is 1 (on average, one bus arrives every 15 minutes).
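Since no bus turning up means X = 0, the probability is e^(-lambda) with lambda = 1. A minimal R check, assuming the Poisson model stated above:

```r
lambda <- 1                  # average of 1 bus per 15-minute interval
p_none <- dpois(0, lambda)   # P(X = 0)
c(dpois = p_none, exp_minus_lambda = exp(-lambda))
```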