======= Binomial Distribution =======

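If the probability that a particular event A occurs in a single trial is p, then the probability distribution of the number of times A occurs in n independent trials is called the **binomial distribution**.

In the example below:
  * The probability of getting any one question right is 1/4, and of getting it wrong is 3/4.
  * The distribution describes the number of correct answers when three questions are attempted (three trials).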

{{:b:head_first_statistics:pasted:20191030-035316.png}}
{{:b:head_first_statistics:pasted:20191030-035452.png}}

| x  | P(X=x)                    | exponent of 0.75  | exponent of 0.25  |
| 0  | 0.75 * 0.75 * 0.75        | 3  | 0  |
| 1  | 3 * (0.75 * 0.75 * 0.25)  | 2  | 1  |
| 2  | 3 * (0.75 * 0.25 * 0.25)  | 1  | 2  |
| 3  | 0.25 * 0.25 * 0.25        | 0  | 3  |
{{:b:head_first_statistics:pasted:20191030-040346.png}}

$$P(X = r) = {\huge\text{?}} \cdot 0.25^{r} \cdot 0.75^{3-r}$$
$$P(X = r) = {\huge_{3}C_{r}} \cdot 0.25^{r} \cdot 0.75^{3-r}$$

$_{n}C_{r}$ is the number of ways to choose r objects out of n without regard to order. There are therefore $_{3}C_{1} = 3$ ways to get exactly one of the three questions right.
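
The probability table above can be reproduced in R. This is a minimal sketch that assumes the three-question setup with p = 0.25; the variable names `manual` and `builtin` are illustrative and not part of the original notes.

<code>
# Three questions, each answered correctly with probability 0.25
n <- 3
p <- 0.25
q <- 1 - p
r <- 0:n

# Manual formula: nCr * p^r * q^(n-r)
manual <- choose(n, r) * p^r * q^(n - r)

# Built-in binomial probability function
builtin <- dbinom(r, n, p)

# Side-by-side comparison for r = 0, 1, 2, 3
data.frame(r = r, manual = manual, builtin = builtin)
</code>

Both columns should agree, which suggests that `dbinom` evaluates the same formula.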


Probability of getting exactly one question right:
\begin{eqnarray*}
P(X = 1) & = &  _{3}C_{1} \cdot 0.25^{1} \cdot 0.75^{3-1} \\
& = & \frac{3!}{1! \cdot (3-1)!} \cdot 0.25 \cdot 0.75^2 \\
& = & 3 \cdot 0.25 \cdot 0.5625 \\
& = & 0.421875
\end{eqnarray*} 

$$P(X = r) = _{n}C_{r} \cdot 0.25^{r} \cdot 0.75^{n-r}$$
$$P(X = r) = _{n}C_{r} \cdot p^{r} \cdot q^{n-r}$$

  - You are running a series of n independent trials.
  - Each trial ends in either success or failure, and the probability of success (and hence of failure) is the same for every trial.
  - The number of trials is finite and fixed at n. This differs from the geometric distribution, where trials continue until the first success.

Let X be the number of successes in n trials. The probability of exactly r successes is given by the formula below.

\begin{eqnarray*} 
P(X = r) & = & _{n}C_{r} \cdot p^{r} \cdot q^{n-r} \;\;\; \text{Where,} \\
_{n}C_{r} & = & \frac {n!}{r!(n-r)!}
\end{eqnarray*} 

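  * p = probability of success in each trial
  * q = 1 - p = probability of failure in each trial
  * n = number of trials
  * r = number of successes whose probability is being computed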

$$X \sim B(n,p)$$
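
The general formula can be wrapped in a small function. The sketch below uses a hypothetical helper named `binom_pmf` (it is not part of base R) and compares it with the built-in `dbinom`.

<code>
# Hypothetical helper (not part of base R): binomial probability P(X = r)
binom_pmf <- function(r, n, p) {
  stopifnot(n >= 0, all(r >= 0), all(r <= n), p >= 0, p <= 1)
  q <- 1 - p
  choose(n, r) * p^r * q^(n - r)
}

binom_pmf(1, 3, 0.25)   # one correct answer out of three
dbinom(1, 3, 0.25)      # same probability from base R
</code>

In practice `dbinom(r, n, p)` is the more convenient choice; the helper only makes the formula explicit.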

====== Expectation and Variance of Binomial Distribution ======
Toss a fair coin once. What is the distribution of the number of heads?
  * A single trial
  * The trial can be one of two possible outcomes -- success and failure
  * P(success) = p
  * P(failure) = 1-p

X = 0, 1 (failure and success)
$P(X=x) = p^{x}(1-p)^{1-x}$ or 
$P(x) = p^{x}(1-p)^{1-x}$

Note.
| x     | 0          | 1  |
| p(x)  | q = (1-p)  | p  | 

When x = 0 (failure), $P(X = 0) = p^{0}(1-p)^{1-0} = (1-p)$ = Probability of failure
When x = 1 (success), $P(X = 1) = p^{1}(1-p)^{0} = p $ = Probability of success


This is called the Bernoulli distribution.
  * The Bernoulli distribution is the building block of the binomial distribution, the geometric distribution, etc.
  * Binomial distribution = the distribution of the number of successes in n independent Bernoulli trials.
  * Geometric distribution = the distribution of the number of independent Bernoulli trials needed to get the first success.

$$X \sim B(1,p)$$

\begin{eqnarray*}
E(X) & = & \sum{x * p(x)} \\
& = & (0*q) + (1*p) \\
& = & p 
\end{eqnarray*} 


\begin{eqnarray*}
Var(X) & = & E((X - E(X))^{2}) \\
& = & \sum_{x}(x-E(X))^2p(x)   \ldots \ldots \ldots E(X) = p \\
& = & (0 - p)^{2}*q + (1 - p)^{2}*p  \\
& = & (0^2 - 2p \cdot 0 + p^2)*q + (1-2p+p^2)*p \\
& = & p^2*(1-p) + (1-2p+p^2)*p \\
& = & p^2 - p^3 + p - 2p^2 + p^3 \\
& = & p - p^2 \\
& = & p(1-p) \\
& = & pq
\end{eqnarray*}
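
The Bernoulli results can be checked numerically. This is a small sketch that assumes p = 0.25 purely for illustration; any value between 0 and 1 would work.

<code>
p <- 0.25           # assumed success probability
q <- 1 - p
x <- c(0, 1)        # failure, success
px <- c(q, p)       # P(X = 0), P(X = 1)

exp.x <- sum(x * px)                 # E(X)
var.x <- sum((x - exp.x)^2 * px)     # Var(X)

exp.x   # equals p
var.x   # equals p * q
</code>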

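To generalize, write $X = X_{1} + X_{2} + \ldots + X_{n}$, where each $X_{i} \sim B(1,p)$ is an independent Bernoulli trial. Expectations always add, and variances add because the trials are independent.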

$$X \sim B(n,p)$$

\begin{eqnarray*}
E(X) & = & E(X_{1}) + E(X_{2}) + ... + E(X_{n}) \\
& = & n * E(X_{i}) \\
& = & n * p 
\end{eqnarray*}

\begin{eqnarray*}
Var(X) & = & Var(X_{1}) + Var(X_{2}) + ... + Var(X_{n}) \\
& = & n * Var(X_{i}) \\
& = & n * p * q 
\end{eqnarray*}
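
As a rough check, the expectation and variance can also be computed directly from the probabilities and compared with n * p and n * p * q. The sketch assumes n = 5 and p = 0.25 for illustration.

<code>
n <- 5
p <- 0.25
q <- 1 - p
x <- 0:n
px <- dbinom(x, n, p)              # P(X = x) for x = 0, ..., n

exp.x <- sum(x * px)               # E(X) from the definition
var.x <- sum((x - exp.x)^2 * px)   # Var(X) from the definition

c(exp.x, n * p)        # both entries should match
c(var.x, n * p * q)    # both entries should match
</code>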

====== Example ======
<WRAP box>
In the latest round of Who Wants To Win A Swivel Chair, there are 5 questions. The probability of
getting a successful outcome in a single trial is 0.25.
  - What’s the probability of getting exactly two questions right?
  - What’s the probability of getting exactly three questions right? 
  - What’s the probability of getting two or three questions right? 
  - What’s the probability of getting no questions right?
  - What are the expectation and variance?
</WRAP>

Ans 1. 
<code>
p <- .25
q <- 1-p
r <- 2
n <-5
# combinations of 5,2
c <- choose(n,r) 
ans1 <- c*(p^r)*(q^(n-r))
ans1    # or

choose(n, r)*(p^r)*(q^(n-r))

dbinom(r, n, p)

</code>

<code>
> p <- .25
> q <- 1-p
> r <- 2
> n <-5
> # combinations of 5,2
> c <- choose(n,r)
> ans1 <- c*(p^r)*(q^(n-r))
> ans1
[1] 0.2636719
>
> choose(n, r)*(p^r)*(q^(n-r))
[1] 0.2636719
>
> dbinom(r, n, p)
[1] 0.2636719
> 
> 
</code>


Ans 2. 
<code>
p <- .25
q <- 1-p
r <- 3
n <-5
# combinations of 5,3
c <- choose(n,r)
ans2 <- c*(p^r)*(q^(n-r))
ans2

choose(n, r)*(p^r)*(q^(n-r))

dbinom(r, n, p)

</code>
<code>
> p <- .25
> q <- 1-p
> r <- 3
> n <-5
> # combinations of 5,3
> c <- choose(n,r)
> ans2 <- c*(p^r)*(q^(n-r))
> ans2
[1] 0.08789062
> 
> choose(n,r)*(p^r)*(q^(n-r))
[1] 0.08789062
> 
> dbinom(r, n, p)
[1] 0.08789063
> 
> 
</code>

Ans 3. (Important)
<code>
ans1 + ans2
dbinom(2, 5, .25) + dbinom(3, 5, .25) 
dbinom(2:3, 5, .25)
sum(dbinom(2:3, 5, .25))
pbinom(3, 5, .25) - pbinom(1, 5, .25)
</code>

<code>
> ans1 + ans2
[1] 0.3515625
> dbinom(2, 5, .25) + dbinom(3, 5, .25) 
[1] 0.3515625
> dbinom(2:3, 5, .25)
[1] 0.26367187 0.08789063
> sum(dbinom(2:3, 5, .25))
[1] 0.3515625
> pbinom(3, 5, .25) - pbinom(1, 5, .25)
[1] 0.3515625
> 
</code>
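
The last line works because `pbinom(k, n, p)` returns the cumulative probability $P(X \le k)$. Since X takes only integer values, the step can be written out as:

\begin{eqnarray*}
P(2 \le X \le 3) & = & P(X \le 3) - P(X \le 1) \\
& = & P(X = 2) + P(X = 3)
\end{eqnarray*}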

Ans 4. 
<code>
p <- .25
q <- 1-p
r <- 0
n <-5
# combinations of 5,0
c <- choose(n,r)
ans4 <- c*(p^r)*(q^(n-r))
ans4
</code>

<code>> p <- .25
> q <- 1-p
> r <- 0
> n <-5
> # combinations of 5,0
> c <- choose(n,r)
> ans4 <- c*(p^r)*(q^(n-r))
> ans4
[1] 0.2373047
> </code>

Ans 5
<code>
p <- .25
q <- 1-p
n <- 5
exp.x <- n*p
exp.x
</code>
<code>> p <- .25
> q <- 1-p
> n <- 5
> exp.x <- n*p
> exp.x
[1] 1.25</code>

<code>
p <- .25
q <- 1-p
n <- 5
var.x <- n*p*q
var.x
</code>
<code>> p <- .25
> q <- 1-p
> n <- 5
> var.x <- n*p*q
> var.x
[1] 0.9375
> </code>

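Q. The probability of getting a single question right is 1/4. Given six questions in total, what is the probability of getting between 0 and 5 questions right? Use dbinom to find it.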
<code>
p <- 1/4
q <- 1-p
n <- 6
pbinom(5, n, p)

1 - dbinom(6, n, p)
</code> 
<code>
> p <- 1/4
> q <- 1-p
> n <- 6
> pbinom(5, n, p)
[1] 0.9997559
> 1 - dbinom(6, n, p)
[1] 0.9997559

</code>
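
Since the question asks for `dbinom`, the same probability can also be obtained by summing the individual probabilities for 0 through 5 correct answers. This is a minimal sketch; the names `direct` and `complement` are illustrative.

<code>
p <- 1/4
n <- 6

# Sum P(X = 0), ..., P(X = 5) directly
direct <- sum(dbinom(0:5, n, p))

# Complement of getting all six right
complement <- 1 - dbinom(6, n, p)

direct
complement
</code>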

Important: the probabilities of all possible outcomes (r = 0, ..., n) sum to 1.
<code>
# http://commres.net/wiki/mean_and_variance_of_binomial_distribution
# ##################################################################
#
p <- 1/4
q <- 1 - p
n <- 5
r <- 0
all.dens <- dbinom(0:n, n, p)
all.dens
sum(all.dens)

choose(5,0)*p^0*(q^(5-0))
choose(5,1)*p^1*(q^(5-1))
choose(5,2)*p^2*(q^(5-2))
choose(5,3)*p^3*(q^(5-3))
choose(5,4)*p^4*(q^(5-4))
choose(5,5)*p^5*(q^(5-5))
all.dens

choose(5,0)*p^0*(q^(5-0)) + 
  choose(5,1)*p^1*(q^(5-1)) + 
  choose(5,2)*p^2*(q^(5-2)) + 
  choose(5,3)*p^3*(q^(5-3)) + 
  choose(5,4)*p^4*(q^(5-4)) + 
  choose(5,5)*p^5*(q^(5-5))
sum(all.dens)
# 
(p+q)^n
# note that (p+q)^n = 1 for any n, because p + q = 1

</code>

<code>
> # http://commres.net/wiki/mean_and_variance_of_binomial_distribution
> # ##################################################################
> #
> p <- 1/4
> q <- 1 - p
> n <- 5
> r <- 0
> all.dens <- dbinom(0:n, n, p)
> all.dens
[1] 0.2373046875 0.3955078125 0.2636718750 0.0878906250
[5] 0.0146484375 0.0009765625
> sum(all.dens)
[1] 1
> 
> choose(5,0)*p^0*(q^(5-0))
[1] 0.2373047
> choose(5,1)*p^1*(q^(5-1))
[1] 0.3955078
> choose(5,2)*p^2*(q^(5-2))
[1] 0.2636719
> choose(5,3)*p^3*(q^(5-3))
[1] 0.08789062
> choose(5,4)*p^4*(q^(5-4))
[1] 0.01464844
> choose(5,5)*p^5*(q^(5-5))
[1] 0.0009765625
> all.dens
[1] 0.2373046875 0.3955078125 0.2636718750 0.0878906250
[5] 0.0146484375 0.0009765625
> 
> choose(5,0)*p^0*(q^(5-0)) + 
+   choose(5,1)*p^1*(q^(5-1)) + 
+   choose(5,2)*p^2*(q^(5-2)) + 
+   choose(5,3)*p^3*(q^(5-3)) + 
+   choose(5,4)*p^4*(q^(5-4)) + 
+   choose(5,5)*p^5*(q^(5-5))
[1] 1
> sum(all.dens)
[1] 1
> # 
> (p+q)^n
[1] 1
> # note that (p+q)^n = 1 for any n, because p + q = 1
> 
</code>
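
The reason the sum is always 1 follows from the binomial theorem, written here with the same symbols as the formula for $P(X = r)$:

\begin{eqnarray*}
\sum_{r=0}^{n} P(X = r) & = & \sum_{r=0}^{n} {}_{n}C_{r} \cdot p^{r} \cdot q^{n-r} \\
& = & (p + q)^{n} \\
& = & 1^{n} = 1
\end{eqnarray*}
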
====== Proof of Expected Value and Variance in Binomial Distribution ======
[[:Mean and Variance of Binomial Distribution|Mathematical proof of the expected value and variance of the binomial distribution]]