======= Binomial Distribution =======
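- Let p be the probability that a particular event A occurs in a single trial.
- Consider the number of times A occurs in n (independent) trials.
- The probability distribution of that count is called the **binomial probability distribution**.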
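In the example below:
* The probability of getting any single question right is 1/4, and the probability of getting it wrong is 3/4.
* We answer 3 questions (3 trials) and look at the probability distribution of the number of questions answered correctly.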
{{:b:head_first_statistics:pasted:20191030-035316.png}}
{{:b:head_first_statistics:pasted:20191030-035452.png}}
| x (questions right) | P(X = x) | power of 0.75 | power of 0.25 |
| 0 | 0.75 * 0.75 * 0.75 | 3 | 0 |
| 1 | 3 * (0.75 * 0.75 * 0.25) | 2 | 1 |
| 2 | 3 * (0.75 * 0.25 * 0.25) | 1 | 2 |
| 3 | 0.25 * 0.25 * 0.25 | 0 | 3 |
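As a quick check of the table, the sketch below rebuilds each row from the explicit products and compares it with R's built-in `dbinom()`. The variable names (`by.hand`, `built.in`) are only for illustration.

```r
# Three questions, probability 0.25 of getting each one right
p <- 0.25
q <- 1 - p

# Row-by-row values from the table: (number of arrangements) * (product)
by.hand <- c(
  q * q * q,          # x = 0
  3 * (q * q * p),    # x = 1
  3 * (q * p * p),    # x = 2
  p * p * p           # x = 3
)

# Built-in binomial probability mass function for x = 0, 1, 2, 3
built.in <- dbinom(0:3, size = 3, prob = p)

data.frame(x = 0:3, by.hand = by.hand, dbinom = built.in)
sum(by.hand)   # the four probabilities cover every outcome
```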
{{:b:head_first_statistics:pasted:20191030-040346.png}}
$$P(X = r) = {\huge\text{?}} \cdot 0.25^{r} \cdot 0.75^{3-r}$$
$$P(X = r) = {\huge_{3}C_{r}} \cdot 0.25^{r} \cdot 0.75^{3-r}$$
If $_{n}C_{r}$ is the number of ways to choose r items out of n (without regard to order), then there are $_{3}C_{1} = 3$ ways to get exactly one of the three questions right.
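To make the counting concrete, the sketch below lists the possible choices with base R's `combn()` and compares the count with `choose()` and the factorial formula. It is only an illustration of the counting step.

```r
# Which single question (1, 2 or 3) is the one answered correctly?
ways <- combn(3, 1, simplify = FALSE)
print(ways)          # list of the 3 possible choices
length(ways)         # number of ways
choose(3, 1)         # same count from nCr
factorial(3) / (factorial(1) * factorial(3 - 1))   # n! / (r! (n - r)!)

# Counts for r = 1, 2, 3 by enumeration and by formula
by.enum    <- sapply(1:3, function(r) length(combn(3, r, simplify = FALSE)))
by.formula <- choose(3, 1:3)
rbind(by.enum, by.formula)
```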
Probability of getting exactly one question right:
\begin{eqnarray*}
P(X = 1) & = & _{3}C_{1} \cdot 0.25^{1} \cdot 0.75^{3-1} \\
& = & \frac{3!}{1! \cdot (3-1)!} \cdot 0.25 \cdot 0.75^2 \\
& = & 3 \cdot 0.25 \cdot 0.5625 \\
& = & 0.421875
\end{eqnarray*}
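The same arithmetic can be checked in R; a minimal sketch that follows the derivation step by step and compares it with `dbinom()`:

```r
p <- 0.25
q <- 1 - p

# 3C1 * 0.25^1 * 0.75^2, as in the derivation above
one.right <- choose(3, 1) * p^1 * q^(3 - 1)
one.right

# Built-in binomial pmf: P(X = 1) with 3 trials
dbinom(1, size = 3, prob = p)
```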
$$P(X = r) = _{n}C_{r} \cdot 0.25^{r} \cdot 0.75^{n-r}$$
$$P(X = r) = _{n}C_{r} \cdot p^{r} \cdot q^{n-r}$$
The binomial distribution applies when:
- You’re running a series of independent trials (n trials in total).
- Each trial results in either a success or a failure, and the probability of success (and therefore of failure) is the same for every trial.
- There is a fixed, finite number of trials, n; the trials do not go on indefinitely. This differs from the geometric distribution, where trials continue until the first success.
If X represents the number of successes in n trials, the probability of exactly r successes is given by the following formula.
\begin{eqnarray*}
P(X = r) & = & _{n}C_{r} \cdot p^{r} \cdot q^{n-r} \;\;\; \text{Where,} \\
_{n}C_{r} & = & \frac {n!}{r!(n-r)!}
\end{eqnarray*}
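p = probability of success in each trial
q = 1 - p = probability of failure in each trial
n = number of trials
r = number of successes (e.g., questions answered correctly)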
$$X \sim B(n,p)$$
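The general formula translates directly into a few lines of R. The helper `binom.pmf()` below is hypothetical (written here only for illustration); it implements $_{n}C_{r} \cdot p^{r} \cdot q^{n-r}$ and is compared against R's built-in `dbinom()` for every r from 0 to n.

```r
# Hypothetical helper: binomial pmf written directly from
# P(X = r) = nCr * p^r * q^(n - r)
binom.pmf <- function(r, n, p) {
  q <- 1 - p
  choose(n, r) * p^r * q^(n - r)
}

n <- 5
p <- 0.25
mine <- binom.pmf(0:n, n, p)
ref  <- dbinom(0:n, size = n, prob = p)

all.equal(mine, ref)   # the hand-written formula agrees with dbinom()
sum(mine)              # probabilities over r = 0..n add up to 1
```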
====== Expectation and Variance of Binomial Distribution ======
Toss a fair coin once. What is the distribution of the number of heads?
* A single trial
* The trial has two possible outcomes: success or failure
* P(success) = p
* P(failure) = 1-p
X = 0 (failure) or X = 1 (success)
$P(X=x) = p^{x}(1-p)^{1-x}$ or
$P(x) = p^{x}(1-p)^{1-x}$
For reference:
| x | 0 | 1 |
| p(x) | q = (1-p) | p |
When x = 0 (failure), $P(X = 0) = p^{0}(1-p)^{1-0} = (1-p)$ = Probability of failure
When x = 1 (success), $P(X = 1) = p^{1}(1-p)^{0} = p $ = Probability of success
This is called the Bernoulli distribution.
* The Bernoulli distribution is the building block for the binomial distribution, the geometric distribution, etc.
* Binomial distribution = the distribution of the number of successes in n independent Bernoulli trials.
* Geometric distribution = the distribution of the number of trials needed to get the first success in independent Bernoulli trials.
$$X \sim B(1,p)$$
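A small sketch of the Bernoulli pmf in R, showing that it is the binomial distribution with a single trial. The function name `bern.pmf()` is hypothetical, defined here only for illustration.

```r
# Hypothetical helper: Bernoulli pmf p^x (1 - p)^(1 - x) for x = 0, 1
bern.pmf <- function(x, p) p^x * (1 - p)^(1 - x)

p <- 0.5                         # a fair coin
bern.pmf(0:1, p)                 # P(X = 0) and P(X = 1)
dbinom(0:1, size = 1, prob = p)  # B(1, p) gives the same values
```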
\begin{eqnarray*}
E(X) & = & \sum{x * p(x)} \\
& = & (0*q) + (1*p) \\
& = & p
\end{eqnarray*}
\begin{eqnarray*}
Var(X) & = & E((X - E(X))^{2}) \\
& = & \sum_{x}(x-E(X))^2p(x) \quad (\text{since } E(X) = p) \\
& = & (0 - p)^{2}*q + (1 - p)^{2}*p \\
& = & (0^2 - 2 \cdot 0 \cdot p + p^2)*q + (1-2p+p^2)*p \\
& = & p^2*(1-p) + (1-2p+p^2)*p \\
& = & p^2 - p^3 + p - 2p^2 + p^3 \\
& = & p - p^2 \\
& = & p(1-p) \\
& = & pq
\end{eqnarray*}
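Both results can be checked numerically by applying the definitions directly to the two outcomes. A minimal sketch, assuming p = 0.25 (any value in [0, 1] works the same way):

```r
p <- 0.25
q <- 1 - p
x  <- c(0, 1)        # failure, success
px <- c(q, p)        # P(X = 0), P(X = 1)

e.x   <- sum(x * px)              # E(X) = sum of x * p(x)
var.x <- sum((x - e.x)^2 * px)    # Var(X) = E((X - E(X))^2)

c(e.x, p)          # E(X) compared with p
c(var.x, p * q)    # Var(X) compared with pq
```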
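To generalize, let X count the successes in n independent Bernoulli trials, so that $X = X_{1} + X_{2} + \cdots + X_{n}$ with each $X_{i} \sim B(1,p)$. Expectations of a sum always add, and because the trials are independent, the variances add too.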
$$X \sim B(n,p)$$
\begin{eqnarray*}
E(X) & = & E(X_{1}) + E(X_{2}) + ... + E(X_{n}) \\
& = & n * E(X_{i}) \\
& = & n * p
\end{eqnarray*}
\begin{eqnarray*}
Var(X) & = & Var(X_{1}) + Var(X_{2}) + ... + Var(X_{n}) \\
& = & n * Var(X_{i}) \\
& = & n * p * q
\end{eqnarray*}
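The formulas E(X) = np and Var(X) = npq can be checked two ways in R: exactly, by summing over the pmf, and approximately, by simulating X as a sum of n independent Bernoulli trials. A sketch assuming n = 5 and p = 0.25 (the values used in the example below); the simulation size of 10000 is an arbitrary choice.

```r
n <- 5
p <- 0.25
q <- 1 - p
x  <- 0:n
px <- dbinom(x, size = n, prob = p)

# Exact mean and variance from the definitions
e.x   <- sum(x * px)
var.x <- sum((x - e.x)^2 * px)
c(e.x, n * p)
c(var.x, n * p * q)

# Simulation: each draw is a sum of n independent Bernoulli(p) trials
set.seed(1)
sims <- replicate(10000, sum(rbinom(n, size = 1, prob = p)))
c(mean(sims), var(sims))   # should be close to n*p and n*p*q
```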
====== Example ======
In the latest round of Who Wants To Win A Swivel Chair, there are 5 questions. The probability of getting a single question right (a success in a single trial) is 0.25.
- What’s the probability of getting exactly two questions right?
- What’s the probability of getting exactly three questions right?
- What’s the probability of getting two or three questions right?
- What’s the probability of getting no questions right?
- What are the expectation and variance?
Ans 1.
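p <- .25     # probability of success (right answer)
q <- 1-p     # probability of failure
r <- 2       # number of successes
n <- 5       # number of trials
# combinations of 5,2
c <- choose(n,r)
ans1 <- c*(p^r)*(q^(n-r))
ans1
# or, equivalently
choose(n, r)*(p^r)*(q^(n-r))
dbinom(r, n, p)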
> p <- .25
> q <- 1-p
> r <- 2
> n <-5
> # combinations of 5,2
> c <- choose(n,r)
> ans1 <- c*(p^r)*(q^(n-r))
> ans1
[1] 0.2636719
>
> choose(n, r)*(p^r)*(q^(n-r))
[1] 0.2636719
>
> dbinom(r, n, p)
[1] 0.2636719
>
>
Ans 2.
p <- .25
q <- 1-p
r <- 3
n <- 5
# combinations of 5,3
c <- choose(n,r)
ans2 <- c*(p^r)*(q^(n-r))
ans2
choose(n, r)*(p^r)*(q^(n-r))
dbinom(r, n, p)
> p <- .25
> q <- 1-p
> r <- 3
> n <-5
> # combinations of 5,3
> c <- choose(n,r)
> ans2 <- c*(p^r)*(q^(n-r))
> ans2
[1] 0.08789062
>
> choose(n,r)*(p^r)*(q^(n-r))
[1] 0.08789062
>
> dbinom(r, n, p)
[1] 0.08789063
>
>
Ans 3. (Important)
ans1 + ans2
dbinom(2, 5, .25) + dbinom(3, 5, .25)
dbinom(2:3, 5, .25)
sum(dbinom(2:3, 5, .25))
# P(X <= 3) - P(X <= 1) = P(X = 2) + P(X = 3)
pbinom(3, 5, .25) - pbinom(1, 5, .25)
> ans1 + ans2
[1] 0.3515625
> dbinom(2, 5, .25) + dbinom(3, 5, .25)
[1] 0.3515625
> dbinom(2:3, 5, .25)
[1] 0.26367187 0.08789063
> sum(dbinom(2:3, 5, .25))
[1] 0.3515625
> pbinom(3, 5, .25) - pbinom(1, 5, .25)
[1] 0.3515625
>
Ans 4.
p <- .25
q <- 1-p
r <- 0
n <- 5
# combinations of 5,0
c <- choose(n,r)
ans4 <- c*(p^r)*(q^(n-r))
ans4
> p <- .25
> q <- 1-p
> r <- 0
> n <-5
> # combinations of 5,0
> c <- choose(n,r)
> ans4 <- c*(p^r)*(q^(n-r))
> ans4
[1] 0.2373047
>
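With r = 0 the formula simplifies: $_{5}C_{0} = 1$ and $p^{0} = 1$, so only $q^{n}$ is left. A short sketch confirming this:

```r
p <- 0.25
q <- 1 - p
n <- 5

# nC0 = 1 and p^0 = 1, so P(X = 0) reduces to q^n
none.right <- q^n
none.right
dbinom(0, size = n, prob = p)
```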
Ans 5.
p <- .25
q <- 1-p
n <- 5
exp.x <- n*p
exp.x
> p <- .25
> q <- 1-p
> n <- 5
> exp.x <- n*p
> exp.x
[1] 1.25
p <- .25
q <- 1-p
n <- 5
var.x <- n*p*q
var.x
> p <- .25
> q <- 1-p
> n <- 5
> var.x <- n*p*q
> var.x
[1] 0.9375
>
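Q. The probability of getting a single question right is 1/4. If there are six questions in total, what is the probability of getting anywhere from 0 to 5 questions right? Use dbinom to find it.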
p <- 1/4
q <- 1-p
n <- 6
pbinom(5, n, p)
1 - dbinom(6, n, p)
> p <- 1/4
> q <- 1-p
> n <- 6
> pbinom(5, n, p)
[1] 0.9997559
> 1 - dbinom(6, n, p)
[1] 0.9997559
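A third way, closest to the wording of the question, is to add up P(X = 0) through P(X = 5) directly with `dbinom()`. The variable name `ans.q` is only for illustration.

```r
p <- 1/4
n <- 6

# Sum of P(X = 0), ..., P(X = 5)
ans.q <- sum(dbinom(0:5, size = n, prob = p))
ans.q

# Same as the complement of getting all six right
1 - p^n
```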
Important: the probabilities of all possible outcomes add up to 1.
# http://commres.net/wiki/mean_and_variance_of_binomial_distribution
# ##################################################################
#
p <- 1/4
q <- 1 - p
n <- 5
r <- 0
all.dens <- dbinom(0:n, n, p)
all.dens
sum(all.dens)
choose(5,0)*p^0*(q^(5-0))
choose(5,1)*p^1*(q^(5-1))
choose(5,2)*p^2*(q^(5-2))
choose(5,3)*p^3*(q^(5-3))
choose(5,4)*p^4*(q^(5-4))
choose(5,5)*p^5*(q^(5-5))
all.dens
choose(5,0)*p^0*(q^(5-0)) +
choose(5,1)*p^1*(q^(5-1)) +
choose(5,2)*p^2*(q^(5-2)) +
choose(5,3)*p^3*(q^(5-3)) +
choose(5,4)*p^4*(q^(5-4)) +
choose(5,5)*p^5*(q^(5-5))
sum(all.dens)
#
(p+q)^n
# note that for any n, (p+q)^n = 1, because p + q = 1
> # http://commres.net/wiki/mean_and_variance_of_binomial_distribution
> # ##################################################################
> #
> p <- 1/4
> q <- 1 - p
> n <- 5
> r <- 0
> all.dens <- dbinom(0:n, n, p)
> all.dens
[1] 0.2373046875 0.3955078125 0.2636718750 0.0878906250
[5] 0.0146484375 0.0009765625
> sum(all.dens)
[1] 1
>
> choose(5,0)*p^0*(q^(5-0))
[1] 0.2373047
> choose(5,1)*p^1*(q^(5-1))
[1] 0.3955078
> choose(5,2)*p^2*(q^(5-2))
[1] 0.2636719
> choose(5,3)*p^3*(q^(5-3))
[1] 0.08789062
> choose(5,4)*p^4*(q^(5-4))
[1] 0.01464844
> choose(5,5)*p^5*(q^(5-5))
[1] 0.0009765625
> all.dens
[1] 0.2373046875 0.3955078125 0.2636718750 0.0878906250
[5] 0.0146484375 0.0009765625
>
> choose(5,0)*p^0*(q^(5-0)) +
+ choose(5,1)*p^1*(q^(5-1)) +
+ choose(5,2)*p^2*(q^(5-2)) +
+ choose(5,3)*p^3*(q^(5-3)) +
+ choose(5,4)*p^4*(q^(5-4)) +
+ choose(5,5)*p^5*(q^(5-5))
[1] 1
> sum(all.dens)
[1] 1
> #
> (p+q)^n
[1] 1
> # note that for any n, (p+q)^n = 1, because p + q = 1
>
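The reason the total is always 1 is the binomial theorem: summing the binomial probabilities over every r gives exactly the expansion of $(p+q)^{n}$, and $p + q = 1$.

$$\sum_{r=0}^{n} {}_{n}C_{r} \cdot p^{r} \cdot q^{n-r} = (p+q)^{n} = 1^{n} = 1$$

$$\text{e.g., } n = 5:\; \sum_{r=0}^{5} {}_{5}C_{r} \cdot 0.25^{r} \cdot 0.75^{5-r} = (0.25 + 0.75)^{5} = 1$$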
====== Proof of Expected Value and Variance in Binomial Distribution ======
[[:Mean and Variance of Binomial Distribution|Mathematical proof of the expected value and variance of the binomial distribution]]