====== Binomial Distribution ======
If p is the probability that a particular event A occurs in a single trial, the probability distribution of the number of times A occurs in n independent trials is called the **binomial distribution**.

In the example below:
  * The probability of getting any one question right is 1/4, and the probability of getting it wrong is 3/4.
  * Three questions are answered (three trials), and we want the probability distribution of the number of correct answers.

{{:b:head_first_statistics:pasted:20191030-035316.png}}

{{:b:head_first_statistics:pasted:20191030-035452.png}}

| x | P(X=x) | power of 0.75 | power of 0.25 |
| 0 | 0.75 * 0.75 * 0.75 | 3 | 0 |
| 1 | 3 * (0.75 * 0.75 * 0.25) | 2 | 1 |
| 2 | 3 * (0.75 * 0.25 * 0.25) | 1 | 2 |
| 3 | 0.25 * 0.25 * 0.25 | 0 | 3 |

{{:b:head_first_statistics:pasted:20191030-040346.png}}

$$P(X = r) = {\huge\text{?}} \cdot 0.25^{r} \cdot 0.75^{3-r}$$
$$P(X = r) = {\huge {}_{3}C_{r}} \cdot 0.25^{r} \cdot 0.75^{3-r}$$

Since $_{n}C_{r}$ is the number of ways to choose r items out of n (ignoring order), there are $_{3}C_{1} = 3$ ways to get exactly one of the three questions right.

Probability of getting exactly one question right:
\begin{eqnarray*}
P(X = 1) & = & _{3}C_{1} \cdot 0.25^{1} \cdot 0.75^{3-1} \\
& = & \frac{3!}{1! \cdot (3-1)!} \cdot 0.25 \cdot 0.75^2 \\
& = & 3 \cdot 0.25 \cdot 0.5625 \\
& = & 0.421875
\end{eqnarray*}

For n questions instead of 3:
$$P(X = r) = _{n}C_{r} \cdot 0.25^{r} \cdot 0.75^{n-r}$$

In general, with success probability p and failure probability q = 1 - p:
$$P(X = r) = _{n}C_{r} \cdot p^{r} \cdot q^{n-r}$$

The binomial distribution applies when:
  - You are running a series of n independent trials.
  - Each trial ends in either success or failure, and the probability of success (and therefore of failure) is the same for every trial.
  - There is a finite number of trials, fixed at n. This differs from the geometric distribution, where the number of trials is not fixed in advance.

If X is the number of successes in n trials, the probability of exactly r successes is given by the formula below.

\begin{eqnarray*}
P(X = r) & = & _{n}C_{r} \cdot p^{r} \cdot q^{n-r} \;\;\; \text{where} \\
_{n}C_{r} & = & \frac {n!}{r!(n-r)!}
\end{eqnarray*}

  * p = probability of success on each trial (q = 1 - p)
  * n = number of trials
  * r = number of successes

$$X \sim B(n,p)$$

====== Expectation and Variance of Binomial Distribution ======
Toss a fair coin once. What is the distribution of the number of heads?
  * A single trial
  * The trial has one of two possible outcomes: success or failure
  * P(success) = p
  * P(failure) = 1-p

X = 0, 1 (failure and success)

$P(X=x) = p^{x}(1-p)^{1-x}$, or more briefly $P(x) = p^{x}(1-p)^{1-x}$

For reference:
| x | 0 | 1 |
| p(x) | q = (1-p) | p |

When x = 0 (failure), $P(X = 0) = p^{0}(1-p)^{1-0} = (1-p)$ = probability of failure

When x = 1 (success), $P(X = 1) = p^{1}(1-p)^{0} = p$ = probability of success

This is called the Bernoulli distribution.
  * The Bernoulli distribution is the building block of the binomial distribution, the geometric distribution, and others.
  * Binomial distribution = the distribution of the number of successes in n independent Bernoulli trials.
  * Geometric distribution = the distribution of the number of trials needed to get the first success in independent Bernoulli trials.

$$X \sim B(1,p)$$

\begin{eqnarray*}
E(X) & = & \sum_{x} x \cdot p(x) \\
& = & (0 \cdot q) + (1 \cdot p) \\
& = & p
\end{eqnarray*}

\begin{eqnarray*}
Var(X) & = & E((X - E(X))^{2}) \\
& = & \sum_{x}(x-E(X))^2 p(x) \;\;\; \text{(with } E(X) = p \text{)} \\
& = & (0 - p)^{2} \cdot q + (1 - p)^{2} \cdot p \\
& = & (0^2 - 2p \cdot 0 + p^2) \cdot q + (1-2p+p^2) \cdot p \\
& = & p^2 \cdot (1-p) + (1-2p+p^2) \cdot p \\
& = & p^2 - p^3 + p - 2p^2 + p^3 \\
& = & p - p^2 \\
& = & p(1-p) \\
& = & pq
\end{eqnarray*}

To generalize, write X as the sum of n independent Bernoulli variables, $X = X_{1} + X_{2} + \ldots + X_{n}$, each with $X_{i} \sim B(1,p)$. Then
$$X \sim B(n,p)$$

\begin{eqnarray*}
E(X) & = & E(X_{1}) + E(X_{2}) + \ldots + E(X_{n}) \\
& = & n \cdot E(X_{i}) \\
& = & n \cdot p
\end{eqnarray*}

Because the trials are independent, the variances also add:
\begin{eqnarray*}
Var(X) & = & Var(X_{1}) + Var(X_{2}) + \ldots + Var(X_{n}) \\
& = & n \cdot Var(X_{i}) \\
& = & n \cdot p \cdot q
\end{eqnarray*}

====== Example ======
In the latest round of Who Wants To Win A Swivel Chair, there are 5 questions. The probability of getting a successful outcome in a single trial is 0.25.
  - What’s the probability of getting exactly two questions right?
  - What’s the probability of getting exactly three questions right?
  - What’s the probability of getting two or three questions right?
  - What’s the probability of getting no questions right?
  - What are the expectation and variance?

Ans 1.
<code>
p <- .25
q <- 1-p
r <- 2
n <- 5
# number of combinations: 5 choose 2
comb <- choose(n,r)
ans1 <- comb*(p^r)*(q^(n-r))
ans1
# or, in one line
choose(n, r)*(p^r)*(q^(n-r))
# or with the built-in binomial probability function
dbinom(r, n, p)
</code>

<code>
> p <- .25
> q <- 1-p
> r <- 2
> n <- 5
> # number of combinations: 5 choose 2
> comb <- choose(n,r)
> ans1 <- comb*(p^r)*(q^(n-r))
> ans1
[1] 0.2636719
>
> choose(n, r)*(p^r)*(q^(n-r))
[1] 0.2636719
>
> dbinom(r, n, p)
[1] 0.2636719
</code>

Ans 2.
<code>
p <- .25
q <- 1-p
r <- 3
n <- 5
# number of combinations: 5 choose 3
comb <- choose(n,r)
ans2 <- comb*(p^r)*(q^(n-r))
ans2

choose(n, r)*(p^r)*(q^(n-r))
dbinom(r, n, p)
</code>

<code>
> p <- .25
> q <- 1-p
> r <- 3
> n <- 5
> # number of combinations: 5 choose 3
> comb <- choose(n,r)
> ans2 <- comb*(p^r)*(q^(n-r))
> ans2
[1] 0.08789062
>
> choose(n,r)*(p^r)*(q^(n-r))
[1] 0.08789062
>
> dbinom(r, n, p)
[1] 0.08789063
</code>

Ans 3. (Important)
<code>
# P(X = 2) + P(X = 3), computed five equivalent ways
ans1 + ans2
dbinom(2, 5, .25) + dbinom(3, 5, .25)
dbinom(2:3, 5, .25)
sum(dbinom(2:3, 5, .25))
# P(X <= 3) - P(X <= 1)
pbinom(3, 5, .25) - pbinom(1, 5, .25)
</code>

<code>
> ans1 + ans2
[1] 0.3515625
> dbinom(2, 5, .25) + dbinom(3, 5, .25)
[1] 0.3515625
> dbinom(2:3, 5, .25)
[1] 0.26367187 0.08789063
> sum(dbinom(2:3, 5, .25))
[1] 0.3515625
> pbinom(3, 5, .25) - pbinom(1, 5, .25)
[1] 0.3515625
</code>

Ans 4.
<code>
p <- .25
q <- 1-p
r <- 0
n <- 5
# number of combinations: 5 choose 0
comb <- choose(n,r)
ans4 <- comb*(p^r)*(q^(n-r))
ans4
</code>

<code>
> p <- .25
> q <- 1-p
> r <- 0
> n <- 5
> # number of combinations: 5 choose 0
> comb <- choose(n,r)
> ans4 <- comb*(p^r)*(q^(n-r))
> ans4
[1] 0.2373047
</code>

Ans 5.
<code>
p <- .25
q <- 1-p
n <- 5
# expectation: E(X) = n*p
exp.x <- n*p
exp.x
</code>

<code>
> p <- .25
> q <- 1-p
> n <- 5
> exp.x <- n*p
> exp.x
[1] 1.25
</code>

<code>
p <- .25
q <- 1-p
n <- 5
# variance: Var(X) = n*p*q
var.x <- n*p*q
var.x
</code>

<code>
> p <- .25
> q <- 1-p
> n <- 5
> var.x <- n*p*q
> var.x
[1] 0.9375
</code>

Q. The probability of getting one question right is 1/4. With six questions in total, what is the probability of getting between 0 and 5 questions right? Solve it using dbinom.

<code>
p <- 1/4
q <- 1-p
n <- 6
# cumulative probability P(X <= 5)
pbinom(5, n, p)
# complement: P(X <= 5) = 1 - P(X = 6)
1 - dbinom(6, n, p)
</code>

<code>
> p <- 1/4
> q <- 1-p
> n <- 6
> pbinom(5, n, p)
[1] 0.9997559
> 1 - dbinom(6, n, p)
[1] 0.9997559
</code>

Important.
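The complement used in the answer above, ''1 - dbinom(6, n, p)'', works because the probabilities of all possible values of X add up to one. A short derivation via the binomial theorem, using the same symbols as the formula earlier on this page (q = 1 - p), might look like this:

\begin{eqnarray*}
\sum_{r=0}^{n} P(X = r) & = & \sum_{r=0}^{n} {}_{n}C_{r} \cdot p^{r} \cdot q^{n-r} \\
& = & (p + q)^{n} \\
& = & 1^{n} = 1
\end{eqnarray*}

For n = 6 and p = 1/4 this gives

\begin{eqnarray*}
P(X \le 5) & = & 1 - P(X = 6) \\
& = & 1 - 0.25^{6} \\
& = & 1 - \frac{1}{4096} \\
& \approx & 0.9997559
\end{eqnarray*}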
<code>
# http://commres.net/wiki/mean_and_variance_of_binomial_distribution
# ##################################################################
#
p <- 1/4
q <- 1 - p
n <- 5
r <- 0
# probabilities of 0, 1, ..., n successes
all.dens <- dbinom(0:n, n, p)
all.dens
sum(all.dens)

# the same probabilities from the formula, one term at a time
choose(5,0)*p^0*(q^(5-0))
choose(5,1)*p^1*(q^(5-1))
choose(5,2)*p^2*(q^(5-2))
choose(5,3)*p^3*(q^(5-3))
choose(5,4)*p^4*(q^(5-4))
choose(5,5)*p^5*(q^(5-5))
all.dens

# the terms add up to 1
choose(5,0)*p^0*(q^(5-0)) +
    choose(5,1)*p^1*(q^(5-1)) +
    choose(5,2)*p^2*(q^(5-2)) +
    choose(5,3)*p^3*(q^(5-3)) +
    choose(5,4)*p^4*(q^(5-4)) +
    choose(5,5)*p^5*(q^(5-5))
sum(all.dens)
#
(p+q)^n
# note: whatever n is, (p+q)^n = 1 because p + q = 1
</code>

<code>
> # http://commres.net/wiki/mean_and_variance_of_binomial_distribution
> # ##################################################################
> #
> p <- 1/4
> q <- 1 - p
> n <- 5
> r <- 0
> all.dens <- dbinom(0:n, n, p)
> all.dens
[1] 0.2373046875 0.3955078125 0.2636718750 0.0878906250
[5] 0.0146484375 0.0009765625
> sum(all.dens)
[1] 1
>
> choose(5,0)*p^0*(q^(5-0))
[1] 0.2373047
> choose(5,1)*p^1*(q^(5-1))
[1] 0.3955078
> choose(5,2)*p^2*(q^(5-2))
[1] 0.2636719
> choose(5,3)*p^3*(q^(5-3))
[1] 0.08789062
> choose(5,4)*p^4*(q^(5-4))
[1] 0.01464844
> choose(5,5)*p^5*(q^(5-5))
[1] 0.0009765625
> all.dens
[1] 0.2373046875 0.3955078125 0.2636718750 0.0878906250
[5] 0.0146484375 0.0009765625
>
> choose(5,0)*p^0*(q^(5-0)) +
+     choose(5,1)*p^1*(q^(5-1)) +
+     choose(5,2)*p^2*(q^(5-2)) +
+     choose(5,3)*p^3*(q^(5-3)) +
+     choose(5,4)*p^4*(q^(5-4)) +
+     choose(5,5)*p^5*(q^(5-5))
[1] 1
> sum(all.dens)
[1] 1
> #
> (p+q)^n
[1] 1
> # note: whatever n is, (p+q)^n = 1 because p + q = 1
</code>

====== Proof of Expected Value and Variance in Binomial Distribution ======
[[:Mean and Variance of Binomial Distribution|Mathematical proof of the expected value and variance of the binomial distribution]]
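
As a numeric complement to the proof linked above, the following minimal R sketch checks E(X) = np and Var(X) = npq directly from the probabilities returned by ''dbinom''. The variable names (''x'', ''px'', ''e.x'', ''var.x'') are chosen here for illustration only; n = 5 and p = 1/4 follow the example on this page.

<code r>
# Sketch: check E(X) = n*p and Var(X) = n*p*q from the probabilities themselves.
# Variable names are illustrative, not taken from the original page.
n <- 5
p <- 1/4
q <- 1 - p

x  <- 0:n               # every possible number of successes
px <- dbinom(x, n, p)   # P(X = x) for each value of x

e.x   <- sum(x * px)             # E(X)   = sum of x * p(x)
var.x <- sum((x - e.x)^2 * px)   # Var(X) = sum of (x - E(X))^2 * p(x)

e.x                          # compare with n * p
var.x                        # compare with n * p * q
all.equal(e.x, n * p)        # TRUE when equal up to floating-point tolerance
all.equal(var.x, n * p * q)
</code>

''all.equal'' is used instead of ''=='' because the sums are computed in floating point and may differ from n*p and n*p*q in the last digits.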