Geometric Distribution

Geometric distribution, with $q = 1 - p$:
\begin{align*} \text{Geometric distribution: } \;\;\; X & \sim \text{Geo}(p) \\ P(X = k) & = q^{k-1} \cdot p, \quad k = 1, 2, \ldots \\ E\left[ X \right] & = \frac{1}{p} \\ V\left[ X \right] & = \frac{q}{p^2} \end{align*}

Geometric Distributions

The probability of Chad making a clear run down the slope is 0.2, and he's going to keep on trying until he succeeds. After he’s made his first successful run down the slopes, he’s going to stop snowboarding and head back to the lodge triumphantly.

It’s time to exercise your probability skills. The probability of Chad making a successful run down the slopes is 0.2 for any given trial (assume trials are independent). What’s the probability he’ll need two trials? What’s the probability he’ll make a successful run down the slope in one or two trials? Remember, when he’s had his first successful run, he’s going to stop.

Hint: You may want to draw a probability tree to help visualize the problem.

P(X = 1) = P(success in the first trial) = 0.2
P(X = 2) = P(failure in the first trial and success in the second trial) = 0.8 * 0.2 = 0.16
Probability of succeeding on the first or the second trial:
P(X <= 2) = P(X = 1) + P(X = 2) = 0.2 + 0.16 = 0.36
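
The same three probabilities can be checked in R. This is only a small sketch; the variable names are chosen here for illustration. Note that `dgeom` and `pgeom` count the failures before the first success, so trial number k corresponds to the argument k - 1.

```r
p <- 0.2                            # probability of a successful run

p_first      <- dgeom(0, prob = p)  # P(X = 1): no failure before the success
p_second     <- dgeom(1, prob = p)  # P(X = 2): one failure, then the success
p_within_two <- pgeom(1, prob = p)  # P(X <= 2): at most one failure

c(p_first, p_second, p_within_two)
```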

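| X | P(X = x) |
|---|----------|
| 1 | 0.2 |
| 2 | 0.8 * 0.2 = 0.16 |
| 3 | 0.8 * 0.8 * 0.2 = 0.128 |
| 4 | 0.8 * 0.8 * 0.8 * 0.2 = 0.1024 |
| ... | ... |

| X | P(X = x) | Power of 0.8 | Power of 0.2 |
|---|----------|--------------|--------------|
| 1 | 0.8^0 * 0.2 | 0 | 1 |
| 2 | 0.8^1 * 0.2 | 1 | 1 |
| 3 | 0.8^2 * 0.2 | 2 | 1 |
| 4 | 0.8^3 * 0.2 | 3 | 1 |
| 5 | 0.8^4 * 0.2 | 4 | 1 |
| r | ... | r - 1 | 1 |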

$P(X = r) = 0.8^{r-1} \times 0.2$
$P(X = r) = q^{r-1} \times p$

This formula is called the geometric distribution.

$ P(X=r) = {p \cdot q^{r-1}} $
$ P(X=r) = {p \cdot (1-p)^{r-1}} $
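
As a rough illustration, the formula can be written as a one-line R function and compared with base R's `dgeom`. The helper name `geo_pmf` is hypothetical (it is not part of base R); the comparison assumes the usual convention that `dgeom` takes the number of failures, r - 1, rather than the trial number r.

```r
# Hypothetical helper: P(X = r) for trial numbers r = 1, 2, ...
geo_pmf <- function(r, p) {
  stopifnot(all(r >= 1), p > 0, p <= 1)
  p * (1 - p)^(r - 1)
}

r <- 1:5
geo_pmf(r, p = 0.2)

# Same values from base R after shifting trial numbers to failure counts
all.equal(geo_pmf(r, p = 0.2), dgeom(r - 1, prob = 0.2))
```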

p = 0.20
n = 29
## geometric pmf
## note that x starts at 0 rather than 1:
## dgeom counts the failures before the first success,
## so it computes p * q^x, and x = r - 1 gives p * q^(r-1)
dgeom(x = 0:n, prob = p)
## bar plot of P(X = r) against the trial number r = 1, ..., n + 1
## (hist() would bin the probability values themselves instead)
barplot(dgeom(x = 0:n, prob = p), names.arg = 1:(n + 1))
> p = 0.20
> n = 29
> # exact
> dgeom(0:n, prob = p)
 [1] 0.2000000000 0.1600000000 0.1280000000 0.1024000000 0.0819200000 0.0655360000 0.0524288000
 [8] 0.0419430400 0.0335544320 0.0268435456 0.0214748365 0.0171798692 0.0137438953 0.0109951163
[15] 0.0087960930 0.0070368744 0.0056294995 0.0045035996 0.0036028797 0.0028823038 0.0023058430
[22] 0.0018446744 0.0014757395 0.0011805916 0.0009444733 0.0007555786 0.0006044629 0.0004835703
[29] 0.0003868563 0.0003094850 
> 
> barplot(dgeom(x = 0:n, prob = p), names.arg = 1:(n + 1))

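The probability that the first success comes some time after the r-th trial, that is, that the first r trials all fail:
$$ P(X > r) = q^{r} $$

Example: what is the probability that the first success comes after the 20th trial?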

Solution.

p <- .2
q <- 1-p
n <- 19
s <- dgeom(x = 0:n, prob = p)
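# sum of the probabilities that the first success occurs on trial 1, 2, ..., 20
sum(s)
# so this is the probability that the first success comes after trial 20
1-sum(s)
## or, as the textbook puts it, the probability of failing on all of the first 20 trials
q^20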
> p <- .2
> q <- 1-p
> n <- 19
> s <- dgeom(x = 0:n, prob = p)
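> # probability of a success within the first 20 trials
> sum(s)
[1] 0.9884708
> # so this is the probability that the first success comes after trial 20
> 1-sum(s)
[1] 0.01152922
> ## or, as the textbook puts it, the probability of failing on all of the first 20 trials
> q^20
[1] 0.01152922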
[1] 0.01152922
> 
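
The tail probability can also be read directly from `pgeom` by asking for the upper tail. This is a sketch under the same convention as above: `pgeom(k, prob)` is the probability of at most k failures before the first success, so "more than r trials" corresponds to "more than r - 1 failures".

```r
p <- 0.2
r <- 20

# P(X > r): more than r - 1 failures occur before the first success
tail_prob <- pgeom(r - 1, prob = p, lower.tail = FALSE)
tail_prob

# closed form q^r for comparison
(1 - p)^r
```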

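Then what is the probability of a success within the first r trials? It is the complement of the probability of failing on all r trials:
$$ P(X \le r) = 1 - q^{r} $$

Alternatively, add up P(first success on trial 1) + P(first success on trial 2) + . . . + P(first success on trial r).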

# taking r = 20
p <- .2
q <- 1-p
n <- 19
s <- dgeom(x = 0:n, prob = p)
sum(s)

Note that
$$P(X > r) + P(X \le r) = 1 $$
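
A quick numerical check of both routes, for several values of r at once. This is a sketch with illustrative variable names; it compares the term-by-term sum with the closed form $1 - q^{r}$ and confirms that the two tails add up to 1.

```r
p <- 0.2
q <- 1 - p
r <- 1:20

cdf_by_sum  <- cumsum(dgeom(r - 1, prob = p))  # P(X = 1) + ... + P(X = r)
cdf_closed  <- 1 - q^r                         # P(X <= r) in closed form
tail_closed <- q^r                             # P(X > r)

all.equal(cdf_by_sum, cdf_closed)
all.equal(cdf_by_sum + tail_closed, rep(1, length(r)))
```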

Expected value

X follows a geometric distribution with success probability p: $X \sim \text{Geo}(p)$

Recall the expected value of a discrete random variable:
$E(X) = \sum_{x} x \cdot P(X=x)$

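Notation used in the textbook and in the R code below:

| | outcome | probability | weighted value | running sum |
|---|---|---|---|---|
| textbook | x | P(X = x) | xP(X = x) | sum of xP(X = x) up to x; the full sum is $E(X) = \sum (x*P(X=x))$ |
| R code | trial | px <- q^(trial-1)*p | npx <- trial*(q^(trial-1))*p, or equivalently npx <- trial*px | plex <- cumsum(trial*(q^(trial-1))*p), or equivalently plex <- cumsum(npx) |
| meaning | trial number | probability that the first success occurs on trial x | contribution of trial x to the expected value (as in the dice example) | expected value accumulated up to trial x |

A small example with an arbitrary discrete distribution:

| x | p(x) (px) | npx.0 | npx = weighted probability at a given spot | plex.0 | plex |
|---|---|---|---|---|---|
| 0 | 0.1 | 0 * 0.1 | 0.00 | 0.00 | 0.00 |
| 1 | 0.15 | 1 * 0.15 | 0.15 | 0.00 + 0.15 | 0.15 |
| 2 | 0.4 | 2 * 0.4 | 0.80 | 0.00 + 0.15 + 0.80 | 0.95 |
| 3 | 0.25 | 3 * 0.25 | 0.75 | 0.00 + 0.15 + 0.80 + 0.75 | 1.7 |
| 4 | 0.1 | 4 * 0.1 | 0.40 | 0.00 + 0.15 + 0.80 + 0.75 + 0.40 | 2.1 = this is E(X) |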
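
The small five-outcome example can be reproduced in a few lines of R. This is a sketch that reuses the column names `px`, `npx` and `plex` from these notes; the probabilities are the ones listed for x = 0, ..., 4.

```r
x  <- 0:4
px <- c(0.1, 0.15, 0.4, 0.25, 0.1)  # probabilities of x = 0, ..., 4; they sum to 1

npx  <- x * px       # each outcome weighted by its probability
plex <- cumsum(npx)  # running total; the last entry is E(X)

data.frame(x, px, npx, plex)
sum(npx)
```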
p <- .2
q <- 1-p
trial <- c(1:8)
px <- q^(trial-1)*p
px
## npx <- trial*(q^(trial-1))*p
## the line above is equivalent to the line below
npx <- trial*px
npx
## plex <- cumsum(trial*(q^(trial-1))*p)
## the line above is equivalent to the line below
plex <- cumsum(npx)
plex
sumgeod <- data.frame(trial,px,npx,plex)
round(sumgeod,3)
> p <- .2
> q <- 1-p
> trial <- c(1,2,3,4,5,6,7,8)
> px <- q^(trial-1)*p
> px
[1] 0.20000000 0.16000000 0.12800000 0.10240000 0.08192000 0.06553600 0.05242880 0.04194304
> npx <- trial*(q^(trial-1))*p
> npx
[1] 0.2000000 0.3200000 0.3840000 0.4096000 0.4096000 0.3932160 0.3670016 0.3355443
> plex <- cumsum(trial*(q^(trial-1))*p)
> plex
[1] 0.200000 0.520000 0.904000 1.313600 1.723200 2.116416 2.483418 2.818962
> sumgeod <- data.frame(trial,px,npx,plex)
> round(sumgeod,3)
  trial    px   npx  plex
1     1 0.200 0.200 0.200
2     2 0.160 0.320 0.520
3     3 0.128 0.384 0.904
4     4 0.102 0.410 1.314
5     5 0.082 0.410 1.723
6     6 0.066 0.393 2.116
7     7 0.052 0.367 2.483
8     8 0.042 0.336 2.819
> 
p <- .2
q <- 1-p
trial <- c(1:100)
px <- q^(trial-1)*p
px
npx <- trial*px
npx
## plex <- cumsum(trial*(q^(trial-1))*p)
## the line above is equivalent to the line below
plex <- cumsum(npx)
plex
sumgeod <- data.frame(trial,px,npx,plex)
sumgeod 

plot(npx, type="l")
plot(plex, type="l")
> 
> p <- .2
> q <- 1-p
> trial <- c(1:100)
> px <- q^(trial-1)*p
> px
  [1] 2.000000e-01 1.600000e-01 1.280000e-01 1.024000e-01
  [5] 8.192000e-02 6.553600e-02 5.242880e-02 4.194304e-02
  [9] 3.355443e-02 2.684355e-02 2.147484e-02 1.717987e-02
 [13] 1.374390e-02 1.099512e-02 8.796093e-03 7.036874e-03
 [17] 5.629500e-03 4.503600e-03 3.602880e-03 2.882304e-03
 [21] 2.305843e-03 1.844674e-03 1.475740e-03 1.180592e-03
 [25] 9.444733e-04 7.555786e-04 6.044629e-04 4.835703e-04
 [29] 3.868563e-04 3.094850e-04 2.475880e-04 1.980704e-04
 [33] 1.584563e-04 1.267651e-04 1.014120e-04 8.112964e-05
 [37] 6.490371e-05 5.192297e-05 4.153837e-05 3.323070e-05
 [41] 2.658456e-05 2.126765e-05 1.701412e-05 1.361129e-05
 [45] 1.088904e-05 8.711229e-06 6.968983e-06 5.575186e-06
 [49] 4.460149e-06 3.568119e-06 2.854495e-06 2.283596e-06
 [53] 1.826877e-06 1.461502e-06 1.169201e-06 9.353610e-07
 [57] 7.482888e-07 5.986311e-07 4.789049e-07 3.831239e-07
 [61] 3.064991e-07 2.451993e-07 1.961594e-07 1.569275e-07
 [65] 1.255420e-07 1.004336e-07 8.034690e-08 6.427752e-08
 [69] 5.142202e-08 4.113761e-08 3.291009e-08 2.632807e-08
 [73] 2.106246e-08 1.684997e-08 1.347997e-08 1.078398e-08
 [77] 8.627183e-09 6.901746e-09 5.521397e-09 4.417118e-09
 [81] 3.533694e-09 2.826955e-09 2.261564e-09 1.809251e-09
 [85] 1.447401e-09 1.157921e-09 9.263367e-10 7.410694e-10
 [89] 5.928555e-10 4.742844e-10 3.794275e-10 3.035420e-10
 [93] 2.428336e-10 1.942669e-10 1.554135e-10 1.243308e-10
 [97] 9.946465e-11 7.957172e-11 6.365737e-11 5.092590e-11
> npx <- trial*px
> npx
  [1] 2.000000e-01 3.200000e-01 3.840000e-01 4.096000e-01
  [5] 4.096000e-01 3.932160e-01 3.670016e-01 3.355443e-01
  [9] 3.019899e-01 2.684355e-01 2.362232e-01 2.061584e-01
 [13] 1.786706e-01 1.539316e-01 1.319414e-01 1.125900e-01
 [17] 9.570149e-02 8.106479e-02 6.845471e-02 5.764608e-02
 [21] 4.842270e-02 4.058284e-02 3.394201e-02 2.833420e-02
 [25] 2.361183e-02 1.964504e-02 1.632050e-02 1.353997e-02
 [29] 1.121883e-02 9.284550e-03 7.675228e-03 6.338253e-03
 [33] 5.229059e-03 4.310012e-03 3.549422e-03 2.920667e-03
 [37] 2.401437e-03 1.973073e-03 1.619997e-03 1.329228e-03
 [41] 1.089967e-03 8.932412e-04 7.316071e-04 5.988970e-04
 [45] 4.900066e-04 4.007165e-04 3.275422e-04 2.676089e-04
 [49] 2.185473e-04 1.784060e-04 1.455793e-04 1.187470e-04
 [53] 9.682448e-05 7.892109e-05 6.430607e-05 5.238022e-05
 [57] 4.265246e-05 3.472060e-05 2.825539e-05 2.298743e-05
 [61] 1.869645e-05 1.520236e-05 1.235804e-05 1.004336e-05
 [65] 8.160232e-06 6.628619e-06 5.383242e-06 4.370871e-06
 [69] 3.548119e-06 2.879633e-06 2.336616e-06 1.895621e-06
 [73] 1.537559e-06 1.246898e-06 1.010998e-06 8.195824e-07
 [77] 6.642931e-07 5.383362e-07 4.361904e-07 3.533694e-07
 [81] 2.862292e-07 2.318103e-07 1.877098e-07 1.519771e-07
 [85] 1.230291e-07 9.958120e-08 8.059129e-08 6.521410e-08
 [89] 5.276414e-08 4.268560e-08 3.452790e-08 2.792587e-08
 [93] 2.258353e-08 1.826109e-08 1.476428e-08 1.193576e-08
 [97] 9.648071e-09 7.798028e-09 6.302080e-09 5.092590e-09
> ## plex <- cumsum(trial*(q^(trial-1))*p)
> ## the line above is equivalent to the line below
> plex <- cumsum(npx)
> plex
  [1] 0.200000 0.520000 0.904000 1.313600 1.723200 2.116416 2.483418
  [8] 2.818962 3.120952 3.389387 3.625610 3.831769 4.010440 4.164371
 [15] 4.296313 4.408903 4.504604 4.585669 4.654124 4.711770 4.760192
 [22] 4.800775 4.834717 4.863051 4.886663 4.906308 4.922629 4.936169
 [29] 4.947388 4.956672 4.964347 4.970686 4.975915 4.980225 4.983774
 [36] 4.986695 4.989096 4.991069 4.992689 4.994018 4.995108 4.996002
 [43] 4.996733 4.997332 4.997822 4.998223 4.998550 4.998818 4.999037
 [50] 4.999215 4.999361 4.999479 4.999576 4.999655 4.999719 4.999772
 [57] 4.999814 4.999849 4.999877 4.999900 4.999919 4.999934 4.999947
 [64] 4.999957 4.999965 4.999971 4.999977 4.999981 4.999985 4.999988
 [71] 4.999990 4.999992 4.999993 4.999995 4.999996 4.999997 4.999997
 [78] 4.999998 4.999998 4.999998 4.999999 4.999999 4.999999 4.999999
 [85] 4.999999 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000
 [92] 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000
 [99] 5.000000 5.000000
> sumgeod <- data.frame(trial,px,npx,plex)
> sumgeod 
    trial           px          npx     plex
1       1 2.000000e-01 2.000000e-01 0.200000
2       2 1.600000e-01 3.200000e-01 0.520000
3       3 1.280000e-01 3.840000e-01 0.904000
4       4 1.024000e-01 4.096000e-01 1.313600
5       5 8.192000e-02 4.096000e-01 1.723200
6       6 6.553600e-02 3.932160e-01 2.116416
7       7 5.242880e-02 3.670016e-01 2.483418
8       8 4.194304e-02 3.355443e-01 2.818962
9       9 3.355443e-02 3.019899e-01 3.120952
10     10 2.684355e-02 2.684355e-01 3.389387
11     11 2.147484e-02 2.362232e-01 3.625610
12     12 1.717987e-02 2.061584e-01 3.831769
13     13 1.374390e-02 1.786706e-01 4.010440
14     14 1.099512e-02 1.539316e-01 4.164371
15     15 8.796093e-03 1.319414e-01 4.296313
16     16 7.036874e-03 1.125900e-01 4.408903
17     17 5.629500e-03 9.570149e-02 4.504604
18     18 4.503600e-03 8.106479e-02 4.585669
19     19 3.602880e-03 6.845471e-02 4.654124
20     20 2.882304e-03 5.764608e-02 4.711770
21     21 2.305843e-03 4.842270e-02 4.760192
22     22 1.844674e-03 4.058284e-02 4.800775
23     23 1.475740e-03 3.394201e-02 4.834717
24     24 1.180592e-03 2.833420e-02 4.863051
25     25 9.444733e-04 2.361183e-02 4.886663
26     26 7.555786e-04 1.964504e-02 4.906308
27     27 6.044629e-04 1.632050e-02 4.922629
28     28 4.835703e-04 1.353997e-02 4.936169
29     29 3.868563e-04 1.121883e-02 4.947388
30     30 3.094850e-04 9.284550e-03 4.956672
31     31 2.475880e-04 7.675228e-03 4.964347
32     32 1.980704e-04 6.338253e-03 4.970686
33     33 1.584563e-04 5.229059e-03 4.975915
34     34 1.267651e-04 4.310012e-03 4.980225
35     35 1.014120e-04 3.549422e-03 4.983774
36     36 8.112964e-05 2.920667e-03 4.986695
37     37 6.490371e-05 2.401437e-03 4.989096
38     38 5.192297e-05 1.973073e-03 4.991069
39     39 4.153837e-05 1.619997e-03 4.992689
40     40 3.323070e-05 1.329228e-03 4.994018
41     41 2.658456e-05 1.089967e-03 4.995108
42     42 2.126765e-05 8.932412e-04 4.996002
43     43 1.701412e-05 7.316071e-04 4.996733
44     44 1.361129e-05 5.988970e-04 4.997332
45     45 1.088904e-05 4.900066e-04 4.997822
46     46 8.711229e-06 4.007165e-04 4.998223
47     47 6.968983e-06 3.275422e-04 4.998550
48     48 5.575186e-06 2.676089e-04 4.998818
49     49 4.460149e-06 2.185473e-04 4.999037
50     50 3.568119e-06 1.784060e-04 4.999215
51     51 2.854495e-06 1.455793e-04 4.999361
52     52 2.283596e-06 1.187470e-04 4.999479
53     53 1.826877e-06 9.682448e-05 4.999576
54     54 1.461502e-06 7.892109e-05 4.999655
55     55 1.169201e-06 6.430607e-05 4.999719
56     56 9.353610e-07 5.238022e-05 4.999772
57     57 7.482888e-07 4.265246e-05 4.999814
58     58 5.986311e-07 3.472060e-05 4.999849
59     59 4.789049e-07 2.825539e-05 4.999877
60     60 3.831239e-07 2.298743e-05 4.999900
61     61 3.064991e-07 1.869645e-05 4.999919
62     62 2.451993e-07 1.520236e-05 4.999934
63     63 1.961594e-07 1.235804e-05 4.999947
64     64 1.569275e-07 1.004336e-05 4.999957
65     65 1.255420e-07 8.160232e-06 4.999965
66     66 1.004336e-07 6.628619e-06 4.999971
67     67 8.034690e-08 5.383242e-06 4.999977
68     68 6.427752e-08 4.370871e-06 4.999981
69     69 5.142202e-08 3.548119e-06 4.999985
70     70 4.113761e-08 2.879633e-06 4.999988
71     71 3.291009e-08 2.336616e-06 4.999990
72     72 2.632807e-08 1.895621e-06 4.999992
73     73 2.106246e-08 1.537559e-06 4.999993
74     74 1.684997e-08 1.246898e-06 4.999995
75     75 1.347997e-08 1.010998e-06 4.999996
76     76 1.078398e-08 8.195824e-07 4.999997
77     77 8.627183e-09 6.642931e-07 4.999997
78     78 6.901746e-09 5.383362e-07 4.999998
79     79 5.521397e-09 4.361904e-07 4.999998
80     80 4.417118e-09 3.533694e-07 4.999998
81     81 3.533694e-09 2.862292e-07 4.999999
82     82 2.826955e-09 2.318103e-07 4.999999
83     83 2.261564e-09 1.877098e-07 4.999999
84     84 1.809251e-09 1.519771e-07 4.999999
85     85 1.447401e-09 1.230291e-07 4.999999
86     86 1.157921e-09 9.958120e-08 5.000000 ########### 
87     87 9.263367e-10 8.059129e-08 5.000000
88     88 7.410694e-10 6.521410e-08 5.000000
89     89 5.928555e-10 5.276414e-08 5.000000
90     90 4.742844e-10 4.268560e-08 5.000000
91     91 3.794275e-10 3.452790e-08 5.000000
92     92 3.035420e-10 2.792587e-08 5.000000
93     93 2.428336e-10 2.258353e-08 5.000000
94     94 1.942669e-10 1.826109e-08 5.000000
95     95 1.554135e-10 1.476428e-08 5.000000
96     96 1.243308e-10 1.193576e-08 5.000000
97     97 9.946465e-11 9.648071e-09 5.000000
98     98 7.957172e-11 7.798028e-09 5.000000
99     99 6.365737e-11 6.302080e-09 5.000000
100   100 5.092590e-11 5.092590e-09 5.000000
> plot(npx, type="l")
> plot(plex, type="l")
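
The running sum `plex` levels off at 1/p = 5 because the part of the sum beyond trial n is $q^{n}(n + 1/p)$, which shrinks geometrically. The sketch below checks that identity (a standard consequence of the memoryless property, stated here as an aside rather than taken from the textbook) and adds a small simulation; the seed and sample size are arbitrary choices.

```r
p <- 0.2
q <- 1 - p
trial <- 1:100
plex <- cumsum(trial * q^(trial - 1) * p)  # partial sums of x * P(X = x)

# Partial sum in closed form: 1/p minus the remaining tail q^n * (n + 1/p)
closed <- 1 / p - q^trial * (trial + 1 / p)
all.equal(plex, closed)

# Monte Carlo check: rgeom returns failure counts, so add 1 for the trial number
set.seed(1)                      # arbitrary seed, for reproducibility only
x <- rgeom(1e5, prob = p) + 1
c(mean = mean(x), var = var(x))  # should land near 1/p = 5 and q/p^2 = 20
```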


Proof of mean and variance of geometric distribution

For proofs of the mean and variance formulas, $(4)$ and $(5)$, see Mean and Variance of Geometric Distribution.
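
For orientation, here is one standard route to both results, differentiating the geometric series term by term. It is a sketch for $0 < p \le 1$ (so that $0 \le q < 1$) and is not necessarily the argument used on the linked page.

\begin{align*} E\left[ X \right] & = \sum_{k=1}^{\infty} k \, q^{k-1} p = p \, \frac{d}{dq} \sum_{k=0}^{\infty} q^{k} = p \, \frac{d}{dq} \frac{1}{1-q} = \frac{p}{(1-q)^2} = \frac{1}{p} \\ E\left[ X(X-1) \right] & = \sum_{k=2}^{\infty} k(k-1) \, q^{k-1} p = p \, q \, \frac{d^2}{dq^2} \frac{1}{1-q} = \frac{2pq}{(1-q)^3} = \frac{2q}{p^2} \\ V\left[ X \right] & = E\left[ X(X-1) \right] + E\left[ X \right] - \left( E\left[ X \right] \right)^2 = \frac{2q}{p^2} + \frac{1}{p} - \frac{1}{p^2} = \frac{2q + p - 1}{p^2} = \frac{q}{p^2} \end{align*}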

Example

The probability that another snowboarder will make it down the slope without falling over is 0.4. Your job is to play like you’re the snowboarder and work out the following probabilities for your slope success.

  1. The probability that you will be successful on your second attempt, while failing on your first.
  2. The probability that you will be successful in 4 attempts or fewer.
  3. The probability that you will need more than 4 attempts to be successful.
  4. The number of attempts you expect you’ll need to make before being successful.
  5. The variance of the number of attempts.
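
Solution. With $p = 0.4$ and $q = 1 - p = 0.6$:

  1. $P(X = 2) = p \cdot q^{2-1} = 0.4 \times 0.6 = 0.24$
  2. $P(X \le 4) = 1 - q^{4} = 1 - 0.1296 = 0.8704$
  3. $P(X > 4) = q^{4} = 0.1296$
  4. $E(X) = \displaystyle \frac{1}{p} = 2.5$
  5. $Var(X) = \displaystyle \frac{q}{p^{2}} = \frac{0.6}{0.16} = 3.75$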
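
The five quantities for p = 0.4 can be evaluated in R as a cross-check. This is a sketch; the element names in `ans` are illustrative, and the `dgeom`/`pgeom` arguments are failure counts (trial number minus 1).

```r
p <- 0.4
q <- 1 - p

ans <- c(
  second_attempt = dgeom(1, prob = p),                      # P(X = 2)
  within_four    = pgeom(3, prob = p),                      # P(X <= 4)
  more_than_four = pgeom(3, prob = p, lower.tail = FALSE),  # P(X > 4)
  expected       = 1 / p,                                   # E(X)
  variance       = q / p^2                                  # Var(X)
)
round(ans, 4)
```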