b:head_first_statistics:estimating_populations_and_samples

Differences

This shows you the differences between two versions of the page.

b:head_first_statistics:estimating_populations_and_samples [2019/11/28 08:59] – [Exercise] hkimscilb:head_first_statistics:estimating_populations_and_samples [2022/11/17 12:47] (current) – [Exercise] hkimscil
Line 2: Line 2:
 {{tablelayout?colwidth="350px"&rowsHeaderSource=1&rowsVisible=2&float=right}} {{tablelayout?colwidth="350px"&rowsHeaderSource=1&rowsVisible=2&float=right}}
 |{{:b:head_first_statistics:pasted:20191125-101010.png}}| |{{:b:head_first_statistics:pasted:20191125-101010.png}}|
-So how can we use the results of the sample taste test to tell us the mean +So how can we use the results of the sample taste test to tell us the mean amount of time gumball flavor lasts for in the general gumball population?
-amount of time gumball flavor lasts for in the general gumball population?+
  
 The answer is actually pretty intuitive. We assume that the mean flavor duration of the gumballs in the sample matches that of the population. In other words, we find the mean of the sample and use it as the mean for the population too. The answer is actually pretty intuitive. We assume that the mean flavor duration of the gumballs in the sample matches that of the population. In other words, we find the mean of the sample and use it as the mean for the population too.
  
-Heres a sketch showing the distribution of the sample, and what you’d expect the distribution of the population to look like based on the sample. You’d expect the distribution of the population to be a similar shape to that of the sample, so you can assume that the mean of the sample and population have about the same value.+Here's a sketch showing the distribution of the sample, and what you’d expect the distribution of the population to look like based on the sample. You’d expect the distribution of the population to be a similar shape to that of the sample, so you can assume that the mean of the sample and population have about the same value.
  
-$$\mu \;\;\;\; \hat\mu$$+$$\mu \quad \quad \hat\mu$$
  
  
Line 18: Line 17:
 </WRAP> </WRAP>
  
-\begin{eqnarray*} +\begin{align*} 
-\overline{X} & = \frac {\sum{X}}{n} \\ +\overline{X} & = \frac {\sum{X}}{n} \\ 
-& = \frac {\sum_{i=1}^{n} X_{i}}{n} \\ +& = \frac{ \sum_{i=1}^{n} X_{i} } {n} \\ 
-& = \hat{\mu} +& = \hat{\mu} 
-\end{eqnarray*}+\end{align*}
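A minimal R sketch of the point estimate above. The ten flavor durations in `x` are assumed values for illustration; they do not appear on this page.

```r
# Assumed sample: flavor durations (minutes) for 10 gumballs.
# These values are illustrative, not taken from the page.
x <- c(61.9, 62.6, 63.3, 64.8, 65.1, 66.4, 67.1, 67.2, 68.7, 69.9)

n <- length(x)        # sample size
x.bar <- sum(x) / n   # \overline{X} = sum(X) / n
mu.hat <- x.bar       # the sample mean is used as the estimate of mu
mu.hat
```

`mean(x)` returns the same value; the explicit sum is written out only to mirror the formula.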
  
 ====== Estimating population variance ====== ====== Estimating population variance ======
Line 45: Line 44:
 {{:b:head_first_statistics:pasted:20191125-103603.png}} {{:b:head_first_statistics:pasted:20191125-103603.png}}
 {{:b:head_first_statistics:pasted:20191125-103510.png}} {{:b:head_first_statistics:pasted:20191125-103510.png}}
 +
 +[[:Why N-1]]
  
 <code> <code>
Line 74: Line 75:
 | $\sum{ds^2}$ |  |  | 62.32  | | $\sum{ds^2}$ |  |  | 62.32  |
 | $n-1$ |  |  | 9  | | $n-1$ |  |  | 9  |
 +| $Var(x)$      |  |  | 6.924444   |
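The last three rows of the table can be reproduced with a short R sketch. The sample values in `x` are an assumption: they are not shown in this comparison view and were chosen because their squared deviations sum to the 62.32 in the table.

```r
# Assumed sample (not shown on this page); chosen so that the sum of
# squared deviations matches the 62.32 in the table.
x <- c(61.9, 62.6, 63.3, 64.8, 65.1, 66.4, 67.1, 67.2, 68.7, 69.9)

n <- length(x)
ds <- x - mean(x)          # deviation of each value from the sample mean
ss <- sum(ds^2)            # sum of squared deviations
var.hat <- ss / (n - 1)    # divide by n - 1 to estimate the population variance
c(ss = ss, df = n - 1, var = var.hat)
```

R's built-in `var(x)` also divides by `n - 1`, so it should agree with `var.hat`.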
  
 ====== Estimating proportion ====== ====== Estimating proportion ======
Line 129: Line 131:
 \end{eqnarray*} \end{eqnarray*}
    
-Here, the expected value of the proportion in each trial ($\hat{P}$) is: +Here, with $n = 100$, the expected value of the proportion in each trial ($\hat{P}$) is: 
-\begin{eqnarray} + 
-\hat{P_{1}} & = {X_{1}}/{100} = 0.\\ +\begin{align*} 
-\hat{P_{2}} & = {X_{2}}/{100} = 0.\\ +n = 100, \\ 
-\hat{P_{3}} & = {X_{3}}/{100} = 0.\\ +\hat{P_{1}} & = \frac{X_{1}}{n} = 0.34, (X_{1} = 34) \\ 
-\hat{P_{4}} & = {X_{4}}/{100} = 0.4 \\ +\hat{P_{2}} & = \frac{X_{2}}{n} = 0.43, (X_{2} = 43) \\ 
-\cdots \cdots \cdots              \\ +\hat{P_{3}} & = \frac{X_{3}}{n} = 0.32, (X_{3} = 32) \\ 
-\hat{P_{k}} & = {X_{k}}/{100} = 0. +\hat{P_{4}} & = \frac{X_{4}}{n} = 0.42, (X_{4} = 42) \\ 
-\end{eqnarray}+\cdots \cdots \cdots \\ 
 +\hat{P_{k}} & = \frac{X_{k}}{n} = 0.24, (X_{k} = 24) \\  
 +\end{align*}
  
-That is, when $X \sim B(n, p)$, the sample proportion follows $P_{s} = \displaytype \frac{X}{n}$ (X = number of red gumballs drawn, n = sample size). +That is, when $X \sim B(n, p)$, the sample proportion follows $P_{s} = \dfrac{X}{n}$ ($X$ = number of red gumballs drawn, $n$ = sample size).
 {{:b:head_first_statistics:pasted:20191126-073028.png}} {{:b:head_first_statistics:pasted:20191126-073028.png}}
  
Line 224: Line 228:
 q <- 1-p q <- 1-p
 n <- 100 n <- 100
-var <- (p*q)/(n-1) +var <- (p*q)/(n) 
-se  <- sqrt((p*q)/(n-1)) +se  <- sqrt((p*q)/(n)) 
-pnorm(.395, p, se, lower.tail = F) +o <- .4 
 +o.c <- .4 - (1/(2*n)) 
 +o.c  
 +pnorm(o.c, p, se, lower.tail = F)
 </code> </code>
  
 <code> <code>
 +
 > p <- 0.25 > p <- 0.25
 > q <- 1-p > q <- 1-p
 > n <- 100 > n <- 100
-> var <- (p*q)/(n-1) +> var <- (p*q)/(n) 
-> se  <- sqrt((p*q)/(n-1)) +> se  <- sqrt((p*q)/(n)) 
-> pnorm(.395, p, se, lower.tail = F) +> o <- .4 
-[1] 0.0004313594+> o.c <- .4 - (1/(2*n)) 
 +> o.c  
 +[1] 0.395 
 +> pnorm(o.c, p, se, lower.tail = F) 
 +[1] 0.0004060586
 </code> </code>
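As a rough cross-check of the continuity-corrected normal approximation used in the current revision, the same tail probability can be computed from the exact binomial distribution. This sketch assumes the question is $P(X \geq 40)$ for $X \sim B(100, 0.25)$; the names `p.norm` and `p.exact` are introduced here for illustration only.

```r
p <- 0.25
n <- 100
se <- sqrt(p * (1 - p) / n)     # standard error of the sample proportion

# Normal approximation with continuity correction, as in the revised code
o.c <- 0.4 - 1 / (2 * n)
p.norm <- pnorm(o.c, mean = p, sd = se, lower.tail = FALSE)

# Exact binomial tail: P(X >= 40) is the same as P(X > 39)
p.exact <- pbinom(39, size = n, prob = p, lower.tail = FALSE)

c(normal = p.norm, exact = p.exact)
```

The two values are not expected to match exactly, since the normal curve only approximates the binomial tail.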
  
 </WRAP> </WRAP>
  
-====== How many gumballs? -- Probability of sample means ======+====== Sampling distribution of sample mean ======
  
 <WRAP info 60%> <WRAP info 60%>
Line 258: Line 270:
 \overline{X} = \frac{X_{1} + X_{2} + . . . + X_{n}}{n}  \overline{X} = \frac{X_{1} + X_{2} + . . . + X_{n}}{n} 
 \end{eqnarray*} \end{eqnarray*}
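 +The expression above is the mean of one sample made up of 30 packets of gumballs, 
 +while the one below is the mean of those sample means when we keep collecting them. 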
 \begin{eqnarray*} \begin{eqnarray*}
 E(\overline{X}) & = & E\left(\frac{X_{1} + X_{2} + . . . + X_{n}}{n}\right)  \\ E(\overline{X}) & = & E\left(\frac{X_{1} + X_{2} + . . . + X_{n}}{n}\right)  \\
Line 275: Line 288:
 \end{eqnarray*} \end{eqnarray*}
  
-\begin{eqnarray*} +\begin{align*} 
-Var(\overline{X}) & = Var \left(\frac{X_{1} + X_{2} + . . . + X_{n}}{n}\right) \\ +Var(\overline{X}) & = Var \left(\frac{X_{1} + X_{2} + . . . + X_{n}}{n}\right) \\ 
-& = \frac{1}{n^2} Var \left({X_{1} + X_{2} + . . . + X_{n}\right) \\ +& = \frac {1}{n^2} Var \left(X_{1} + X_{2} + . . . + X_{n} \right) \\ 
-& = \frac{1}{n^2} (\sigma^2 + \sigma^2 + . . . + \sigma^2) \\ +& = \frac{1}{n^2} (\sigma^2 + \sigma^2 + . . . + \sigma^2) \\ 
-& = \frac{1}{n^2} n * (\sigma^2) \\ +& = \frac{1}{n^2} n * (\sigma^2) \\ 
-& = \frac{\sigma^2}{n}  +& = \frac{\sigma^2}{n}  
-\end{eqnarray*}+ 
 + 
 +\end{align*}
  
  
Line 290: Line 305:
 \end{eqnarray} \end{eqnarray}
  
-$$\text{standard error of the sample means} = \frac{\sigma}{\sqrt{n}}$$+\begin{eqnarray*} 
 +\text{standard error} & = & \text{standard deviation of sample means} \\ 
 +& = & \frac{\sigma}{\sqrt{n}} \\ 
 +& = & \sqrt{\frac{\sigma^{2}}{n}}   
 +\end{eqnarray*}
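The results $Var(\overline{X}) = \sigma^2 / n$ and standard error $= \sigma / \sqrt{n}$ can be checked with a small simulation. This is only a sketch: the normal population with `mu = 10`, `sigma = 1`, the sample size `n = 30`, and the number of repetitions are all assumptions made for illustration.

```r
set.seed(1)       # fixed seed so the run is repeatable

mu <- 10          # assumed population mean
sigma <- 1        # assumed population standard deviation
n <- 30           # size of each sample
reps <- 20000     # number of samples drawn

# Draw many samples and keep the mean of each one
x.bars <- replicate(reps, mean(rnorm(n, mean = mu, sd = sigma)))

var.sim <- var(x.bars)          # variance of the simulated sample means
var.theory <- sigma^2 / n       # Var(X-bar) = sigma^2 / n
se.theory <- sigma / sqrt(n)    # standard error = sigma / sqrt(n)

c(var.sim = var.sim, var.theory = var.theory,
  sd.sim = sd(x.bars), se.theory = se.theory)
```

With this many repetitions the simulated variance should land close to `var.theory`, though it will not match it exactly.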
  
 {{:b:head_first_statistics:pasted:20191126-093924.png}} {{:b:head_first_statistics:pasted:20191126-093924.png}}
Line 307: Line 326:
  
 ===== Using CLT for the binomial distribution ===== ===== Using CLT for the binomial distribution =====
-For $X \sim B(n, p)$ with $n$ greater than 30, $\mu = np$ and $\sigma^2 = npq$, so substituting these into $\overline{X} \sim N(\mu, \frac{\sigma^2}{n})$ gives: +For $X \sim B(n, p)$, $\mu = np$ and $\sigma^2 = npq$, and 
 +since the binomial distribution is said to approximate a normal distribution when $n$ exceeds 30, 
 +substituting these into $\overline{X} \sim N(\mu, \frac{\sigma^2}{n})$ gives: 
 $$\overline{X} \sim N(np, \; pq) $$ $$\overline{X} \sim N(np, \; pq) $$
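Spelling out the variance term in that substitution, using only the $\mu = np$ and $\sigma^2 = npq$ stated above:

\begin{align*}
\frac{\sigma^2}{n} & = \frac{npq}{n} \\
& = pq
\end{align*}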
  
Line 343: Line 364:
 [1] 1.053435e-16 [1] 1.053435e-16
  
 +</code>
 +discrepancy?
 +<code>
 +> a <- sqrt(1/30)
 +> b <- 8.5-10
 +> b/a
 +[1] -8.215838
 +> pnorm(b/a)
 +[1] 1.053435e-16
 +
 </code> </code>
  
b/head_first_statistics/estimating_populations_and_samples.1574899188.txt.gz · Last modified: 2019/11/28 08:59 by hkimscil
