b:head_first_statistics:estimating_populations_and_samples
Current revision: 2025/10/08 12:20, [Sampling distribution of sample mean], hkimscil; previous revision: 2024/11/06 08:23, [What about variance], hkimscil
Line 11: | Line 11: | ||
<
$\hat\mu$ : See this hat I’m wearing? It means I’m a point estimator. If you don’t have the exact value of the mean, then I'm the next best thing.
Line 88: | Line 88: | ||
p = 32/40 = 0.8
<
Mighty Gumball takes another sample of their super-long-lasting gumballs, and finds that in the sample, 10 out of 40 people prefer the pink gumballs to all other colors. What proportion of people prefer pink gumballs in the population? What’s the probability of choosing someone from the population who doesn’t prefer pink gumballs?
</
Line 366: | Line 366: | ||
continuity correction: $$\pm \frac{1}{2n}$$
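Here is a worked example using samples of n = 100 gumballs, the size used in the simulation on this page. Let Y stand for the normal approximation of the sample proportion; the name Y is introduced here only for this example. The correction is then 0.005, and "at least 30% red" is evaluated from 0.295 upward:

\begin{eqnarray*}
\frac{1}{2n} & = & \frac{1}{2 \times 100} = 0.005 \\
P(Ps \geq 0.30) & \approx & P(Y > 0.30 - 0.005) = P(Y > 0.295), \quad Y \sim N\left(p, \frac{pq}{n}\right)
\end{eqnarray*}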

Continuing with the simulation in R:
<
> # variance?
> var.cal <- var(ps.k)
> var.value <- (p*q)/n
> var.cal
[1] 0.001869001
> var.value
[1] 0.001875
>
> # standard deviation
> sd.cal <- sqrt(var.cal)
> sd.value <- sqrt(var.value)
> sd.cal
[1] 0.04323195
> sd.value
[1] 0.04330127
> se <- sd.value
> # The standard deviation of the sample
> # proportions is called the standard
> # error
>
</
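`ps.k`, `p`, `q` and `n` are created earlier on this page and are not shown in this section. The R sketch below is a self-contained stand-in that works under stated assumptions. It regenerates `ps.k` with `rbinom()` from k = 10000 samples of n = 100 gumballs with p = 1/4. The values of k and the seed, and the use of `rbinom()`, are choices made here; the page may have built `ps.k` differently. The sketch then compares the simulated variance with $pq/n$. The simulated numbers will differ slightly from the output above, while the theoretical ones (0.001875 and 0.04330127) will match.

```r
set.seed(2024)              # arbitrary seed, for reproducibility only
p <- 1/4                    # proportion of red gumballs in the population
q <- 1 - p
n <- 100                    # gumballs per sample
k <- 10000                  # number of repeated samples (assumed)

# proportion of red gumballs in each of k samples of size n
ps.k <- rbinom(k, size = n, prob = p) / n

var.cal   <- var(ps.k)      # variance of the simulated sample proportions
var.value <- p * q / n      # theoretical variance pq/n
sd.cal    <- sqrt(var.cal)  # simulated standard deviation
se        <- sqrt(var.value)  # standard error = sqrt(pq/n)

c(var.cal = var.cal, var.value = var.value)
c(sd.cal = sd.cal, se = se)
```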
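Because se is a standard deviation, it has the usual properties of one (the 68, 95, 99.7% rule). Suppose we know that the proportion of red gumballs is 1/4 and we draw one sample of n = 100 gumballs. Then the probability that the sample proportion of red gumballs lies within 2*se above or below p (0.25) is about 95%. Computing this from the values above:
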
<
# In theory, the mean of the histogram above is
p
# and its standard deviation is
se

# About 95% of the values lie within mean +- 2*se.
se2 <- se * 2
# i.e., the following interval:
lower <- p-se2
upper <- p+se2
lower
upper

hist(ps.k)
abline(v=lower,
abline(v=upper,

</
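Here is a self-contained R sketch of the same interval that leaves out the plot. It assumes p = 1/4 and n = 100 and computes p ± 2·se. It then checks the interval's coverage in two ways. The first uses the normal model. The second is exact under X ~ B(100, 1/4): it counts the whole numbers of red gumballs that fall inside the interval. The variable names are chosen here for illustration.

```r
p  <- 1/4                      # proportion of red gumballs
n  <- 100                      # gumballs per sample
se <- sqrt(p * (1 - p) / n)    # standard error of the sample proportion

lower <- p - 2 * se            # 0.1633975
upper <- p + 2 * se            # 0.3366025

# coverage if the sample proportion followed N(p, se^2) exactly
coverage.normal <- pnorm(upper, mean = p, sd = se) - pnorm(lower, mean = p, sd = se)

# exact coverage: only whole counts between 100*lower and 100*upper qualify
x.min <- ceiling(lower * n)    # 17 red gumballs
x.max <- floor(upper * n)      # 33 red gumballs
coverage.exact <- pbinom(x.max, size = n, prob = p) -
  pbinom(x.min - 1, size = n, prob = p)

coverage.normal   # about 0.9545
coverage.exact
```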
That is, in the graph below
{{:
the proportion of red gumballs falls between lower = 0.1633975 (16.33975%) and upper = 0.3366025 (33.66025%) with a probability of about 95%.

Now suppose the question is: what is the probability that 30% or more of the gumballs are red?
From X ~ B(100, 1/4) we derive the normal approximation
X ~ N(p, se^2), and from it P(X>
1-pnorm(0.295,
1-pnorm(0.295,
[1] 0.1493488
===== Exercise =====
<
25% of the gumball population are red. What’s the probability that in a box of 100 gumballs, at least 40% will be red? We’ll guide you through the steps.
Line 434: | Line 489: | ||
====== Sampling distribution of sample mean ======
<
According to Mighty Gumball’s statistics for the population, the mean number of gumballs in each packet is 10, and the variance is 1. The trouble is they’ve had a complaint. One of their most faithful customers bought 30 packets of gumballs, and he found that the average number of gumballs per packet in his sample is only 8.5.
</
Line 530: | Line 585: | ||
===== Exercise =====
<
Let’s apply this to Mighty Gumball’s problem.
Line 567: | Line 622: | ||
</ | </ | ||
====== Recap ======
Distribution of **Sample** <fc #
when samples of n entities are drawn (repeatedly) from a population whose proportion is p.
\begin{eqnarray*}
Ps & \sim & N(p, \frac{pq}{n}) \\
\text{hence,} \\
\text{standard deviation of} \\
\text{sample proportions} & = & \sqrt{\frac{pq}{n}}
\end{eqnarray*}
Distribution of **Sample** <fc #
when samples of size n are drawn (repeatedly) from a population whose mean is $\mu$ and whose variance is $\sigma^2$.
\begin{eqnarray*}
\overline{X} & \sim & N(\mu, \frac{\sigma^2}{n}) \\
\text{hence,} \\
\text{standard deviation of} \\
\text{sample means} & = & \sqrt{\frac{\sigma^2}{n}} \\
& = & \frac{\sigma}{\sqrt{n}}
\end{eqnarray*}
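Here is a simulation sketch of the last result. It rests on assumptions chosen here: a normal population with μ = 10 and σ = 1 (the Mighty Gumball figures), k = 10000 repeated samples of n = 30, and an arbitrary seed. The standard deviation of the simulated sample means should come out close to σ/√n ≈ 0.1826.

```r
set.seed(7)           # arbitrary seed, for reproducibility only
mu    <- 10           # population mean (gumballs per packet)
sigma <- 1            # population standard deviation (variance 1)
n     <- 30           # packets per sample
k     <- 10000        # number of repeated samples (assumed)

# k samples of size n, one sample per column; population assumed normal
samples <- matrix(rnorm(n * k, mean = mu, sd = sigma), nrow = n)
xbar    <- colMeans(samples)   # the k sample means

mean(xbar)            # close to mu
sd(xbar)              # close to sigma / sqrt(n)
sigma / sqrt(n)       # 0.1825742
```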
b/head_first_statistics/estimating_populations_and_samples.1730849015.txt.gz · Last modified: by hkimscil