Differences

This shows you the differences between two versions of the page.

--- b:head_first_statistics:constructing_confidence_intervals [2019/12/04 11:58] – [Four steps for finding confidence intervals] hkimscil
+++ b:head_first_statistics:constructing_confidence_intervals [2019/12/09 08:42] – [The problem with precision] hkimscil
@@ Line 10: / Line 10: @@
 Rather than specify an exact value, we can specify two values we expect flavor duration to lie between.
-{{:b:head_first_statistics:pasted:20191203-121916.png}}
+[{{:b:head_first_statistics:pasted:20191203-121916.png  }}] : As an example, you may want to choose a and b so that there’s a 95% chance of the interval containing the population mean. Finding the exact spot of a and b is the problem we are trying to solve.
-The far side of each end, (a, b) is called a confidence interval.
-{{:b:head_first_statistics:pasted:20191203-122050.png}}
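+The interval between the two endpoints, (a, b), is called a **//confidence interval//**.
+That is, we use the sample mean as the point estimate and, centered on that point, find an interval that has a 95% chance of containing the population mean; this interval serves as our estimate of the population mean. In Korean, this interval is called the **//신뢰구간//** (confidence interval).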
 ===== Four steps for finding confidence intervals =====
-Step 1: Choose your population statistic
+{{:b:head_first_statistics:pasted:20191203-122050.png}}
+<fs large>**Step 1:**</fs> Choose your population statistic
+If we go back to the work we did in the last chapter, then the sampling distribution of means has the following expectation and variance:
 {{:b:head_first_statistics:pasted:20191203-122301.png}}
-Step 2: Find its sampling distribution
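+<fs large>**Step 2:**</fs> Find its __**sampling distribution**__
+The variance of the sample means, $Var(\overline{X})$, depends on a characteristic of the population (a parameter) that we cannot know, so we use the sample variance $s^{2}$ in its place, as shown below, to construct the distribution of the sample means.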
 {{:b:head_first_statistics:pasted:20191203-122550.png}}
-Mighty Gumball used a sample of 100 gumballs to come up with their
-estimates, and they have calculated that s2 = 25. This means that
+Mighty Gumball used a sample of 100 gumballs to measure flavor duration, and from this sample obtained
+  * a mean of 62.7 and
+  * a variance (s<sup>2</sup>) of 25.
+Using these values to estimate the variance of the distribution of sample means (with n=100) gives 25/100 = 0.25.
 {{:b:head_first_statistics:pasted:20191203-122843.png}}
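The arithmetic above can be checked with a short Python sketch. The variable names are illustrative and do not come from the source; the figures (n = 100, s<sup>2</sup> = 25) are the ones quoted for Mighty Gumball.

```python
import math

# Figures quoted in the text for the Mighty Gumball sample.
n = 100               # sample size
sample_variance = 25  # s^2, used in place of the unknown population variance

# Estimated variance of the sampling distribution of the mean: s^2 / n.
var_of_mean = sample_variance / n

# Its square root is the standard error of the mean.
se_of_mean = math.sqrt(var_of_mean)

print(var_of_mean)  # 0.25
print(se_of_mean)   # 0.5
```

The standard error (0.5) is the quantity needed later when converting z-scores back to flavor-duration units.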
+Generalizing the above: if $X \sim N(\mu, \sigma^{2})$ and the sample size is sufficiently large (such as n=100), then $E(\overline{X})$ and $Var(\overline{X})$ are as follows.
 {{:b:head_first_statistics:pasted:20191203-122946.png}}
-Step 3: Decide on the level of confidence
+<fs large>**Step 3:**</fs> Decide on the level of confidence
-Step 4: Find the confidence limits
+We set the confidence level for the interval between points a and b to 0.95 (the usual convention).
-{{:b:head_first_statistics:pasted:20191203-123220.png}}
+<fs large>**Step 4:**</fs> Find the confidence limits
+Assuming $\overline{X} \sim N(\mu, 0.25)$ as obtained above, if the interval between a and b below covers 95% of the distribution, then each tail holds 0.025.
+{{:b:head_first_statistics:pasted:20191203-123220.png}}
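+That is, we need to find the a for which $P(\overline{X} < a) = 0.025$ and the b for which $P(\overline{X} > b) = 0.025$, and use them as the confidence limits. However, we cannot directly locate the 2.5% regions in a distribution like the one in the figure above (assuming no program such as R is available), so we work in terms of standard scores and look up the z-score that corresponds to 2.5% in a z-table.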
 {{:b:head_first_statistics:pasted:20191203-123406.png}}
 {{:b:head_first_statistics:pasted:20191203-123432.png}}
 $$P(z_{a} < Z < z_{b}) = 0.95$$
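When software is available, the table lookup can be replaced by an inverse-CDF call. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative, and the last step (converting the z-scores back to limits around the sample mean of 62.7 with variance 0.25) is an assumption about where the calculation is heading rather than something shown above.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# z-scores that leave 2.5% in each tail, so that P(z_a < Z < z_b) = 0.95.
z_a = standard_normal.inv_cdf(0.025)
z_b = standard_normal.inv_cdf(0.975)

# Assumed final step: rescale by the standard error and shift by the
# sample mean quoted in the text (62.7, with Var(X-bar) estimated as 0.25).
sample_mean = 62.7
se_of_mean = 0.25 ** 0.5
lower = sample_mean + z_a * se_of_mean
upper = sample_mean + z_b * se_of_mean

print(round(z_a, 2), round(z_b, 2))      # -1.96 1.96
print(round(lower, 2), round(upper, 2))  # 61.72 63.68
```

The two z-scores are symmetric about zero because the standard normal distribution is symmetric, which is why a z-table only needs to be consulted once.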