Differences

This shows you the differences between two versions of the page.

--- types_of_error [2017/03/31 12:02] – [E.G.] hkimscil
+++ types_of_error [2023/11/21 20:49] (current) – [Types of error] hkimscil
@@ Line 2: / Line 2: @@
 {{keywords>types of error, 오류의 종류, 1종 오류, 2종 오류, type I error, type II error}}
 ====== Types of error ======
-<imgcaption fig1|Types of error>{{ :types_of_error.png?500}}</imgcaption>
+<imgcaption fig1|Types of error>{{ :pasted:20200501-173355.png?600}}</imgcaption>
 Summary
-  * black line (bl): $\overline{x}=0, \texta{sd}=1$ normal distribution curve = null hypothesis
+  * black line (bl): $\overline{x}=0, \text{sd}=1$ normal distribution curve = null hypothesis
-  * red line (rl):  $\overline{x}=3, \texta{sd}=1$ normal distribution curve = alternative hypothesis
+  * red line (rl):  $\overline{x}=3, \text{sd}=1$ normal distribution curve = alternative hypothesis
   * green line: the cutoff beyond which a hypothesis test rejects the null hypothesis (at 2 sd).
   * yellow area: type I error
@@ Line 11: / Line 11: @@
 Explanation
-  * H1: $\display\mu_{\text{black}} \neq \mu_{\text{red}} \;\;\; (0 \neq 3) $
+  * H1: $\mu_{\text{black}} \neq \mu_{\text{red}} \;\;\; (0 \neq 3) $
-  * H0: $\display\mu_{\text{black}} = \mu_{\text{red}} \;\;\; (0 = 3) $
+  * H0: $\mu_{\text{black}} = \mu_{\text{red}} \;\;\; (0 = 3) $
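-  * H1: The new drug's effect will last 3 hours and therefore differ from the existing drug's.
+  * H1: The new drug's effect will differ from the existing drug's.
   * H0: The new drug will have no effect.
-If we assume the phenomenon (the drug's effect) is real, the red line is reality. The researcher, however, can only assume the red line and cannot know it, and so must judge using the black line (that is, the null hypothesis). The criterion for that judgment is the green line, which marks the .05 level using two SE units.
+If we assume the phenomenon (the drug's effect) is real, the red line is reality. The researcher, however, can only assume the red line and cannot actually know it, and so must judge using the black line (that is, the null hypothesis). The criterion for that judgment is the green line, which marks the .05 level using two SE units.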
 <WRAP classes #type_i_error width :language>**__Type I Error__**</WRAP>
@@ Line 33: / Line 33: @@
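 Alpha can be controlled because the researcher chooses it. Beta cannot be set in the same way. The usual way to reduce beta, that is, to reduce the cases in which the null hypothesis is false and should be rejected but is not, is to increase the sample size n. To elaborate, each curve in the graph <imgref fig1> above is a sampling distribution, so the standard deviation of each curve is the standard error. Reducing the standard error reduces the overlap between the two curves, which in turn means a smaller beta.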
+<imgcaption fig1|When standard error = 1>{{:pasted:20200501-173355.png?300}}</imgcaption>
+<imgcaption fig2|When standard error = 0.5. Note that the gray area has almost no chance of occurring.>{{:pasted:20200501-184558.png?300}}</imgcaption>
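The effect of the standard error on beta can be checked numerically. The following is a minimal sketch in R, assuming the null curve is centered at 0 and the alternative at 3 (as in the figure) and a two-sided cutoff at alpha = .05. The function name `beta_for_se` is hypothetical and does not come from the original page.

```r
# Type II error rate (beta) for a two-sided z test.
# Assumptions (not from the original page): null mean 0,
# alternative mean 3, alpha = .05, known standard error.
beta_for_se <- function(se, mu0 = 0, mu1 = 3, alpha = 0.05) {
  upper <- mu0 + qnorm(1 - alpha / 2) * se   # upper rejection cutoff
  lower <- mu0 - qnorm(1 - alpha / 2) * se   # lower rejection cutoff
  # Probability that the statistic falls between the cutoffs
  # when the alternative curve is the true one.
  pnorm(upper, mean = mu1, sd = se) - pnorm(lower, mean = mu1, sd = se)
}

beta_se1  <- beta_for_se(1)
beta_se05 <- beta_for_se(0.5)
print(c(se_1 = beta_se1, se_0.5 = beta_se05))
```

Under these assumptions beta is roughly 0.15 when the standard error is 1 and well below 0.001 when it is 0.5, which matches the near disappearance of the gray area.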
 ====== E.G. ======
@@ Line 38: / Line 42: @@
 This becomes clearer in the example below.
-<code>rnorm2 <- function(n,mean,sd) { mean+sd*scale(rnorm(n)) }
+<code>
+rnorm2 <- function(n,mean,sd) { mean+sd*scale(rnorm(n)) }
 potato_sample <- rnorm2(25, 194,20)
 mean(potato_sample)
@@ Line 55: / Line 60: @@
 mean of x
 >
-abs(qt(0.05/2, 24))
+</code>
+To understand the qt function used below, see the [[https://www.quora.com/In-R-what-is-the-difference-between-dt-pt-and-qt-in-reference-to-the-student-t-distribution|t-distribution function documentation]].
+<code>
+> abs(qt(0.05/2, 24))
 [1] 2.063899
+</code>
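+That is, the null hypothesis can be rejected only if the t-score reaches ±2.063899 or beyond; the current t-score is -1.5, so the null hypothesis cannot be rejected.
+By the formula for the standard error, sqrt(25) = 5, so se = 20/5 = 4. If n (the sample size) were 2500, se would be 0.4 (see below).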
+<code>
+> 20/sqrt(length(potato_sample))
+[1] 4
 </code>
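The t-score of -1.5 can be reproduced by hand. This is a sketch only: it assumes the null mean is 200 (the mu stated later on the page), with the sample mean 194, sd 20, and n = 25 taken from the code above. The variable names are illustrative.

```r
# One-sample t statistic computed by hand.
# Assumptions: null mean mu0 = 200, sample mean 194, sample sd 20, n = 25.
mu0  <- 200
xbar <- 194
s    <- 20
n    <- 25

se     <- s / sqrt(n)                      # standard error of the mean
t_calc <- (xbar - mu0) / se                # calculated t value
t_crit <- abs(qt(0.05 / 2, df = n - 1))    # two-sided critical value
reject <- abs(t_calc) >= t_crit            # TRUE if the null is rejected

print(c(se = se, t = t_calc, critical = t_crit))
print(reject)
```

Changing `n` to 2500 in this sketch shrinks `se` to 0.4 while `xbar` and `s` stay the same, so the calculated t value grows in magnitude.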
@@ Line 63: / Line 78: @@
 potato_sample_large <- rnorm2(2500, 194,20)
 mean(potato_sample_large)
-[1] 191
+[1] 194
 > sqrt(var(potato_sample_large))
 [1]   20
@@ Line 78: / Line 93: @@
 mean of x
+</code>
-> </code>
+<code>
+> abs(qt(0.05/2, 2499))
+[1] 1.960914
+</code>
+In the case above, the critical t value is ±1.960914 (approx. 2); the calculated t value is -15, so the null hypothesis can be rejected.
+<code>
+> # standard error value
+> 20/sqrt(length(potato_sample_large))
+[1] 0.4
+>
+</code>
 <WRAP help> Compare the two se values above, and explain the difference in relation to type I and type II error. </WRAP>
+With mu = 200 and sigma = 20, and taking the number after a___ to be the sample size:
-<code>> a <- rnorm(25000000, 200, 4)
+<code>> a25 <- rnorm(50000, 200, 4) # 4 = 20/sqrt(25) = standard error
-> b <- rnorm(25000000, 200, .4)
+> a100 <- rnorm(50000, 200, 2)
-> pa <- hist(a)
+> a400 <- rnorm(50000, 200, 1)
-> pb <- hist(b)
+> a900 <- rnorm(50000, 200, .667)
-> plot(pa, col=rgb(0,0,1,1/4), xlim=c(185,215), ylim=c(0,400))
+> a1600 <- rnorm(50000, 200, .5)
-> plot(pb, col=rgb(1,0,1,1/4), xlim=c(185,215), ylim=c(0,400), add=T)
+> a2500 <- rnorm(50000, 200, .4)
+> a3600 <- rnorm(50000, 200, .333)
+> a4900 <- rnorm(50000, 200, .286)
+> a6400 <- rnorm(50000, 200, .25)
+> a8100 <- rnorm(50000, 200, .222)
+> pa25 <- hist(a25)
+> pa100 <- hist(a100)
+> pa400 <- hist(a400)
+> pa900 <- hist(a900)
+> pa1600 <- hist(a1600)
+> pa2500 <- hist(a2500)
+> pa3600 <- hist(a3600)
+> pa4900 <- hist(a4900)
+> pa6400 <- hist(a6400)
+> pa8100 <- hist(a8100)
+> plot(pa25, col=rgb(.1,.1,.1,.1), xlim=c(185,215), ylim=c(0,15000))
+> plot(pa100, col=rgb(.2,.2,.2,.2), xlim=c(185,215), ylim=c(0,15000), add=T)
+> plot(pa400, col=rgb(.3,.3,.3,.3), xlim=c(185,215), ylim=c(0,15000), add=T)
+> plot(pa900, col=rgb(.4,.4,.4,.4), xlim=c(185,215), ylim=c(0,15000), add=T)
+> plot(pa1600, col=rgb(.5,.5,.5,.5), xlim=c(185,215), ylim=c(0,15000), add=T)
+> plot(pa2500, col=rgb(.6,.6,.6,.6), xlim=c(185,215), ylim=c(0,15000), add=T)
+> plot(pa3600, col=rgb(.7,.7,.7,.7), xlim=c(185,215), ylim=c(0,15000), add=T)
+> plot(pa4900, col=rgb(.8,.8,.8,.8), xlim=c(185,215), ylim=c(0,15000), add=T)
+> plot(pa6400, col=rgb(.9,.9,.9,.9), xlim=c(185,215), ylim=c(0,15000), add=T)
+> plot(pa8100, col=rgb(1,1,1,1), xlim=c(185,215), ylim=c(0,15000), add=T)
 </code>
+{{:sampling_distribution_25_to_8100.png}}
+{{:sampling_distribution_25_to_8100_big.png}}
+{{:sampling_distribution_25_to_8100.pdf}}
 {{:sampling_distribution_25_2500.png}}
 Where is my 194 (sample's mean)?
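One way to locate 194 on each curve is to express its distance from mu in standard-error units. The sketch below assumes mu = 200 and sigma = 20 as stated above and uses the same sample sizes as the a___ objects; the names `n`, `se`, `z`, and `result` are illustrative only.

```r
# Standard error for each sample size, and the position of the
# sample mean 194 in standard-error units.
# Assumptions: mu = 200, sigma = 20, sample sizes as in the a___ objects.
mu    <- 200
sigma <- 20
n     <- c(25, 100, 400, 900, 1600, 2500, 3600, 4900, 6400, 8100)

se <- sigma / sqrt(n)        # sd of each sampling distribution
z  <- (194 - mu) / se        # distance of 194 from mu, in se units

result <- data.frame(n = n, se = round(se, 3), z = round(z, 1))
print(result)
```

With n = 25 the value 194 lies 1.5 standard errors below 200, but with n = 2500 it lies 15 standard errors below, far out in the tail of that sampling distribution.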
 {{tag>"1종오류" "2종오류" "오류의 종류" "types of error" "type 1 error" "type 2 error"}}