Differences

This shows you the differences between two versions of the page.

--- c:ms:2024:schedule [2024/03/18 08:54] – [Week03] hkimscil
+++ c:ms:2024:schedule [2024/05/08 08:48] (current) – [Week09] hkimscil
@@ Line 368: / Line 368: @@
 pnorm(s3.h) - pnorm(s3.l)
-#
+# for variance of sample means
-# sa, http://commres.net/wiki/sampling_distribution_in_r?#n_4
+# see the [[:sampling distribution in r]]
-#
-#
-m.ca <- 70
-sd.ca <- 10
-set.seed(1001)
-iter <- 10000
-n <- 4
-s4.eg <- rnorm(n, m.ca, sd.ca)
-mean.s4.eg <- mean(s4.eg)
-s4.eg
-mean.s4.eg
-means <- rep (NA, iter) # empty space of # of iteration
-for(i in 1:iter){
-  means[i] = mean(rnorm(n, m.ca, sd.ca))
-}
-# according to the professor
-m.means.should.be <- m.ca  # mean = 70; sd = 10;
-v.means.should.be <- sd.ca^2/n # variance of sample distribution (when n=4)
-sd.means.should.be <- sqrt(v.means.should.be)
-m.means.should.be
-v.means.should.be
-sd.means.should.be
-# Would actual calculated numbers be
-# similar to the above?
-m.means <- mean(means)
-v.means <- var(means)
-sd.means <- sd(means)
-m.means
-v.means
-sd.means
-se <- sd.means.should.be
-se2 <- se*2
-m.ca - se2
-m.ca + se2
-#
-# sa, http://commres.net/wiki/sampling_distribution_in_r?#n_25
-#
-#
-m.ca <- 70
-sd.ca <- 10
-set.seed(1001)
-iter <- 10000
-n <- 25
-s.eg <- rnorm(n, m.ca, sd.ca)
-mean.s.eg <- mean(s.eg)
-s.eg
-mean.s.eg
-means <- rep (NA, iter) # empty space of # of iteration
-for(i in 1:iter){
-  means[i] = mean(rnorm(n, m.ca, sd.ca))
-}
-# according to the professor
-m.means.should.be <- m.ca  # mean = 70; sd = 10;
-v.means.should.be <- sd.ca^2/n # variance of sample distribution (when n=25)
-sd.means.should.be <- sqrt(v.means.should.be)
-m.means.should.be
-v.means.should.be
-sd.means.should.be
-# Would actual calculated numbers be
-# similar to the above?
-m.means <- mean(means)
-v.means <- var(means)
-sd.means <- sd(means)
-m.means
-v.means
-sd.means
-se <- sd.means.should.be
-se2 <- se*2
-m.ca - se2
-m.ca + se2
-#
-# sa, http://commres.net/wiki/sampling_distribution_in_r?#n_100
-#
-#
-m.ca <- 70
-sd.ca <- 10
-set.seed(1001)
-iter <- 10000
-n <- 100
-s.eg <- rnorm(n, m.ca, sd.ca)
-mean.s.eg <- mean(s.eg)
-s.eg
-mean.s.eg
-means <- rep (NA, iter) # empty space of # of iteration
-for(i in 1:iter){
-  means[i] = mean(rnorm(n, m.ca, sd.ca))
-}
-# according to the professor
-m.means.should.be <- m.ca  # mean = 70; sd = 10;
-v.means.should.be <- sd.ca^2/n # variance of sample distribution (when n=100)
-sd.means.should.be <- sqrt(v.means.should.be)
-m.means.should.be
-v.means.should.be
-sd.means.should.be
-# Would actual calculated numbers be
-# similar to the above?
-m.means <- mean(means)
-v.means <- var(means)
-sd.means <- sd(means)
-m.means
-v.means
-sd.means
-se <- sd.means.should.be
-se2 <- se*2
-m.ca - se2
-m.ca + se2
-#
-# sa, http://commres.net/wiki/sampling_distribution_in_r?#n_400
-#
-#
-m.ca <- 70
-sd.ca <- 10
-set.seed(1001)
-iter <- 10000
-n <- 400
-s.eg <- rnorm(n, m.ca, sd.ca)
-mean.s.eg <- mean(s.eg)
-s.eg
-mean.s.eg
-means <- rep (NA, iter) # empty space of # of iteration
-for(i in 1:iter){
-  means[i] = mean(rnorm(n, m.ca, sd.ca))
-}
-# according to the professor
-m.means.should.be <- m.ca  # mean = 70; sd = 10;
-v.means.should.be <- sd.ca^2/n # variance of sample distribution (when n=400)
-sd.means.should.be <- sqrt(v.means.should.be)
-m.means.should.be
-v.means.should.be
-sd.means.should.be
-# Would actual calculated numbers be
-# similar to the above?
-m.means <- mean(means)
-v.means <- var(means)
-sd.means <- sd(means)
-m.means
-v.means
-sd.means
-se <- sd.means.should.be
-se2 <- se*2
-m.ca - se2
-m.ca + se2
-#
-# sa, http://commres.net/wiki/sampling_distribution_in_r?#n_1600
-#
-#
-m.ca <- 70
-sd.ca <- 10
-set.seed(1001)
-iter <- 10000
-n <- 1600
-s.eg <- rnorm(n, m.ca, sd.ca)
-mean.s.eg <- mean(s.eg)
-s.eg
-mean.s.eg
-means <- rep (NA, iter) # empty space of # of iteration
-for(i in 1:iter){
-  means[i] = mean(rnorm(n, m.ca, sd.ca))
-}
-# according to the professor
-m.means.should.be <- m.ca  # mean = 70; sd = 10;
-v.means.should.be <- sd.ca^2/n # variance of sample distribution (when n=1600)
-sd.means.should.be <- sqrt(v.means.should.be)
-m.means.should.be
-v.means.should.be
-sd.means.should.be
-# Would actual calculated numbers be
-# similar to the above?
-m.means <- mean(means)
-v.means <- var(means)
-sd.means <- sd(means)
-m.means
-v.means
-sd.means
-se <- sd.means.should.be
-se2 <- se*2
-m.ca - se2
-m.ca + se2
-#
-# sa, http://commres.net/wiki/sampling_distribution_in_r?#n_2500
-#
-#
-m.ca <- 70
-sd.ca <- 10
-set.seed(1001)
-iter <- 10000
-n <- 2500
-s.eg <- rnorm(n, m.ca, sd.ca)
-mean.s.eg <- mean(s.eg)
-s.eg
-mean.s.eg
-means <- rep (NA, iter) # empty space of # of iteration
-for(i in 1:iter){
-  means[i] = mean(rnorm(n, m.ca, sd.ca))
-}
-# according to the professor
-m.means.should.be <- m.ca  # mean = 70; sd = 10;
-v.means.should.be <- sd.ca^2/n # variance of sample distribution (when n=2500)
-sd.means.should.be <- sqrt(v.means.should.be)
-m.means.should.be
-v.means.should.be
-sd.means.should.be
-# Would actual calculated numbers be
-# similar to the above?
-m.means <- mean(means)
-v.means <- var(means)
-sd.means <- sd(means)
-m.means
-v.means
-sd.means
-se <- sd.means.should.be
-se2 <- se*2
-m.ca - se2
-m.ca + se2
 </code>
+see the [[:sampling distribution in r]]
-<code>
-# now
-# we know exactly variance of sample distribution = (sd.ca^2)/n
-# hence sd of it = sqrt((sd.ca^2)/n)
-# sample size pool
-samples <- seq(from=1, to=3600, by=1)
-num.samples <- length(samples)
-ses <- rep (NA, num.samples) # empty space of # of iteration
-for(i in 1:num.samples){
-  ses[i] = sqrt((sd.ca^2)/samples[i])
-}
-ses[4]
-ses[25]
-ses[100]
-plot(ses)
-</code>
-{{:c:ms:2024:pasted:20240318-085214.png}}
-now see the [[:sampling distribution in r]] document again!
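The repeated simulation blocks in this revision differ only in the sample size n, so the comparison between the theoretical and simulated standard error can be sketched once as a function. This is a minimal sketch rather than the course's own script: the helper name sim.sample.means and the object names sizes and result are hypothetical, while the population values (mean 70, sd 10), the 10000 iterations, and the seed are taken from the code shown in this diff.

```r
# Hypothetical helper: simulate the sampling distribution of the mean
# for one sample size and compare it with the theoretical values.
sim.sample.means <- function(n, mu = 70, sigma = 10, iter = 10000) {
  # draw `iter` samples of size n and keep each sample mean
  means <- replicate(iter, mean(rnorm(n, mu, sigma)))
  c(n = n,
    se.theory = sigma / sqrt(n),   # sqrt(sigma^2 / n)
    se.simulated = sd(means),      # sd of the simulated sample means
    mean.simulated = mean(means))  # should be close to mu
}

set.seed(1001)
sizes <- c(4, 25, 100, 400)
# one row per sample size
result <- t(sapply(sizes, sim.sample.means))
result
```

Each row of result should show se.simulated close to se.theory, and both shrink as n grows, which is the pattern the plot of ses illustrates.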
 ===== Assignment =====
@@ Line 643: / Line 381: @@
 ====== Week04 ======
 <WRAP half column>
+Videos to watch
+  * https://youtu.be/Qaxj6LZ-iL0 : sampling distribution
+  * https://youtu.be/0RZJbZtzs6s : sampling distribution e.g. in R
+  * https://youtu.be/AbeIQvJJ5Vw : mean and variance (standard deviation) in sampling distribution (the mean and variance (standard deviation) of the set of sample means)
+  * https://youtu.be/zFdbt2XoeM4 : CLT (central limit theorem) and standard error
+  * https://youtu.be/Udp-4MLAlvc : Testing hypothesis based on CLT principle
+  * [[:sampling distribution in r]]
 ===== Class Activity =====
@@ Line 670: / Line 418: @@
 </WRAP>
 <WRAP half column>
+{{:c:ms:2023:pasted:20230329-102748.jpeg}}
+The second figure below plots the set of sample means obtained
+when the population mean is 102 and the sample
+is n people (not 1600 people).
+{{:c:ms:2023:pasted:20230329-102811.jpeg}}
 ===== Assignment =====
-</WRAP>
-====== Week05 ======
-<WRAP half column>
 ===== Announcement Quiz 01 =====
-<WRAP box>
+There is a quiz next Wednesday (second session of week 5).
-There is a quiz next Wednesday (first session of week 6).
 The quiz covers
   * all videos mentioned through week 5
@@ Line 699: / Line 447: @@
   * Questions are four-option multiple choice or short answer.
   * There are about 50 questions in total.
-</WRAP>
-Videos to watch
-  * https://youtu.be/Qaxj6LZ-iL0 : sampling distribution
-  * https://youtu.be/0RZJbZtzs6s : sampling distribution e.g. in R
-  * https://youtu.be/AbeIQvJJ5Vw : mean and variance (standard deviation) in sampling distribution (the mean and variance (standard deviation) of the set of sample means)
-  * https://youtu.be/zFdbt2XoeM4 : CLT (central limit theorem) and standard error
-  * https://youtu.be/Udp-4MLAlvc : Testing hypothesis based on CLT principle
+</WRAP>
-  * [[:sampling distribution in r]]
+====== Week05 ======
+<WRAP half column>
+<WRAP box>
+</WRAP>
 ===== Concepts and ideas =====
 [[:b:r cookbook:Data Structures]]
@@ Line 748: / Line 493: @@
 </WRAP>
 <WRAP half column>
-{{:c:ms:2023:pasted:20230329-102748.jpeg}}
+===== Assignment =====
-The second figure below plots the set of sample means obtained
-when the population mean is 102 and the sample
+</WRAP>
-is n people (not 1600 people).
-{{:c:ms:2023:pasted:20230329-102811.jpeg}}
 <code>
@@ Line 764: / Line 507: @@
 m.tg <- mean(treated.group)
 m.tg
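+# H1: m.tg != mu.pop (100)?
+# H0: m.tg = mu.pop (100). If H0 holds,
+# then
+# with n = 16, Xbar ~ N(mu.pop, 25/4)
+# i.e., the variance of the set of Xbar is 6.25
+# and its standard deviation (standard error, se) is 2.5
+# So the 95% range centered on the mean of the set of Xbar
+# is mu.pop +- 2*(se)
+# i.e., 95 out of 100 sample means should fall between 95 and 105
+# i.e., m.tg should fall within that range. However,
+# the remaining 5% may fall below 95 or above 105
+# Yet m.tg = 113.0706
+# On this basis we reject the null hypothesis
+# and accept the research hypothesis under test
+# i.e., the treated group's mean differs from the population mean; or
+# the treated group is not a sample that could be drawn from the population
+# but a sample belonging to a different population (95% confidence, 5% margin of error)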
+se <- sqrt((sd.pop^2)/16)
+qnorm(0.975,mean=100,sd=se)
+# [1] 104.8999
+qnorm(0.025,mean=100,sd=se)
+# [1] 95.10009
+# Then what is the probability of getting the value m.tg?
+pnorm(m.tg, mean=100, sd=se)
+# [1] 0.9999999
+sscore <- (m.tg-mu.pop)/se
+sscore
+# [1] 5.22823
+1-pnorm(sscore,0,1)
+# [1] 8.557037e-08
+a <- 1-pnorm(sscore,0,1)
+b <- pnorm(-sscore,0,1)
+a
+# [1] 8.557037e-08
+b
+# [1] 8.557037e-08
+a+b
+# [1] 1.711407e-07
 # install.packages("BSDA")
@@ Line 795: / Line 578: @@
 </code>
-===== Assignment =====
+<code>
+> z.test(treated.group, mu=mu.pop, sigma.x=sd.pop)
+	One-sample z-Test
+data:  treated.group
+z = 5.2282, p-value = 1.711e-07
+alternative hypothesis: true mean is not equal to 100
+95 percent confidence interval:
+108.1707 117.9705
+sample estimates:
+mean of x
+113.0706
+>
+# As shown above . . . . if the z value falls outside +-2, the null hypothesis is rejected
+# and the research hypothesis is accepted
+</code>
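The hand calculation and the z.test() output above can be reproduced in a few lines of base R. The following is a minimal sketch under stated assumptions, not the course's own script: it assumes sd.pop = 10 (implied by se = 2.5 with n = 16) and plugs in the reported sample mean 113.0706 in place of the treated.group vector, which is not shown in this diff. The function name z.by.hand is hypothetical.

```r
# Assumed inputs, taken from the notes above
mu.pop <- 100       # population mean under H0
sd.pop <- 10        # assumption: implied by se = 2.5 when n = 16
n      <- 16        # sample size
m.tg   <- 113.0706  # reported mean of the treated group

# Hypothetical helper: one-sample z-test from summary values
z.by.hand <- function(xbar, mu, sigma, n) {
  se <- sigma / sqrt(n)                       # standard error
  z  <- (xbar - mu) / se                      # standardized score
  p  <- 2 * pnorm(-abs(z))                    # two-sided p-value
  ci <- xbar + c(-1, 1) * qnorm(0.975) * se   # 95% CI around xbar
  list(se = se, z = z, p.value = p, conf.int = ci)
}

res <- z.by.hand(m.tg, mu.pop, sd.pop, n)
res
```

The two-sided p-value here corresponds to a+b in the code above, since 1 - pnorm(z) and pnorm(-z) are the two equal tails of the standard normal distribution.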
+<code>
+# When the sample size is small, the +-2 cutoff above is not
+# accurate, so a correction is applied. The corrected value
+# varies with the sample size (degrees of freedom).
+</code>
+[[:t-test]]
+[[:t distribution table]]
+[[:r:t-test]] in R
-</WRAP>
 ====== Week06 ======
@@ Line 824: / Line 634: @@
 <WRAP half column>
 [[./schedule/week06 t-test and anova note]]
+<code>
+# pnorm
+# qnorm
+# pt
+# qt
+percentage <- .975
+df <- 99
+t.critical <- qt(percentage, df) # with sample size = df + 1, which score bounds the central 95%?
+t.critical
+t.calculated <- 3.6
+df <- 8
+pt(t.calculated, df)
+</code>
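To see how the t correction relates to the normal cutoff, the critical values can be placed side by side. This is a minimal sketch: the degrees of freedom 8 and 99 and the calculated t of 3.6 come from the code above, while df = 30 and the object names dfs, t.crit, z.crit, and p.two.sided are added here for illustration only.

```r
# 95% two-sided critical values: t for several df versus the normal
dfs <- c(8, 30, 99)
t.crit <- qt(0.975, dfs)   # upper 2.5% point of t for each df
z.crit <- qnorm(0.975)     # upper 2.5% point of the standard normal
data.frame(df = dfs, t.critical = t.crit, z.critical = z.crit)

# Two-sided p-value for a calculated t of 3.6 with df = 8
t.calculated <- 3.6
p.two.sided <- 2 * pt(-abs(t.calculated), df = 8)
p.two.sided
```

The t critical values are larger than the normal one and approach it as df grows, which is why the +-2 rule of thumb is adequate only for larger samples. Note that pt(t.calculated, df) on its own returns the lower-tail probability, so a two-sided p-value needs the tail area doubled as above.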
 ===== Announcement =====
 ===== Assignment =====
 </WRAP>
 ====== Week07 ======
@@ Line 878: / Line 704: @@
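 ===== Week 8 quiz =====
 Quiz 2 takes place during the regular exam period in week 8
-  * April 26, 09:00 ~ (periods A and B)
+  * Time
+    * 09:00 ~ (periods A and B)
   * Coverage
     * From the beginning through One-way ANOVA test with post hoc test (including the explanation of R square)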
@@ Line 902: / Line 729: @@
 ====== Week08 ======
 Exam period
-Make-up video lecture
 ====== Week09 ======
@@ Line 961: / Line 787: @@
 ===== Assignment =====
-Group assignment week09
-{{:r:cookies.xlsx}} --> compute a 2-way ANOVA test
-{{:r:repeated_measures_anova_eg.xlsx}} --> compute a Repeated measure ANOVA (see [[:Repeated Measure ANOVA]]).
-The first problem of the assignment is a Repeated measure ANOVA, not a Factorial ANOVA.
-<code>
-patient	drug1	drug2	drug3
-	30	28	16
-	14	18	10
-	24	20	18
-	38	34	20
-	26	28	14
-</code>
-edited
-ms.23.ga.w09.anova
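-  - Enter the data above into a spreadsheet program such as Excel and compute the F-test by hand.
-    * <del>Using the data in the Excel file {{:detergent.anova.by.hand.xlsx}}, compute the F-test by hand.</del>
-    * Do the assignment in the file ms.23.ga.groupID.w09.anova.by.hand.xlsx <del>with two tabs</del> and submit it.
-  - {{:r:twoway.anova.by.hand.xlsx}}
-    * Download the file above and compute the ANOVA test by hand.
-    * Save the result under the file name ms.23.ga.groupID.w09.twoway.anova.by.hand.xlsx and upload it.
-  - Load the file {{:r:twoway.anova.by.hand.data.csv}} into R as a data frame (df) and run the F-test. Capture what you did, save it in groupID.w.09.twoway.anova.by.hand.data.docx, and upload it. The assignment must be written in a fixed-width font.
-  - Discuss with your group members and do the following:
-    * Write a hypothesis, related to your major, to be tested with an Independent sample t-test.
-    * Write a hypothesis, related to your major, to be tested with a Oneway ANOVA.
-    * Write a hypothesis, related to your major, to be tested with a Repeated measure ANOVA.
-    * Write a hypothesis, related to your major, to be tested with a Twoway ANOVA.
-    * Save it under a file name such as ms.23.ga.groupID.w09.making.hypothesis.odc and upload it.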
-----
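-===== Assignment for Sohee =====
-  * Do the following. Sohee is assigned to group 13, so please upload as group 13.
-    * Write hypotheses that can be tested with [[:t-test]], [[:ANOVA]], [[:repeated measure ANOVA]], and [[:factorial ANOVA|twoway ANOVA]].
-    * Describe the independent and dependent variables ([[:types of variables]]) of each hypothesis.
-    * Describe the level of measurement ([[:level of measurement]]) of each variable.
-  * Do the following:
-    * Type ''?ToothGrowth'' in R and explain what this data set is about.
-    * Write a hypothesis with supp as the independent variable, test it in R (t-test), print the result (the output must be in a fixed-width font), and interpret it.
-    * Write a hypothesis with dose as the independent variable, test it in R, print the result, and interpret it.
-    * Test in R with supp and dose together as independent variables, print the result, and interpret it.
-  * Submit the assignment in the submission area of ms.23.ga.w09.anova.by.hand,
-  * and upload the assignment file under the name ga.g13.w09.hypothesis.testing.docx.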
 </WRAP>