[[./|Class page]]

====== Week01 (March 5, 7) ======
===== ideas and concepts =====
Introduction to the class

Naver is working on the design of a new UI and has produced two design drafts, A and B. Naver has asked you to investigate which draft is better and to write a report. Before the actual work begins, your team leader asks you to organize how you will proceed and to submit it as a plan. Submit the plan.

  * Research question ([[:research question]])
  * Concept vs. concept
  * concepts, ideas, etc.

===== Assignment =====
This is an in-class activity. The instructor presented a research question like the one above and explained how it relates to statistics. Write a research question of your own of a similar nature and post it on the Ajou BB discussion board (the link to the board will be provided later).

====== Week02: (March 10, 15) ======
Reading: Textbook Chapter 1, 2
  * 1.1: make sure you understand the importance of context
  * Difference vs. association, and so on
  * See the ideas and concepts section below.

===== ideas and concepts =====
Student A thinks that movie preferences differ by gender: for example, that men generally like action movies and women like romance movies. Let us look for a way to test Student A's claim.
  * HOW?

These days, spending on cosmetics and accessories for personal grooming is rising sharply among both men and women, and the cost is considerable. Person A says they are fine without grooming, while Person B says they cannot be satisfied with their life unless they take care of their appearance. How could we test whether self-satisfaction differs with the amount spent on grooming?
  * {{http://fetzer.org/sites/default/files/images/stories/pdf/selfmeasures/Self_Measures_for_Self-Esteem_ROSENBERG_SELF-ESTEEM.pdf|ROSENBERG SELF-ESTEEM SCALE}}

See [[:research_methods_lecture_note#커뮤니케이션_연구문제_제기와_가설|Research questions and hypotheses]]. You must read this document.

Research design
  * Deduction vs. induction
  * [[:Research Question|Research question]]
  * [[:Hypothesis|Hypothesis]]
    * **Difference and association**: see [[:Hypothesis|Hypothesis]]
      * Difference
      * Association
  * [[:variable|Variable]] explained
    * Attributes: [[:level of measurement|Level of measurement]]: see the four types of scales (p. 78).
      * **Kinds and numbers** (see the textbook: discrete vs. continuous)
      * NOIR (see the textbook: nominal, ordinal, interval, and ratio scales)
  * [[:Types of Variable]]
    * Dependent
    * Independent
    * Control
    * Moderating (Intervening)
    * e.g.,
      * IV, DV: parents' education level --> child's college entrance exam score
      * moderating: intervention of the parents' income level
      * control: parents' education level (controlled at graduate school or above, 15 years or more)
      * intervening: the student's gender
      * Given these, which hypotheses and research questions are appropriate?
  * Sampling
    * [[:Sampling]]
    * [[:Sampling#sampling frame]]
    * ECOBS
    * Probability sampling
      * [[Systematic Sampling]]:
      * [[Stratified Sampling]]:
      * [[Multistage Cluster Sampling]]:
      * [[Stratified in Multistage Cluster Sampling]]:
    * Non-probability sampling
    * [[:Sample frame]]
    * [[:Sample]] vs [[:Population]]

===== Assignment 2 =====
=== 1 ===
Write three hypotheses about an interest related to your major. Two of them must be hypotheses of difference and the remaining one a hypothesis of association. For each hypothesis, explain the reasoning from which it was derived. The due date is the 17th.

====== Week03 (March 17, 22) ======
===== ideas and concepts =====
College of Social Sciences: is SPSS available?
  * http://www.uvm.edu/~dhowell/fundamentals7 Textbook supplementary page
  * http://www.uvm.edu/~dhowell/fundamentals7/SeeingStatisticsApplets/Applets.html SeeingStatistics applet page

[[.schedule/week03|For the lecture content]]

===== Assignment =====

====== Week04 (March 24, 29) ======
===== ideas and concepts =====
Ch. 5, 6, 7, 8
  * [[:range|Range]]
  * [[:interquartile range|Interquartile range]]
  * Mean deviation
  * [[:Variance|Variance]]
    * Population variance
    * Sample variance: [[:why n-1]], [[:degrees of freedom|degrees of freedom (df)]]
    * Textbook section: the mean and the variance as estimators
  * Standard deviation [[:Standard Deviation]]
  * Computational formula: [[:c/ms/2017/schedule/week03#variance_calculation_formula|variance calculation formula]]

optical_illusion <- c(1.73, 1.06, 2.03, 
1.40, .95, 1.13, 1.41, 1.73, 1.63, 1.56)

> mean(optical_illusion)
[1] 1.463

> var(optical_illusion)
[1] 0.1160678

> sqrt(var(optical_illusion))
[1] 0.3406872

> stem(optical_illusion)

  The decimal point is at the |

  0 | 9
  1 | 1144
  1 | 6677
  2 | 0

> median(optical_illusion)
[1] 1.485

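> # mode() reports R's storage mode, not the statistical mode
> mode(optical_illusion)
[1] "numeric"

> # most frequent value (the statistical mode)
> names(which.max(table(optical_illusion)))
[1] "1.73"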
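
As a rough check on the variance and standard deviation reported above, the sample variance can be computed by hand as the sum of squared deviations divided by n - 1 and compared with `var()`. This is only a minimal sketch; the names `dev`, `ss`, `sample_var`, and `pop_var` are illustrative and do not come from the lecture notes.

```r
# Data from the transcript above
optical_illusion <- c(1.73, 1.06, 2.03, 1.40, 0.95,
                      1.13, 1.41, 1.73, 1.63, 1.56)

n   <- length(optical_illusion)
dev <- optical_illusion - mean(optical_illusion)  # deviations from the mean
ss  <- sum(dev^2)                                 # sum of squares (SS)

sample_var <- ss / (n - 1)   # sample variance: divide by df = n - 1
pop_var    <- ss / n         # population formula: divide by n

# var() uses the n - 1 denominator, so it should agree with sample_var
all.equal(sample_var, var(optical_illusion))

# square root of the sample variance = sample standard deviation, as sd() gives
sqrt(sample_var)
```

Dividing by n instead of n - 1 always gives the smaller value, which is why the sample formula is used when the variance serves as an estimate of the population variance.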

[[http://www.uvm.edu/~dhowell/fundamentals7/SeeingStatisticsApplets/Applets.html|Textbook applet page]] -- works only in IE.

  * Normal distribution [[:Normal Distribution]]
  * Standard score [[:z score]]
  * {{youtube>txA8X2j7w4E}}

p. 179: The number of finger taps a normal adult makes in 10 seconds is said to follow a normal distribution with mean 59 and standard deviation 7. A patient taps 45 times in 10 seconds. Is this patient within the normal range, or not?

finger_tap <- rnorm(n = 10000, mean = 59, sd = 7)
hist(finger_tap)
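
For the finger-tapping question (mean 59, standard deviation 7, patient score 45), the z score and the lower-tail probability can also be computed directly with `pnorm()` rather than read from a table. A minimal sketch; the variable names are illustrative.

```r
mu    <- 59   # mean number of taps in 10 seconds for normal adults
sigma <- 7    # standard deviation
x     <- 45   # the patient's score

z       <- (x - mu) / sigma   # standard score
p_lower <- pnorm(z)           # P(Z <= z) under the standard normal

z
p_lower

# One-tailed cutoffs for comparison
qnorm(0.05)    # z that cuts off the lowest 5%
qnorm(0.025)   # z that cuts off the lowest 2.5%
```

Whether the patient counts as "not normal" depends on which cutoff is chosen in advance: the score falls below the 5% and 2.5% cutoffs but not below a 1% cutoff.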

See the [[https://www.mathsisfun.com/data/standard-normal-distribution-table.html|normal distribution table]].

Which z score cuts off the lowest .05 (5%)? z = -1.645 (for .025 (2.5%), z = -1.96, about -2).

So what is this patient's z score?
  * z = (45 - 59) / 7 = -2
  * The probability of falling at or below z = -2 is 2.28% = 0.0228.

How do we decide who counts as a "normal" person? Within .05? Or .01? This cutoff is the rejection level, or significance level. Please note that this is a hypothetical test for one individual (not a sample) against the population.

Ch. 12, 13 ---
  * Hypothesis testing ([[:Hypothesis testing]])
    * Null hypothesis
    * Research hypothesis, alternative hypothesis
  * [[:types of error]]
    * type I error
    * type II error
  * [[:z-test]]
  * [[:t-test]]

The anxious/depressed scale of the YSR (Youth Self-Report Inventory) has $ \mu = 50, \sigma = 10 $. Given that the mean of five children is 56, test the hypothesis about whether these children are normal.
  * Explain the procedure to the person next to you.

> # standard error of the mean: sigma / sqrt(n)
> 10/sqrt(5)
[1] 4.472136
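
The standard error above (10/sqrt(5)) is the denominator of the z statistic for the YSR example. The sketch below carries the test through under the assumption of a two-tailed test at the .05 level, which the notes do not state explicitly; the variable names are illustrative.

```r
mu    <- 50   # population mean of the YSR scale
sigma <- 10   # population standard deviation
n     <- 5    # number of children
xbar  <- 56   # sample mean

se <- sigma / sqrt(n)               # standard error of the mean
z  <- (xbar - mu) / se              # z statistic
p_two_sided <- 2 * pnorm(-abs(z))   # two-tailed p value

z
p_two_sided

# Decision at alpha = .05 (two-tailed): reject H0 only if |z| > 1.96
abs(z) > qnorm(0.975)
```

Under these assumptions z is about 1.34, which does not exceed 1.96, so the null hypothesis that the five children come from the normal population would not be rejected.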

===== Assignment =====

====== Week05 (March 31, April 5) ======
===== ideas and concepts =====
First, review type I and type II errors again.

[[:types of error]]
[[:z-test]]
[[:t-test]]

''Q.'' The effect of alcohol during pregnancy: A researcher studying the effect of alcohol during pregnancy is interested in how alcohol consumption during pregnancy affects the weight of the offspring. A random sample of n = 16 pregnant rats was obtained, and each mother rat consumed a fixed amount of alcohol every day. The researcher selected one pup from each litter, forming a sample of n = 16, and found the mean weight to be $\overline{X}$ = 15 grams. The researcher knows that for ordinary rats the mean weight is $\mu = 18$ grams with $\sigma = 4$. How should the researcher test the effect of alcohol?

[[https://www.easycalculation.com/statistics/t-distribution-critical-value-table.php|T dist. table]]

> # rnorm2() returns n values whose sample mean and sd are exactly `mean` and `sd`
> rnorm2 <- function(n, mean, sd) { mean + sd * scale(rnorm(n)) }
> rat <- rnorm2(16, 15, 4)
> # t.test() has no sd argument; it always uses the sample sd (here exactly 4)
> t.test(rat, mu = 18)

	One Sample t-test

data:  rat
t = -3, df = 15, p-value = 0.008973
alternative hypothesis: true mean is not equal to 18
95 percent confidence interval:
 12.86855 17.13145
sample estimates:
mean of x 
       15 
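
Because the problem states the population standard deviation ($\sigma = 4$), a z-test is also an option. The t-test above yields the same statistic (-3) only because `rnorm2()` forces the sample standard deviation to equal 4; the p values differ because the t-test refers to the t distribution with 15 degrees of freedom. A minimal sketch of the z-test, with illustrative variable names:

```r
mu    <- 18   # mean weight of ordinary rat pups (grams)
sigma <- 4    # known population standard deviation
n     <- 16   # sample size
xbar  <- 15   # observed sample mean

se <- sigma / sqrt(n)               # standard error of the mean
z  <- (xbar - mu) / se              # z statistic
p_two_sided <- 2 * pnorm(-abs(z))   # two-tailed p value from the normal curve

z
p_two_sided
```

With z = -3 the two-tailed p value is about .003, so the null hypothesis of no alcohol effect would be rejected at the .05 and .01 levels.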


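SAT scores of 28 students: the effect of reasonable guessing. Suppose each item has five answer choices. When the students answered by reasonable guessing, can we say that the guessing was effective? The test below compares the scores with 20, presumably the score expected from purely random guessing.

58, 48, 48, 41, 34, 43, 38, 53, 41, 60, 55, 44, 43, 49, 47, 33, 47, 40, 46, 53, 40, 45, 39, 47, 50, 53, 46, 53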

> sec12.9 <- c(58, 48, 48, 41, 34, 
43, 38, 53, 41, 60, 55, 44, 43, 49, 47, 
33, 47, 40, 46, 53, 40, 45, 39, 47, 50, 
53, 46, 53)

> mean(sec12.9)
[1] 46.21429

> sqrt(var(sec12.9))
[1] 6.729466

> length(sec12.9)
[1] 28

> t.test(sec12.9, mu=20)

	One Sample t-test

data:  sec12.9
t = 20.6128, df = 27, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 20
95 percent confidence interval:
 43.60487 48.82370
sample estimates:
mean of x 
 46.21429 


> # numerator: difference between the sample mean and the hypothesized mean
> num <- mean(sec12.9) - 20
> # denominator: standard error of the mean, s / sqrt(n)
> denom <- sqrt(var(sec12.9)) / sqrt(length(sec12.9))
> tvalue <- num / denom
> tvalue
[1] 20.61277
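
The remaining pieces of the `t.test()` output above (the p value and the 95 percent confidence interval) can be rebuilt from the same ingredients. This is a minimal sketch; `t_crit`, `p_value`, and `ci` are illustrative names, and the two-tailed .05 level is taken from the `t.test()` defaults.

```r
sec12.9 <- c(58, 48, 48, 41, 34, 43, 38, 53, 41, 60, 55, 44, 43, 49,
             47, 33, 47, 40, 46, 53, 40, 45, 39, 47, 50, 53, 46, 53)

n      <- length(sec12.9)
se     <- sd(sec12.9) / sqrt(n)          # standard error of the mean
tvalue <- (mean(sec12.9) - 20) / se      # same t as above

t_crit  <- qt(0.975, df = n - 1)               # two-tailed critical t, alpha = .05
p_value <- 2 * pt(-abs(tvalue), df = n - 1)    # two-tailed p value

# 95% confidence interval: mean +/- critical t * standard error
ci <- mean(sec12.9) + c(-1, 1) * t_crit * se

abs(tvalue) > t_crit   # TRUE means H0 (mu = 20) is rejected
ci
```

The interval lies entirely above 20, which is another way of seeing that the scores are well above the chance level.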

t test summary
  * Of the hypotheses of difference and of association, the t-test applies to hypotheses of difference
    * when the independent variable has two kinds of attributes (two groups).
    * remind: see [[:hypothesis]], [[:types of variable]], [[:level of measurement]]
  * Situations in which a difference (between two groups) is examined: see [[:t-test]]
    * Difference between a population and a sample
      * population with known $\mu$ and $\sigma$
      * population with known $\mu$, but unknown $\sigma$
    * Difference between two samples
      * comparison between two groups
      * difference in game adaptation ability between men and women
    * Difference within one sample over time
      * the effect that appears after taking a drug

===== Assignment =====

====== Week06 (April 7, 12) ======
===== ideas and concepts =====
Quiz during the midterm period. For the quiz two weeks after it, the range will be split and extended.

Confidence Interval in [[:t-test]] and [[:confidence interval]]
[[:ANOVA]]
[[:Repeated Measure ANOVA]]
[[:Factorial ANOVA]]

===== Assignment =====

====== Week07 (April 14, 19) ======
===== Exam coverage =====
Midterm coverage:
  * Ch. 1, 2, 3, 4, 5, 6, 7, 8
  * + z-test

Exam after the midterm:
  * Ch. 5, 6, 7, 8 +
  * Ch. 12, 13, 14, 15 (effect size, 15.3 onward, excluded), 16

===== ideas and concepts =====
[[:c/ms/2017/schedule/week07|Lecture content]]

===== Assignment =====

====== Week08 (April 21, 26) ======
__**Mid-term period**__

====== Week09 (April 28, May 3) ======
===== ideas and concepts =====
[[:Factorial ANOVA]]
~~[[:Repeated Measure ANOVA]]~~ -- in a future week

===== Assignment =====

====== Week10 (May 5, 10) ======
===== ideas and concepts =====
Children's Day
Buddha's Birthday

===== Assignment =====

====== Week11 (May 12, 17) ======
===== ideas and concepts =====
[[:Correlation]]
[[:Regression]]

===== Assignment =====

====== Week12 (May 19, 24) ======
===== ideas and concepts =====
[[:correlation]]
[[:Regression]]
  * Variance = SS_total / df
  * SS_tot = sum of squared errors when Y is predicted by the mean alone
  * SS_residual
    * Regression line
      * a and b in $ \hat{Y} = a + b X $
      * $b = {SP} / {SS_{X}}$
      * $a = \overline{Y} - b {\overline{X}}$
    * sum of squared errors when Y is predicted by the regression line
  * SS_regression = squared error overcome (explained) by the regression line
  * SS_tot = SS_regression + SS_residual
  * If SS_regression is **big enough**, we can say
    * X's contribution to explaining Y's variation is significant
    * How to determine that? -> F test
      * $\text{F} = MS_{\text{regression}} / MS_{\text{residual}} $
      * with $\text{df}_{\text{regression}} = k - 1$ (k = number of estimated coefficients, including the intercept, so k - 1 = p); and
      * $\text{df}_{\text{residual}} = n - k$ (note that $\text{df}_{\text{total}} = n - 1$)
  * $\text{R}^{\text{2}} = \text{SS}_{\text{reg}} / \text{SS}_{\text{tot}}$
    * will be clear with multiple regression
  * R² adjusted for degrees of freedom = adjusted R²
    * adding IVs never decreases R² (in practice it almost always increases it),
    * so R² should be penalized (or adjusted)
    * since R² = 1 - (SS_res/SS_tot), replace
      * SS_res -> SS_res/df_res
        * df_res = n - p - 1
        * p = number of IVs
        * if p increases, df_res decreases and SS_res/df_res grows, which lowers the adjusted R² value.
      * SS_tot -> SS_total/df_tot
        * df_tot = n - 1
    * which gives $\text{R}^{\text{2}}_{\text{adj}} = 1 - \frac{\text{SS}_{\text{res}} / \text{df}_{\text{res}}}{\text{SS}_{\text{tot}} / \text{df}_{\text{tot}}}$
  * meaning of the t test for slope b
    * Suppose that in $ \hat{Y} = a + b_{1} X_{1} + b_{2} X_{2} $, the Xs are not correlated with each other, and $X_{1}$ is not contributing anything to Y's variance,
      * we can say that $b_{1} = 0$.
      * This is the null hypothesis for testing $b_{1}$
    * The actual test for determining the contribution of each b is a t-test
      * $t = (b_{1} - 0) / \text{SE}_{\text{b}}$
      * $\displaystyle \text{SE}_{\text{b}} = \frac{s_{\text{est}}}{\sqrt{SS_{X}}}$

[[:Multiple Regression]]
[[:Sequential Regression]]
[[:Using Dummy Variables]]

===== Assignment =====

====== Week13 (May 26, 31) ======
===== ideas and concepts =====

===== Assignment =====

====== Week14 (June 2, 7) ======
In continuation with [[:ANOVA]],
[[:Factorial ANOVA]]
[[:Repeated Measures ANOVA]]
[[:post hoc test]]
[[:Effect size for ANOVA]]

Quiz:
  * [[:t-test]]
  * F-test
  * [[:ANOVA]]
  * [[:Factorial ANOVA]]
  * [[:Repeated Measures ANOVA]]
  * [[:Effect size for ANOVA]]

The quiz basically covers the items above, but to understand them you also need to understand
  * [[:standard deviation]]
  * [[:variance]]
  * [[:central limit theorem]]
  * [[:sampling distribution]]
  * [[:standard error]]
  * [[:hypothesis testing]]
  * [[:z-test]]
  * [[:types of error]]
  * [[:variable]]
  * [[:types of variable]]
and so on.

The textbook range that corresponds to the above is
  * Ch 12: Confidence limits were not covered in class and are therefore excluded. They will, however, be covered in the last quiz.
  * Ch 13, Ch 14
  * ~~Ch 15:~~
  * Ch 16:
    * Unequal sample sizes included
    * Multiple comparisons (post hoc or multiple comparison techniques) included (the quiz will not cover the mathematical details).
    * Of the effect sizes, only the part on eta squared is included
    * Reporting results included (not covered in class, but make sure you are familiar with it)
  * Ch 17 (factorial)
    * Effect sizes other than eta squared (the r-family, omega squared, etc.) are excluded
    * 17.7 excluded
    * 17.8, 17.9 included
  * Ch 18 (repeated measures anova)

====== Week15 (June 9, 14) ======
[[.schedule:week15]]

Quiz: the previous range + everything on regression
  * [[:t-test]]
  * F-test
  * [[:ANOVA]]
  * [[:Factorial ANOVA]]
  * [[:Repeated Measures ANOVA]]
  * [[:Effect size for ANOVA]]
  * regression
    * [[:regression]]
    * [[:multiple regression]]
      * including the [[:multiple_regression#무엇부터_라는_문제]] and [[:multiple_regression#determining_ivs_role]] sections.
    * [[:using dummy variables]]: focus on understanding the basic logic.

The quiz basically covers the items above, but to understand them you also need to understand
  * [[:standard deviation]]
  * [[:variance]]
  * [[:central limit theorem]]
  * [[:sampling distribution]]
  * [[:standard error]]
  * [[:hypothesis testing]]
  * [[:z-test]]
  * [[:types of error]]
  * [[:variable]]
  * [[:types of variable]]
and so on.

The textbook range that corresponds to the above is
  * Ch 12: Confidence limits were not covered in class, so they are excluded from the last quiz as well.
  * Ch 13, Ch 14
  * ~~Ch 15:~~
  * Ch 16:
    * Unequal sample sizes included
    * Multiple comparisons (post hoc or multiple comparison techniques) included (the quiz will not cover the mathematical details).
    * Of the effect sizes, only the part on eta squared is included (eta squared, partial eta squared, ~~omega~~)
    * Reporting results included (not covered in class, but make sure you are familiar with it)
  * Ch 17 (factorial)
    * Effect sizes other than eta squared (the r-family, omega squared, etc.) are excluded
    * 17.7 excluded
    * 17.8, 17.9 included
  * Ch 18 (repeated measures anova)

====== Week16 (June 16, 21) ======
__**Final-term**__