See also: [[ANOVA]], [[:Factorial Anova|Factorial ANOVA]], [[:t-test#동일집단_간의_차이에_대해서_알아볼_때|paired sample t-test]], [[:r:repeated_measures_anova]]

====== Repeated Measures ANOVA ======

===== Introduction =====
  * one-way ANOVA for //**related (non-independent) groups**//
  * an extension of the dependent (paired-samples) t-test, also called the one-group or repeated-measures t-test
  * also called "within-subjects ANOVA" or "ANOVA for correlated samples"
  * the simplest form is __one-way repeated measures ANOVA__
    * which requires one independent and one dependent variable
    * the independent variable is categorical (either nominal or ordinal)
    * the dependent variable is continuous (interval or ratio)

===== Test Circumstances =====
  * the same subjects measured repeatedly over time (differences in mean scores across three or more time points)
    * e.g., participants tested with headache drugs
      * group A, B, C, placebo
      * across the time periods j, k, l, m
    * e.g., testing the effect of a three-month exercise training program on blood sugar level
      * blood sugar level is measured at 3 different points (pre-exercise, midway, post-exercise)
  * the same subjects measured under different conditions (treatments; differences in mean scores under three or more conditions)
    * e.g., participants (n=30) use and evaluate three web site UIs (naver, daum, and google)
      * and rate each one's usefulness, usability, and ease of use
  * the data should look as follows:

^  ^ pre-exercise \\ "sugar level" ^ mid-term \\ "sugar level" ^ post-exercise \\ "sugar level" ^
| a | 250 | 220 | 150 |
| b | 300 | 170 | 120 |
| c | 150 | 120 | 120 |
| d | 230 | 170 | 160 |
| e | 260 | 250 | 250 |
|  | level 1 | level 2 | level 3 |

Levels = related groups of the independent variable "time"

^  ^ treatment \\ condition \\ "naver" ^ treatment \\ condition \\ "daum" ^ treatment \\ condition \\ "google" ^
| a | 70 | 60 | 80 |
| b | 50 | 70 | 50 |
| c | 40 | 50 | 60 |
| d | 30 | 40 | 60 |
| e | 60 | 50 | 40 |
|  | level 1 | level 2 | level 3 |

In general, the data should look like this (each row is one subject, measured at every time point or condition):

^  ^ time/condition ^^^
|  | T1 | T2 | T3 |
| s1 | s1 | s1 | s1 |
| s2 | s2 | s2 | s2 |
| s3 | s3 | s3 | s3 |
| s4 | s4 | s4 | s4 |
| s5 | s5 | s5 | s5 |
| .. | .. | .. | .. |
| sn | sn | sn | sn |

Distinguish the layout above from the ordinary (independent-groups) ANOVA situation, where each participant belongs to only one group:

^  ^ group ^ score ^
| a | 1 | 70 |
| b | 1 | 50 |
| c | 1 | 40 |
| d | 1 | 30 |
| e | 1 | 60 |
| f | 2 | 60 |
| g | 2 | 70 |
| h | 2 | 50 |
| i | 2 | 40 |
| j | 2 | 50 |
| k | 3 | 80 |
| l | 3 | 50 |
| m | 3 | 60 |
| n | 3 | 60 |
| o | 3 | 40 |

===== Logic =====
  * independent ANOVA: $F = \displaystyle \frac{MS_{between}}{MS_{within}} = \frac{MS_{between}}{MS_{error}}$
  * repeated measures ANOVA: $F = \displaystyle \frac{MS_{conditions}}{MS_{error}}$

Note: the word "between" implies a comparison **between** independent groups, so for repeated measures the term "conditions" is used instead.

-- Picture about here --

  * However, $\text{SS}_\text{within}$ can be partitioned into
    * $\text{SS}_\text{subjects}$ and $\text{SS}_\text{error}$
    * that is, part of the "within variation" is carried along by each individual (stable differences between subjects).
  * Of the two, we can remove the first from $\text{SS}_\text{within}$
    * and use only the latter as $\text{SS}_\text{error}$
  * In other words:
    * in independent ANOVA: $\text{SS}_\text{within} = \text{SS}_\text{error}$
    * in repeated measures ANOVA: $\text{SS}_\text{within} = \text{SS}_\text{subjects} + \text{SS}_\text{error}$
  * This means that $\text{SS}_\text{error}$ will be **__smaller__**
  * But the degrees of freedom for this $\text{SS}_\text{error}$ also shrink, to $(n-1)(k-1)$, where $n$ is the number of subjects and $k$ the number of conditions

^ subjects ^ Pre ^ 1 Month ^ 3 Month ^ Subject \\ Means ^
| 1 | 45 | 50 | 55 | **50** |
| 2 | 42 | 42 | 45 | **43** |
| 3 | 36 | 41 | 43 | **40** |
| 4 | 39 | 35 | 40 | **38** |
| 5 | 51 | 55 | 59 | **55** |
| 6 | 44 | 49 | 56 | **49.7** |
| **Monthly mean** | **42.8** | **45.3** | **49.7** |  |
|  | **Grand mean: 45.9** ||||

This example (and the one below) is worked out in an Excel {{:r:repeated_measures_anova_eg.xlsx|spreadsheet}}.
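As a cross-check on the hand calculations, the partition SStotal = SSconditions + SSsubjects + SSerror can be sketched in base R. This is a minimal sketch that assumes a complete subjects × conditions matrix with no missing cells; the helper name rm_anova_ss is made up for illustration and is not from any package. It is applied to the six-subject table above and to the headache data in the next example, where it should reproduce SSweeks ≈ 1934.5, SSparticipants = 833.6 and SSerror ≈ 721.1.

```r
# Hypothetical helper (not from any package): one-way repeated-measures ANOVA
# from a complete matrix with rows = subjects and columns = conditions.
rm_anova_ss <- function(x) {
  n <- nrow(x)                                   # number of subjects
  k <- ncol(x)                                   # number of conditions
  grand <- mean(x)                               # grand mean over all cells
  ss_total      <- sum((x - grand)^2)
  ss_conditions <- n * sum((colMeans(x) - grand)^2)   # "between" conditions
  ss_subjects   <- k * sum((rowMeans(x) - grand)^2)   # removed from SSwithin
  ss_error      <- ss_total - ss_conditions - ss_subjects
  df_cond  <- k - 1
  df_error <- (n - 1) * (k - 1)
  f <- (ss_conditions / df_cond) / (ss_error / df_error)
  list(ss_total = ss_total, ss_conditions = ss_conditions,
       ss_subjects = ss_subjects, ss_error = ss_error,
       df = c(df_cond, df_error), F = f,
       p = pf(f, df_cond, df_error, lower.tail = FALSE))
}

# Six-subject example (columns: Pre, 1 Month, 3 Month)
scores <- matrix(c(45, 50, 55,
                   42, 42, 45,
                   36, 41, 43,
                   39, 35, 40,
                   51, 55, 59,
                   44, 49, 56),
                 ncol = 3, byrow = TRUE)
res <- rm_anova_ss(scores)
str(res)

# Headache example (rows = participants 1..9, columns = weeks w1..w5)
headache <- matrix(c(21, 22,  8,  6,  6,
                     20, 19, 10,  4,  9,
                      7,  5,  5,  4,  5,
                     25, 30, 13, 12,  4,
                     30, 33, 10,  8,  6,
                     19, 27,  8,  7,  4,
                     26, 16,  5,  2,  5,
                     13,  4,  8,  1,  5,
                     26, 24, 14,  8, 17),
                   ncol = 5, byrow = TRUE)
res_headache <- rm_anova_ss(headache)
str(res_headache)
```

For complete, balanced data like these, the F for the conditions should match the "Within" stratum of aov(score ~ condition + Error(subject)) once subject and condition are converted to factors, as in the R demos further down.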
An {{:ftable.pdf|F-distribution table}} is also needed to test the null hypothesis.

^ Headache Analysis ^^^^^^^
|  | base treatment ||||| $\overline{X}_{part}$ \\ = average \\ per case \\ (subject, \\ participant) |
| ser | w1 | w2 | w3 | w4 | w5 | ::: |
| 1 | 21 | 22 | 8 | 6 | 6 | 12.6 |
| 2 | 20 | 19 | 10 | 4 | 9 | 12.4 |
| 3 | 7 | 5 | 5 | 4 | 5 | 5.2 |
| 4 | 25 | 30 | 13 | 12 | 4 | 16.8 |
| 5 | 30 | 33 | 10 | 8 | 6 | 17.4 |
| 6 | 19 | 27 | 8 | 7 | 4 | 13 |
| 7 | 26 | 16 | 5 | 2 | 5 | 10.8 |
| 8 | 13 | 4 | 8 | 1 | 5 | 6.2 |
| 9 | 26 | 24 | 14 | 8 | 17 | 17.8 |
| average \\ per week | 20.78 | 20.00 | 9.00 | 5.78 | 6.78 | $\overline{X}$ = 12.47 |

^ Stats ^^
| Mean Total | 12.47 |
| $\Sigma{X_i}$ | 561 |
| $\Sigma{{X_i}^2}$ | 10483 |
| # of weeks (k) | 5 |
| # of cases (n) | 9 |

SStotal = $\Sigma{(X-\overline{X})^2}$ = 3489.2 \\
SSbetween = SSconditions = SSweeks = $n\Sigma{(\overline{X}_{week} - \overline{X})^2}$ = 1934.5 \\
SSwithin = $\Sigma \Sigma{(X_{s_i t_j} - \overline{X}_{t_j})^2}$ = 411.6 + 836.0 + 78.0 + 93.6 + 135.6 = 1554.7 \\
SSparticipants = $k\Sigma{(\overline{X}_{part}-\overline{X})^2}$ = 833.6 \\
SSresidual = SSerror = SSwithin - SSparticipants = 1554.7 - 833.6 = 721.1, or equivalently SSerror = (SStotal - SSweeks) - SSparticipants = 721.1 \\
\\
dftotal = N - 1 = 45 - 1 = 44 \\
dfweeks = dfbetween = k - 1 = 5 - 1 = 4 \\
dfparticipants = dfsubjects = n - 1 = 9 - 1 = 8 \\
dferror = (n - 1)(k - 1) = 8 × 4 = 32 (= dfwithin - dfparticipants = 40 - 8) \\
dfwithin = N - k = 45 - 5 = 40

====== ie ======
^ Visual cognition score ^^^^
^ Participant ^ No visual distraction ^ Visual distraction ^ Sound distraction ^
| A | 47 | 22 | 41 |
| B | 57 | 31 | 52 |
| C | 38 | 18 | 40 |
| D | 45 | 32 | 43 |

====== in R ======
===== demo1 =====
[[https://rcompanion.org/handbook/I_09.html]]

Data files used in the examples: {{:demo1.csv}} {{:demo2.csv}} {{:demo3.csv}} {{:demo4.csv}} {{:exer.csv}}

  demo1 <- read.csv("https://stats.idre.ucla.edu/stat/data/demo1.csv")
  demo1
  str(demo1)
  ## All variables are read as int (numeric), so they must be converted to factors
  ## Convert variables to factor
  demo1 <- within(demo1, {
    group <- factor(group)
    time <- factor(time)
    id <- factor(id)
  })
  ## Now every variable except pulse is a factor
  str(demo1)

The demo1 data look like this:

  id group pulse time
   1     1    10    1
   1     1    10    2
   1     1    10    3
   2     1    10    1
   2     1    10    2
   2     1    10    3
   3     1    10    1
   3     1    10    2
   3     1    10    3
   4     1    10    1
   4     1    10    2
   4     1    10    3
   5     2    15    1
   5     2    15    2
   5     2    15    3
   6     2    15    1
   6     2    15    2
   6     2    15    3
   7     2    16    1
   7     2    15    2
   7     2    15    3
   8     2    15    1
   8     2    15    2
   8     2    15    3

Rearranged as a subject × time table:

^  ^ time ^^^  ^
^ id ^ t1 ^ t2 ^ t3 ^ mean \\ of the \\ same person's \\ measures ^
| 1 | 10 | 10 | 10 | 10 |
| 2 | 10 | 10 | 10 | 10 |
| 3 | 10 | 10 | 10 | 10 |
| 4 | 10 | 10 | 10 | 10 |
| 5 | 15 | 15 | 15 | 15 |
| 6 | 15 | 15 | 15 | 15 |
| 7 | 16 | 15 | 15 | 15.333 |
| 8 | 15 | 15 | 15 | 15 |
| mean \\ across \\ the time | 12.625 | 12.5 | 12.5 | 12.542 |

  demo1.within.only.aov <- aov(pulse ~ time + Error(id), data = demo1)
  summary(demo1.within.only.aov)

  > demo1.within.only.aov <- aov(pulse ~ time + Error(id), data = demo1)
  > summary(demo1.within.only.aov)

  Error: id
            Df Sum Sq Mean Sq F value Pr(>F)
  Residuals  7  155.3   22.18

  Error: Within
            Df Sum Sq Mean Sq F value Pr(>F)
  time       2 0.0833 0.04167       1  0.393
  Residuals 14 0.5833 0.04167

See {{:r:repeated_measures_anova_eg.xlsx}}.

===== demo 2 =====

  # create data
  df <- data.frame(patient=rep(1:5, each=4),
                   drug=rep(1:4, times=5),
                   response=c(30, 28, 16, 34,
                              14, 18, 10, 22,
                              24, 20, 18, 30,
                              38, 34, 20, 44,
                              26, 28, 14, 30))
  # view data
  df
  # within-subject anova
  within.aov.mod <- aov(response~drug+Error(patient), data=df)
  summary(within.aov.mod)

  > #create data
  > df <- data.frame(patient=rep(1:5, each=4),
  +                  drug=rep(1:4, times=5),
  +                  response=c(30, 28, 16, 34,
  +                             14, 18, 10, 22,
  +                             24, 20, 18, 30,
  +                             38, 34, 20, 44,
  +                             26, 28, 14, 30))
  >
  > #view data
  > df
     patient drug response
  1        1    1       30
  2        1    2       28
  3        1    3       16
  4        1    4       34
  5        2    1       14
  6        2    2       18
  7        2    3       10
  8        2    4       22
  9        3    1       24
  10       3    2       20
  11       3    3       18
  12       3    4       30
  13       4    1       38
  14       4    2       34
  15       4    3       20
  16       4    4       44
  17       5    1       26
  18       5    2       28
  19       5    3       14
  20       5    4       30
  >
  > #within-subject anova
  > within.aov.mod <- aov(response~drug+Error(patient), data=df)
  > summary(within.aov.mod)

  Error: patient
            Df Sum Sq Mean Sq F value Pr(>F)
  Residuals  1   67.6    67.6

  Error: Within
            Df Sum Sq Mean Sq F value Pr(>F)
  drug       1   11.6   11.56   0.139  0.714
  Residuals 17 1412.6   83.10

The above is **WRONG**: patient and drug are stored as integers, so aov() treats them as numeric covariates (note the single degree of freedom for each) instead of as factors. Convert them with factor():

  within.aov.mod <- aov(response~factor(drug)+Error(factor(patient)), data=df)
  summary(within.aov.mod)

  > within.aov.mod <- aov(response~factor(drug)+Error(factor(patient)), data=df)
  > summary(within.aov.mod)

  Error: factor(patient)
            Df Sum Sq Mean Sq F value Pr(>F)
  Residuals  4  680.8   170.2

  Error: Within
               Df Sum Sq Mean Sq F value   Pr(>F)
  factor(drug)  3  698.2   232.7   24.76 1.99e-05 ***
  Residuals    12  112.8     9.4
  ---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

====== two way ======

  demo1 <- read.csv("https://stats.idre.ucla.edu/stat/data/demo1.csv")
  demo1
  str(demo1)
  ## All variables are read as int (numeric), so they must be converted to factors
  ## Convert variables to factor
  demo1 <- within(demo1, {
    group <- factor(group)
    time <- factor(time)
    id <- factor(id)
  })
  ## Now every variable except pulse is a factor
  str(demo1)

  par(cex = .6)
  with(demo1, interaction.plot(time, group, pulse,
    ylim = c(5, 20), lty = c(1, 12), lwd = 3,
    ylab = "mean of pulse", xlab = "time", trace.label = "group"))

  demo1.aov <- aov(pulse ~ group * time + Error(id), data = demo1)
  summary(demo1.aov)

  > summary(demo1.aov)

  Error: id
            Df Sum Sq Mean Sq F value  Pr(>F)
  group      1 155.04  155.04    3721 1.3e-09 ***
  Residuals  6   0.25    0.04
  ---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

  Error: Within
             Df Sum Sq Mean Sq F value Pr(>F)
  time        2 0.0833 0.04167       1  0.397
  group:time  2 0.0833 0.04167       1  0.397
  Residuals  12 0.5000 0.04167

{{:pasted:20200611-142331.png?350}}

===== demo2 =====

  demo2 <- read.csv("https://stats.idre.ucla.edu/stat/data/demo2.csv")
  ## Convert variables to factor
  demo2 <- within(demo2, {
    group <- factor(group)
    time <- factor(time)
    id <- factor(id)
  })
  demo2

  with(demo2, interaction.plot(time, group, pulse,
    ylim = c(10, 40), lty = c(1, 12), lwd = 3,
    ylab = "mean of pulse", xlab = "time", trace.label = "group"))

  demo2.aov <- aov(pulse ~ group * time + Error(id), data = demo2)
  summary(demo2.aov)

{{:pasted:20200611-151520.png?350}}

  > demo2 <- read.csv("https://stats.idre.ucla.edu/stat/data/demo2.csv")
  > ## Convert variables to factor
  > demo2 <- within(demo2, {
  +   group <- factor(group)
  +   time <- factor(time)
  +   id <- factor(id)
  + })
  > demo2
     id group pulse time
  1   1     1    14    1
  2   1     1    19    2
  3   1     1    29    3
  4   2     1    15    1
  5   2     1    25    2
  6   2     1    26    3
  7   3     1    16    1
  8   3     1    16    2
  9   3     1    31    3
  10  4     1    12    1
  11  4     1    24    2
  12  4     1    32    3
  13  5     2    10    1
  14  5     2    21    2
  15  5     2    24    3
  16  6     2    17    1
  17  6     2    26    2
  18  6     2    35    3
  19  7     2    19    1
  20  7     2    22    2
  21  7     2    32    3
  22  8     2    15    1
  23  8     2    23    2
  24  8     2    34    3
  >
  > with(demo2, interaction.plot(time, group, pulse,
  +   ylim = c(10, 40), lty = c(1, 12), lwd = 3,
  +   ylab = "mean of pulse", xlab = "time", trace.label = "group"))
  >
  > demo2.aov <- aov(pulse ~ group * time + Error(id), data = demo2)
  > summary(demo2.aov)

  Error: id
            Df Sum Sq Mean Sq F value Pr(>F)
  group      1  15.04   15.04   0.836  0.396
  Residuals  6 107.92   17.99

  Error: Within
             Df Sum Sq Mean Sq F value   Pr(>F)
  time        2  978.2   489.1  53.684 1.03e-06 ***
  group:time  2    1.1     0.5   0.059    0.943
  Residuals  12  109.3     9.1
  ---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

===== demo 3 =====

  demo3 <- read.csv("https://stats.idre.ucla.edu/stat/data/demo3.csv")
  ## Convert variables to factor
  demo3 <- within(demo3, {
    group <- factor(group)
    time <- factor(time)
    id <- factor(id)
  })

  with(demo3, interaction.plot(time, group, pulse,
    ylim = c(10, 60), lty = c(1, 12), lwd = 3,
    ylab = "mean of pulse", xlab = "time", trace.label = "group"))

  demo3.aov <- aov(pulse ~ group * time + Error(id), data = demo3)
  summary(demo3.aov)

{{:pasted:20200611-151755.png?350}}

  > demo3 <- read.csv("https://stats.idre.ucla.edu/stat/data/demo3.csv")
  > ## Convert variables to factor
  > demo3 <- within(demo3, {
  +   group <- factor(group)
  +   time <- factor(time)
  +   id <- factor(id)
  + })
  >
  > with(demo3, interaction.plot(time, group, pulse,
  +   ylim = c(10, 60), lty = c(1, 12), lwd = 3,
  +   ylab = "mean of pulse", xlab = "time", trace.label = "group"))
  >
  > demo3.aov <- aov(pulse ~ group * time + Error(id), data = demo3)
  > summary(demo3.aov)

  Error: id
            Df Sum Sq Mean Sq F value  Pr(>F)
  group      1 2035.0  2035.0   343.1 1.6e-06 ***
  Residuals  6   35.6     5.9
  ---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

  Error: Within
             Df Sum Sq Mean Sq F value   Pr(>F)
  time        2 2830.3  1415.2   553.8 1.52e-12 ***
  group:time  2  200.3   100.2    39.2 5.47e-06 ***
  Residuals  12   30.7     2.6
  ---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

====== reference ======
  * [[http://wwwstage.valpo.edu/other/dabook/ch12/c12-1.htm|Repeated measures one-way ANOVA]] by Akkelin
  * {{:ezdata.sav|ezdata: SPSS Data file}}
  * http://www.psych.utoronto.ca/courses/c1/chap14/chap14.html
  * https://statistics.laerd.com/statistical-guides/repeated-measures-anova-statistical-guide.php
  * http://rcompanion.org/handbook/I_09.html : an excellent example, but difficult to digest.