1. 
write.csv(Orange, "Orange.csv")
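To check the export, the file can be read back and compared with the original. A minimal sketch, assuming a temporary file path (tempfile() is used here instead of "Orange.csv") and row.names = FALSE, which the call above does not set:

```r
# Write the built-in Orange data set to a temporary CSV and read it back.
path <- tempfile(fileext = ".csv")          # hypothetical path, for illustration only
write.csv(Orange, path, row.names = FALSE)  # row.names = FALSE omits the row-index column
orange_back <- read.csv(path)

dim(orange_back)   # should match dim(Orange)
names(orange_back) # Tree, age, circumference
```

Without row.names = FALSE, the file gains an extra unnamed first column holding the row numbers.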

2. 
> suburbs <- read.csv("https://goo.gl/0EsHke", sep = "\t")
> summary(suburbs)
                city         county  state  
 Arlington.Heights: 1   Cook    :7   IL:13  
 Aurora           : 1   Kane    :2   IN: 2  
 Bolingbrook      : 1   Lake(IN):2   WI: 1  
 Chicago          : 1   DuPage  :1          
 Cicero           : 1   Kendall :1          
 Elgin            : 1   Kenosha :1          
 (Other)          :10   (Other) :2          
      pop         
 Min.   :  63348  
 1st Qu.:  73833  
 Median :  86700  
 Mean   : 265042  
 3rd Qu.: 103615  
 Max.   :2853114  
                  
- state has 3 levels (IL, IN, WI)
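The number of levels can also be read off directly instead of from summary(). A small sketch on a toy data frame (the rows are illustrative stand-ins, not the real suburbs file); note that from R 4.0 on, read.csv() and data.frame() keep strings as character unless stringsAsFactors = TRUE is given:

```r
# Toy stand-in for the suburbs data; values are hypothetical.
toy <- data.frame(
  city  = c("Chicago", "Aurora", "Gary", "Kenosha"),
  state = c("IL", "IL", "IN", "WI"),
  stringsAsFactors = TRUE   # needed in R >= 4.0 to get factor columns
)

nlevels(toy$state)  # number of distinct states
levels(toy$state)   # their labels
table(toy$state)    # rows per state
```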

3. 
> library(MASS)
> Cars93_over30 <- subset(Cars93, MPG.city >= 30)
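The same rows can be selected with logical indexing. A short sketch, assuming the MASS package is installed; the names by_subset and by_index are made up for this comparison:

```r
library(MASS)  # provides the Cars93 data set

by_subset <- subset(Cars93, MPG.city >= 30)
by_index  <- Cars93[Cars93$MPG.city >= 30, ]  # logical indexing on rows

nrow(by_subset)                  # number of cars at or above 30 MPG in the city
all(by_subset$MPG.city >= 30)    # every selected row satisfies the condition
```

subset() drops rows where the condition is NA, while plain indexing would return NA rows, so the two can differ on data with missing values.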

4.
> g <- split(Cars93$MPG.city, Cars93$Origin)
> sapply(g, mean)
     USA  non-USA 
20.95833 23.86667 
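split() followed by sapply() is one of several ways to get group means. Two equivalent forms, sketched under the assumption that MASS is installed:

```r
library(MASS)  # provides Cars93

# tapply: apply mean() to MPG.city within each level of Origin
by_tapply <- tapply(Cars93$MPG.city, Cars93$Origin, mean)

# aggregate: formula interface, returns a data frame
by_aggregate <- aggregate(MPG.city ~ Origin, data = Cars93, FUN = mean)

by_tapply
by_aggregate
```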

5. 
> choose(50,3)
[1] 19600
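choose(50, 3) is the binomial coefficient n! / (k! (n - k)!). As a check, the same number can be computed from the definition; the variable names below are illustrative:

```r
n <- 50
k <- 3

# Definition with factorials (floating point, so compare with all.equal)
by_formula <- factorial(n) / (factorial(k) * factorial(n - k))

# Cancelled form: 50 * 49 * 48 / 3!
by_product <- prod(n:(n - k + 1)) / factorial(k)

by_product                                   # 19600
isTRUE(all.equal(by_formula, choose(n, k)))  # TRUE
```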

6. 
> rnorm(1, mean = 100, sd = 15)
[1] 94.57244
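The value above changes on every run because rnorm() draws a new random number each time. If a reproducible answer is wanted, the seed can be fixed first. A minimal sketch; the seed 123 is arbitrary:

```r
set.seed(123)                        # arbitrary seed, chosen for illustration
x1 <- rnorm(1, mean = 100, sd = 15)

set.seed(123)                        # same seed again
x2 <- rnorm(1, mean = 100, sd = 15)

identical(x1, x2)                    # TRUE: same seed, same draw
```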

7-9.
> A <- c(19, 20, 24, 30, 31, 32, 30, 27, 22, 25)
> B <- c(23, 22, 15, 16, 18, 12, 16, 19, 14, 25)

> AB <- data.frame(A,B)
> AB
    A  B
1  19 23
2  20 22
3  24 15
4  30 16
5  31 18
6  32 12
7  30 16
8  27 19
9  22 14
10 25 25

> SAB <- stack(AB)

> SAB
   values ind
1      19   A
2      20   A
3      24   A
4      30   A
5      31   A
6      32   A
7      30   A
8      27   A
9      22   A
10     25   A
11     23   B
12     22   B
13     15   B
14     16   B
15     18   B
16     12   B
17     16   B
18     19   B
19     14   B
20     25   B

> summary(SAB)
     values     ind   
 Min.   :12.0   A:10  
 1st Qu.:17.5   B:10  
 Median :22.0         
 Mean   :22.0         
 3rd Qu.:25.5         
 Max.   :32.0 

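7. Independent variable: whether image association was used
8. Dependent variable: number of words recalled
9. Kind (categorical)
10. Number (numeric)
11. The number of words recalled will differ depending on whether image association was used.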

12. 9
13. 9

14. 26
15. 18
> tapply(SAB$values, SAB$ind, mean)
 A  B 
26 18 


> tapply(SAB$values, SAB$ind, var)
       A        B 
22.22222 17.77778 

Since SS = var * df:

16. 
> 22.22222 * 9
[1] 200

17. 
> 17.77778  * 9
[1] 160
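The sums of squares can also be computed directly from the deviations, which avoids retyping the rounded variances. A short sketch; ss() is a helper defined here for illustration:

```r
A <- c(19, 20, 24, 30, 31, 32, 30, 27, 22, 25)
B <- c(23, 22, 15, 16, 18, 12, 16, 19, 14, 25)

# Sum of squared deviations from the group mean
ss <- function(x) sum((x - mean(x))^2)

ss(A)                      # 200
ss(B)                      # 160
var(A) * (length(A) - 1)   # same quantity via var * df
```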

18.

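> t.test(values ~ ind, data = SAB, var.equal = TRUE)

Note: the output below reports the Welch test, which is what t.test() runs when var.equal is not in effect (a misspelled argument such as var.eqaul is silently ignored). With both groups at n = 10, the pooled-variance test gives the same t = 4, with df = 18 instead of 17.78.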

	Welch Two Sample t-test

data:  values by ind
t = 4, df = 17.78, p-value = 0.0008577
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  3.794434 12.205566
sample estimates:
mean in group A mean in group B 
             26              18 

19. 
t = 4
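The t value can be reproduced by hand from the group means and variances. A minimal sketch of the pooled-variance (Student) two-sample formula; names such as sp2 and se are chosen here for illustration:

```r
A <- c(19, 20, 24, 30, 31, 32, 30, 27, 22, 25)
B <- c(23, 22, 15, 16, 18, 12, 16, 19, 14, 25)

n1 <- length(A)
n2 <- length(B)

# Pooled variance: (SS_A + SS_B) / (df_A + df_B) = (200 + 160) / 18
sp2 <- ((n1 - 1) * var(A) + (n2 - 1) * var(B)) / (n1 + n2 - 2)

se     <- sqrt(sp2 * (1 / n1 + 1 / n2))   # standard error of the difference
t_stat <- (mean(A) - mean(B)) / se        # (26 - 18) / 2 = 4
df     <- n1 + n2 - 2                     # 18

p_value <- 2 * pt(-abs(t_stat), df)       # two-sided p-value
c(t = t_stat, df = df, p = p_value)
```

Because the groups are the same size, the Welch standard error equals the pooled one here, so both versions of the test report t = 4 and differ only in df.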

20.
> a1 <- rnorm(20, mean = 200, sd = 15)
> a2 <- rnorm(20, mean = 190, sd = 15)
> a <- data.frame(a1,a2)

> t.test(a$a1, a$a2)

	Welch Two Sample t-test

data:  a$a1 and a$a2
t = 0.63759, df = 36.79, p-value = 0.5277
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -6.002291 11.512736
sample estimates:
mean of x mean of y 
 194.3748  191.6196 

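- Null hypothesis: there is no difference between the means of the two groups.
- The t-test on the two group means gives a p-value of 0.5277, which is greater than 0.05, so the null hypothesis cannot be rejected (significance level: 0.05).
- Therefore, we cannot say that the means of the two groups differ.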
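The two samples were drawn from populations whose true means differ by 10 (200 vs. 190), yet the test did not reject. A power calculation suggests why. A rough sketch using power.t.test() from the stats package, assuming the design above (n = 20 per group, sd = 15, true difference 10):

```r
# Power of a two-sided, two-sample t-test for the simulated design
pw <- power.t.test(n = 20, delta = 10, sd = 15, sig.level = 0.05)
pw$power        # probability of rejecting H0 when the true difference is 10

# Per-group sample size that would give 80% power for the same effect
n_needed <- power.t.test(delta = 10, sd = 15, power = 0.8)$n
ceiling(n_needed)
```

With power well under 0.8 at n = 20, a non-significant result is a fairly likely outcome even though the population means differ.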


21.
> t.test(a1, mu = 190)

	One Sample t-test

data:  a1
t = 1.5823, df = 19, p-value = 0.1301
alternative hypothesis: true mean is not equal to 190
95 percent confidence interval:
 188.5881 200.1615
sample estimates:
mean of x 
 194.3748 

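- Null hypothesis: the mean of a1 is 190.
- The t-test gives a p-value of 0.1301, which is greater than 0.05, so the null hypothesis cannot be rejected (significance level: 0.05).
- Therefore, we cannot say that the mean of a1 differs from the population mean (190).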


22.
> a3 <- rnorm(1000, mean = 200, sd = 15)
> t.test(a3, mu = 190)

	One Sample t-test

data:  a3
t = 20.307, df = 999, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 190
95 percent confidence interval:
 198.9785 200.8994
sample estimates:
mean of x 
 199.9389 

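- Null hypothesis: the mean of a3 is 190.
- The t-test gives p-value < 0.05, so the null hypothesis is rejected (significance level: 0.05).
- Therefore, we can say that the mean of a3 differs from the population mean (190).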
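The contrast between 21 and 22 comes mostly from the standard error, which shrinks with the square root of n. A small sketch, assuming the population values passed to rnorm() above (sd = 15, true mean 200) rather than the sample estimates:

```r
sd_pop <- 15
n      <- c(small = 20, large = 1000)

# Standard error of the mean for each sample size
se <- sd_pop / sqrt(n)
se                      # roughly 3.35 and 0.47

# t value expected if the sample mean sat exactly at the true mean of 200
expected_t <- (200 - 190) / se
expected_t              # roughly 3 and 21
```

The same 10-point gap is about 3 standard errors wide at n = 20 but about 21 at n = 1000, so a single small sample can easily land below the rejection threshold, as a1 did.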

23~30.

> summary(InsectSprays)
     count       spray 
 Min.   : 0.00   A:12  
 1st Qu.: 3.00   B:12  
 Median : 7.00   C:12  
 Mean   : 9.50   D:12  
 3rd Qu.:14.25   E:12  
 Max.   :26.00   F:12  

> InsectSprays
   count spray
1     10     A
2      7     A
3     20     A
4     14     A
5     14     A
6     12     A
7     10     A
8     23     A
9     17     A
10    20     A
11    14     A
12    13     A
13    11     B
14    17     B
15    21     B
16    11     B
17    16     B
18    14     B
19    17     B
20    17     B
21    19     B
22    21     B
23     7     B
24    13     B
25     0     C
26     1     C
27     7     C
28     2     C
29     3     C
30     1     C
31     2     C
32     1     C
33     3     C
34     0     C
35     1     C
36     4     C
37     3     D
38     5     D
39    12     D
40     6     D
41     4     D
42     3     D
43     5     D
44     5     D
45     5     D
46     5     D
47     2     D
48     4     D
49     3     E
50     5     E
51     3     E
52     5     E
53     3     E
54     6     E
55     1     E
56     1     E
57     3     E
58     2     E
59     6     E
60     4     E
61    11     F
62     9     F
63    15     F
64    22     F
65    15     F
66    16     F
67    13     F
68    10     F
69    26     F
70    26     F
71    24     F
72    13     F

23. 
InsectSprays is a data frame giving the insect count (count) observed for each type of spray (spray); a lower count means a more effective spray.

24.
> tapply(InsectSprays$count, InsectSprays$spray, mean)
        A         B         C         D         E         F 
14.500000 15.333333  2.083333  4.916667  3.500000 16.666667 

> tapply(InsectSprays$count, InsectSprays$spray, var)
        A         B         C         D         E         F 
22.272727 18.242424  3.901515  6.265152  3.000000 38.606061 


25.
> a <- aov(InsectSprays$count~InsectSprays$spray)
> a
Call:
   aov(formula = InsectSprays$count ~ InsectSprays$spray)

Terms:
                InsectSprays$spray Residuals
Sum of Squares            2668.833  1015.167
Deg. of Freedom                  5        66

Residual standard error: 3.921902
Estimated effects may be unbalanced

> summary(a)
                   Df Sum Sq Mean Sq F value Pr(>F)    
InsectSprays$spray  5   2669   533.8    34.7 <2e-16 ***
Residuals          66   1015    15.4                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

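- Null hypothesis: there is no difference in effectiveness among the sprays.
- Pr(>F) is less than 0.05 (F(5, 66) = 34.7, p < 2e-16), so the null hypothesis is rejected (significance level: 0.05).
- (total df = 71)
- Therefore, we can say that the sprays differ in effectiveness.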
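The F value in the table is the ratio of the two mean squares, and the p-value is the upper tail of the F distribution at that ratio. A short sketch that recomputes both from the sums of squares; it uses the formula interface with data = InsectSprays, which fits the same model as the call above:

```r
fit <- aov(count ~ spray, data = InsectSprays)
tab <- summary(fit)[[1]]          # ANOVA table as a data frame

ms <- tab[["Sum Sq"]] / tab[["Df"]]   # mean squares: between, within
f  <- ms[1] / ms[2]                   # F = MS_between / MS_within

# Upper-tail probability of F with (5, 66) degrees of freedom
p <- pf(f, tab[["Df"]][1], tab[["Df"]][2], lower.tail = FALSE)

round(f, 1)   # 34.7, as in the table
p
```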

26.
> TukeyHSD(a)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = InsectSprays$count ~ InsectSprays$spray)

$`InsectSprays$spray`
           diff        lwr       upr     p adj
B-A   0.8333333  -3.866075  5.532742 0.9951810
C-A -12.4166667 -17.116075 -7.717258 0.0000000
D-A  -9.5833333 -14.282742 -4.883925 0.0000014
E-A -11.0000000 -15.699409 -6.300591 0.0000000
F-A   2.1666667  -2.532742  6.866075 0.7542147
C-B -13.2500000 -17.949409 -8.550591 0.0000000
D-B -10.4166667 -15.116075 -5.717258 0.0000002
E-B -11.8333333 -16.532742 -7.133925 0.0000000
F-B   1.3333333  -3.366075  6.032742 0.9603075
D-C   2.8333333  -1.866075  7.532742 0.4920707
E-C   1.4166667  -3.282742  6.116075 0.9488669
F-C  14.5833333   9.883925 19.282742 0.0000000
E-D  -1.4166667  -6.116075  3.282742 0.9488669
F-D  11.7500000   7.050591 16.449409 0.0000000
F-E  13.1666667   8.467258 17.866075 0.0000000

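The sprays fall into the two groups below. Every pair across the groups differs significantly, while no pair within a group does.
A, B, F
C, D, E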
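The significant pairs can be pulled out of the TukeyHSD result rather than read by eye. A minimal sketch, assuming the model is refit with the formula interface so that the result component is named spray:

```r
fit <- aov(count ~ spray, data = InsectSprays)
tk  <- TukeyHSD(fit)$spray        # matrix with columns diff, lwr, upr, p adj

# Pairs whose adjusted p-value is below 0.05
sig <- rownames(tk)[tk[, "p adj"] < 0.05]
sig
length(sig)                       # 9 of the 15 pairs
```

All nine significant pairs combine one spray from {A, B, F} with one from {C, D, E}.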

27. ToothGrowth is a data frame describing how len depends on supp and dose (factorial design).
Vitamin C dose (dose)
Delivery method (supp) - VC, OJ
Tooth length of the guinea pigs (len)

28. 
> ToothGrowth$dose = factor(ToothGrowth$dose,
                     levels=c(0.5,1.0,2.0),
                     labels=c("low","med","high"))
> summary(ToothGrowth)
      len        supp      dose   
 Min.   : 4.20   OJ:30   low :20  
 1st Qu.:13.07   VC:30   med :20  
 Median :19.25           high:20  
 Mean   :18.81                    
 3rd Qu.:25.27                    
 Max.   :33.90

29.
> aov.out = aov(len ~ supp * dose, data=ToothGrowth)
> summary(aov.out)
            Df Sum Sq Mean Sq F value   Pr(>F)    
supp         1  205.3   205.3  12.317 0.000894 ***
dose         1 2224.3  2224.3 133.415  < 2e-16 ***
supp:dose    1   88.9    88.9   5.333 0.024631 *  
Residuals   56  933.6    16.7                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

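The main effects of supp and dose are both significant, and the supp:dose interaction is significant as well (p = 0.024631 < 0.05). A significant interaction does not follow from the main effects; it is tested separately.
Note that dose has Df = 1 in this table, which is what aov() reports when dose is numeric. With dose converted to a three-level factor as in 28, dose and supp:dose each have 2 df and Residuals has 54.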
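Whether dose is numeric or a factor changes the degrees of freedom in the ANOVA table. A small sketch comparing the two fits; it works on a copy (tg, a name chosen for illustration) so that the built-in ToothGrowth keeps its numeric dose:

```r
tg <- ToothGrowth                         # copy; original dose stays numeric
tg$dose <- factor(tg$dose,
                  levels = c(0.5, 1.0, 2.0),
                  labels = c("low", "med", "high"))

fit_factor  <- aov(len ~ supp * dose, data = tg)           # dose as factor
fit_numeric <- aov(len ~ supp * dose, data = ToothGrowth)  # dose as number

summary(fit_factor)[[1]][["Df"]]    # 1 2 2 54
summary(fit_numeric)[[1]][["Df"]]   # 1 1 1 56
```

TukeyHSD() on dose needs the factor version, since pairwise comparisons are only defined between levels of a factor.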


30.
> TukeyHSD(aov.out, "dose")
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = len ~ supp * dose, data = ToothGrowth)

$dose
           diff       lwr       upr   p adj
med-low   9.130  6.362488 11.897512 0.0e+00
high-low 15.495 12.727488 18.262512 0.0e+00
high-med  6.365  3.597488  9.132512 2.7e-06

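The three dose groups all differ from one another (every p adj < 0.05).
The two supp groups also differ from each other (supp main effect in 29).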