COMMunication
RESearch.NET

1.
write.csv(Orange, "Orange.csv")

2.

suburbs <- read.csv("https://goo.gl/0EsHke", sep = "\t")
summary(suburbs)
               city          county   state       pop
 Arlington.Heights: 1   Cook    :7   IL:13   Min.   :  63348
 Aurora           : 1   Kane    :2   IN: 2   1st Qu.:  73833
 Bolingbrook      : 1   Lake(IN):2   WI: 1   Median :  86700
 Chicago          : 1   DuPage  :1           Mean   : 265042
 Cicero           : 1   Kendall :1           3rd Qu.: 103615
 Elgin            : 1   Kenosha :1           Max.   :2853114
 (Other)          :10   (Other) :2

- According to summary(), state has three levels (IL, IN, WI).

3.

library(MASS)
Cars93_over30 <- subset(Cars93, subset = MPG.city >= 30)

4.

Cars93_Origin <- split(Cars93, Cars93$Origin)

5.

choose(50,3)

[1] 19600
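
As a cross-check, the binomial coefficient can be expanded by hand: C(50, 3) = (50 * 49 * 48) / 3!. A minimal sketch in base R; the variable name by_hand is only illustrative.

```r
# choose(n, k) = n! / (k! * (n - k)!); for n = 50 and k = 3 this reduces to
# (50 * 49 * 48) / (3 * 2 * 1).
by_hand <- (50 * 49 * 48) / factorial(3)

# Compare with the built-in; all.equal() tolerates floating-point noise.
isTRUE(all.equal(by_hand, choose(50, 3)))
```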

6.

rnorm(1, mean = 100, sd = 15)

[1] 94.57244

7-9.

A <- c(19, 20, 24, 30, 31, 32, 30, 27, 22, 25)
B <- c(23, 22, 15, 16, 18, 12, 16, 19, 14, 25)

AB <- data.frame(A, B)
AB
    A  B
1  19 23
2  20 22
3  24 15
4  30 16
5  31 18
6  32 12
7  30 16
8  27 19
9  22 14
10 25 25

SAB <- stack(AB)

SAB
   values ind
1      19   A
2      20   A
3      24   A
4      30   A
5      31   A
6      32   A
7      30   A
8      27   A
9      22   A
10     25   A
11     23   B
12     22   B
13     15   B
14     16   B
15     18   B
16     12   B
17     16   B
18     19   B
19     14   B
20     25   B

summary(SAB)
     values      ind
 Min.   :12.0   A:10
 1st Qu.:17.5   B:10
 Median :22.0
 Mean   :22.0
 3rd Qu.:25.5
 Max.   :32.0

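7. Independent variable: whether image association was used
8. Dependent variable: number of words recalled
9. Categorical (kind)
10. Numeric (number)
11. The number of words recalled will differ depending on whether image association was used.

12. 9
13. 9

14. 26
15. 18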

tapply(SAB$values, SAB$ind, mean)

 A  B
26 18

tapply(SAB$values, SAB$ind, var)
       A        B
22.22222 17.77778

Since SS = var × df:

16.
22.22222 * 9

17.
17.77778 * 9
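
The shortcut SS = var × df can be checked against the definition of the sum of squares (the sum of squared deviations from the group mean). A small sketch using the same A and B vectors; the names ss_A, ss_B and the *_short variants are only illustrative.

```r
A <- c(19, 20, 24, 30, 31, 32, 30, 27, 22, 25)
B <- c(23, 22, 15, 16, 18, 12, 16, 19, 14, 25)

# Sum of squares from its definition: squared deviations from the group mean.
ss_A <- sum((A - mean(A))^2)
ss_B <- sum((B - mean(B))^2)

# Same quantity via the shortcut: sample variance times df (n - 1).
ss_A_short <- var(A) * (length(A) - 1)
ss_B_short <- var(B) * (length(B) - 1)

c(A = ss_A, B = ss_B)   # 200 and 160
```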

18.

t.test(values ~ ind, data = SAB)

	Welch Two Sample t-test

data:  values by ind
t = 4, df = 17.78, p-value = 0.0008577
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  3.794434 12.205566
sample estimates:
mean in group A mean in group B
             26              18

19.
t = 4
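
The value t = 4 can also be reproduced by hand from the group means and variances. The sketch below uses the pooled-variance (equal variances assumed) formula and compares it with t.test(); because both groups have n = 10, the pooled and Welch statistics coincide and only the degrees of freedom differ. Names such as pooled_var and se_diff are illustrative.

```r
A <- c(19, 20, 24, 30, 31, 32, 30, 27, 22, 25)
B <- c(23, 22, 15, 16, 18, 12, 16, 19, 14, 25)
n_A <- length(A)
n_B <- length(B)

# Pooled variance: (SS_A + SS_B) / (df_A + df_B) = (200 + 160) / 18 = 20
pooled_var <- ((n_A - 1) * var(A) + (n_B - 1) * var(B)) / (n_A + n_B - 2)

# Standard error of the mean difference: sqrt(20 * (1/10 + 1/10)) = 2
se_diff <- sqrt(pooled_var * (1 / n_A + 1 / n_B))

# t = (26 - 18) / 2 = 4
t_by_hand <- (mean(A) - mean(B)) / se_diff

pooled <- t.test(A, B, var.equal = TRUE)  # Student's t-test, df = 18
welch  <- t.test(A, B)                    # Welch's t-test, df about 17.78

c(by_hand = t_by_hand,
  pooled  = unname(pooled$statistic),
  welch   = unname(welch$statistic))
```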

20.

a1 <- rnorm(20, mean = 200, sd = 15)
a2 <- rnorm(20, mean = 190, sd = 15)
a <- data.frame(a1, a2)

t.test(a$a1, a$a2)

Welch Two Sample t-test

data: a$a1 and a$a2
t = 0.63759, df = 36.79, p-value = 0.5277
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-6.002291 11.512736
sample estimates:
mean of x mean of y
194.3748 191.6196

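- Null hypothesis: the two groups do not differ in their means.
- In the t-test on the two group means, the p-value is 0.5277, which is greater than 0.05, so the null hypothesis cannot be rejected. (significance level: 0.05, i.e. 95% confidence)
- Therefore we cannot say that the two group means differ.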
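
The decision rule above can be written out in code by pulling the p-value from the object that t.test() returns. This is only a sketch: set.seed(1) is an arbitrary choice added here so the random draw is repeatable, so the numbers will not match the transcript above, and alpha and decision are illustrative names.

```r
set.seed(1)  # arbitrary seed (an assumption, not part of the original answer)
a1 <- rnorm(20, mean = 200, sd = 15)
a2 <- rnorm(20, mean = 190, sd = 15)

res <- t.test(a1, a2)   # Welch two-sample t-test (the default)
alpha <- 0.05           # significance level

# Reject the null hypothesis only when the p-value falls below alpha.
decision <- if (res$p.value < alpha) "reject H0" else "fail to reject H0"
decision
```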

21.

t.test(a1, mu = 190)

One Sample t-test

data: a1
t = 1.5823, df = 19, p-value = 0.1301
alternative hypothesis: true mean is not equal to 190
95 percent confidence interval:
188.5881 200.1615
sample estimates:
mean of x
194.3748

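- Null hypothesis: the mean of a1 is 190.
- In the t-test, the p-value is 0.1301, which is greater than 0.05, so the null hypothesis cannot be rejected. (significance level: 0.05, i.e. 95% confidence)
- Therefore we cannot say that the mean of a1 differs from the population mean (190).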

22.

a3 <- rnorm(1000, mean = 200, sd = 15)
t.test(a3, mu = 190)

One Sample t-test

data: a3
t = 20.307, df = 999, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 190
95 percent confidence interval:
198.9785 200.8994
sample estimates:
mean of x
199.9389

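- Null hypothesis: the mean of a3 is 190.
- In the t-test, the p-value is below 0.05, so the null hypothesis is rejected. (significance level: 0.05, i.e. 95% confidence)
- Therefore we can say that the mean of a3 differs from the population mean (190).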
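
Questions 21 and 22 test the same null value (190) on samples drawn from the same population (mean 200, sd 15), yet only the n = 1000 sample rejects it. One way to see why is to compare the power of the test at the two sample sizes with power.t.test() from the stats package. A rough sketch; power_n20 and power_n1000 are illustrative names.

```r
# Power of a two-sided one-sample t-test at the 0.05 level when the true mean
# is 200, the null value is 190 (delta = 10) and sd = 15.
power_n20 <- power.t.test(n = 20, delta = 10, sd = 15, sig.level = 0.05,
                          type = "one.sample")$power
power_n1000 <- power.t.test(n = 1000, delta = 10, sd = 15, sig.level = 0.05,
                            type = "one.sample")$power

# The larger sample detects the same 10-point difference far more reliably.
c(n20 = power_n20, n1000 = power_n1000)
```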

23~30.

summary(InsectSprays)
     count       spray
 Min.   : 0.00   A:12
 1st Qu.: 3.00   B:12
 Median : 7.00   C:12
 Mean   : 9.50   D:12
 3rd Qu.:14.25   E:12
 Max.   :26.00   F:12

InsectSprays
  count spray
1 10 A
2 7 A
3 20 A
4 14 A
5 14 A
6 12 A
7 10 A
8 23 A
9 17 A
10 20 A
11 14 A
12 13 A
13 11 B
14 17 B
15 21 B
16 11 B
17 16 B
18 14 B
19 17 B
20 17 B
21 19 B
22 21 B
23 7 B
24 13 B
25 0 C
26 1 C
27 7 C
28 2 C
29 3 C
30 1 C
31 2 C
32 1 C
33 3 C
34 0 C
35 1 C
36 4 C
37 3 D
38 5 D
39 12 D
40 6 D
41 4 D
42 3 D
43 5 D
44 5 D
45 5 D
46 5 D
47 2 D
48 4 D
49 3 E
50 5 E
51 3 E
52 5 E
53 3 E
54 6 E
55 1 E
56 1 E
57 3 E
58 2 E
59 6 E
60 4 E
61 11 F
62 9 F
63 15 F
64 22 F
65 15 F
66 16 F
67 13 F
68 10 F
69 26 F
70 26 F
71 24 F
72 13 F

23.
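InsectSprays is a data frame of insect counts (count) by type of spray (spray). Each count is the number of insects found in an experimental unit treated with that spray.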

24.

tapply(InsectSprays$count, InsectSprays$spray, mean)
      A         B         C         D         E 
14.500000 15.333333 2.083333 4.916667 3.500000
      F 
16.666667

tapply(InsectSprays$count, InsectSprays$spray, var)
      A         B         C         D         E 
22.272727 18.242424 3.901515 6.265152 3.000000
      F 
38.606061

25.

a <- aov(InsectSprays$count ~ InsectSprays$spray)
a
Call:
   aov(formula = InsectSprays$count ~ InsectSprays$spray)

Terms:
                InsectSprays$spray Residuals
Sum of Squares            2668.833  1015.167
Deg. of Freedom                  5        66

Residual standard error: 3.921902
Estimated effects may be unbalanced

summary(a)
                   Df Sum Sq Mean Sq F value Pr(>F)
InsectSprays$spray  5   2669   533.8    34.7 <2e-16 ***
Residuals          66   1015    15.4
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

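- Null hypothesis: the sprays do not differ in their insecticidal effect.
- F(5, 66) = 34.7 and its p-value (< 2e-16) is below 0.05, so the null hypothesis is rejected. (significance level: 0.05, i.e. 95% confidence)
- (total df = 71)
- Therefore we can say that the sprays differ in their insecticidal effect.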
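
The F value can be rebuilt from the group means and variances of question 24: the between-group SS comes from the group means, the within-group SS from var × df per group. A sketch in base R; names such as ss_between and ss_within are illustrative.

```r
counts <- InsectSprays$count
spray  <- InsectSprays$spray

group_n    <- tapply(counts, spray, length)
group_mean <- tapply(counts, spray, mean)
group_var  <- tapply(counts, spray, var)
grand_mean <- mean(counts)

# Between groups: n_j * (group mean - grand mean)^2, summed over sprays.
ss_between <- sum(group_n * (group_mean - grand_mean)^2)
# Within groups: var_j * (n_j - 1), summed over sprays.
ss_within  <- sum(group_var * (group_n - 1))

df_between <- nlevels(spray) - 1               # 6 sprays - 1 = 5
df_within  <- length(counts) - nlevels(spray)  # 72 - 6 = 66

f_value <- (ss_between / df_between) / (ss_within / df_within)
# The decision is based on the p-value of F, not on F itself.
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

c(SS_between = ss_between, SS_within = ss_within, F = f_value)
```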

26.
TukeyHSD(a)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = InsectSprays$count ~ InsectSprays$spray)

$`InsectSprays$spray`
           diff        lwr       upr     p adj
B-A   0.8333333  -3.866075  5.532742 0.9951810
C-A -12.4166667 -17.116075 -7.717258 0.0000000
D-A  -9.5833333 -14.282742 -4.883925 0.0000014
E-A -11.0000000 -15.699409 -6.300591 0.0000000
F-A   2.1666667  -2.532742  6.866075 0.7542147
C-B -13.2500000 -17.949409 -8.550591 0.0000000
D-B -10.4166667 -15.116075 -5.717258 0.0000002
E-B -11.8333333 -16.532742 -7.133925 0.0000000
F-B   1.3333333  -3.366075  6.032742 0.9603075
D-C   2.8333333  -1.866075  7.532742 0.4920707
E-C   1.4166667  -3.282742  6.116075 0.9488669
F-C  14.5833333   9.883925 19.282742 0.0000000
E-D  -1.4166667  -6.116075  3.282742 0.9488669
F-D  11.7500000   7.050591 16.449409 0.0000000
F-E  13.1666667   8.467258 17.866075 0.0000000

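The sprays fall into the two groups below. Every pair taken from different groups differs significantly (p adj < 0.05), while no pair within the same group does.
A, B, F
C, D, E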
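
The grouping can be read off programmatically by filtering the Tukey table on its "p adj" column. A sketch that refits the model with a data argument so the result component is simply named spray; fit, sig_pairs and nonsig_pairs are illustrative names.

```r
fit <- aov(count ~ spray, data = InsectSprays)

# TukeyHSD() returns a list; the element for the factor is a matrix with
# columns diff, lwr, upr and "p adj", one row per pair of sprays.
tukey <- TukeyHSD(fit)$spray

sig_pairs    <- rownames(tukey)[tukey[, "p adj"] <  0.05]
nonsig_pairs <- rownames(tukey)[tukey[, "p adj"] >= 0.05]

sig_pairs      # pairs that differ: each one crosses {A, B, F} and {C, D, E}
nonsig_pairs   # pairs that do not differ: each one stays inside one group
```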

27. ToothGrowth is a data frame describing how len is affected by supp and dose (factorial design).
dose: vitamin C dose
supp: delivery method, VC (ascorbic acid) or OJ (orange juice)
len: tooth length of the guinea pigs

28.
ToothGrowth$dose <- factor(ToothGrowth$dose,
                           levels = c(0.5, 1.0, 2.0),
                           labels = c("low", "med", "high"))
summary(ToothGrowth)
      len        supp     dose
 Min.   : 4.20   OJ:30   low :20
 1st Qu.:13.07   VC:30   med :20
 Median :19.25           high:20
 Mean   :18.81
 3rd Qu.:25.27
 Max.   :33.90

29.
aov.out <- aov(len ~ supp * dose, data = ToothGrowth)
summary(aov.out)
            Df Sum Sq Mean Sq F value   Pr(>F)
supp         1  205.3   205.3  12.317 0.000894 ***
dose         1 2224.3  2224.3 133.415  < 2e-16 ***
supp:dose    1   88.9    88.9   5.333 0.024631 *
Residuals   56  933.6    16.7
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

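The main effect of supp and the main effect of dose are both significant.
The supp:dose interaction is also significant (p = 0.0246), so the effect of supp depends on the dose.
dose has Df = 1 in this table, which means it entered this fit as a numeric variable; as a three-level factor, dose and supp:dose would each have 2 df.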
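
An interaction means that the OJ versus VC difference changes across dose levels, which is easiest to see in the table of cell means. The sketch below works on a copy named tg (an illustrative name) with dose recoded as a factor, so the original data frame is left untouched.

```r
tg <- ToothGrowth
tg$dose <- factor(tg$dose, levels = c(0.5, 1, 2),
                  labels = c("low", "med", "high"))

# Mean tooth length in each supp x dose cell (2 x 3 table).
cell_means <- with(tg, tapply(len, list(supp, dose), mean))
cell_means

# OJ minus VC at each dose; if there were no interaction these
# differences would be roughly equal across the three doses.
oj_minus_vc <- cell_means["OJ", ] - cell_means["VC", ]
oj_minus_vc

# With dose as a factor, the ANOVA table has 2 df for dose and supp:dose.
fit <- aov(len ~ supp * dose, data = tg)
summary(fit)
```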

30.

TukeyHSD(aov.out, "dose")
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = len ~ supp * dose, data = ToothGrowth)

$dose
           diff       lwr       upr   p adj
med-low   9.130  6.362488 11.897512 0.0e+00
high-low 15.495 12.727488 18.262512 0.0e+00
high-med  6.365  3.597488  9.132512 2.7e-06

All three dose groups differ from one another.
The two supp groups also differ from each other.