c:ma:2016:schedule:week09_answer

Each submission must include the answers to the prompts (stating the hypotheses, etc.), the R commands and their output, and an interpretation of the results.

E.g. 1

Using the Cars93 data in the MASS package, compare city mileage (MPG.city), highway mileage (MPG.highway), and engine size (EngineSize) by Origin.

  1. State the (alternative) hypotheses:
    • $\text{MPG.city: } \mu_{\text{USA}} \ne \mu_{\text{nonUSA}}$
    • $\text{MPG.highway: } \mu_{\text{USA}} \ne \mu_{\text{nonUSA}}$
    • $\text{EngineSize: } \mu_{\text{USA}} \ne \mu_{\text{nonUSA}}$
  2. State the null hypotheses:
    • $\text{MPG.city: } \mu_{\text{USA}} = \mu_{\text{nonUSA}}$
    • $\text{MPG.highway: } \mu_{\text{USA}} = \mu_{\text{nonUSA}}$
    • $\text{EngineSize: } \mu_{\text{USA}} = \mu_{\text{nonUSA}}$
  3. Mean and standard deviation of each group
  4. Hypothesis test
  5. Test result
> library(MASS)
> CarData <- subset(Cars93, select = c(Origin, MPG.city, MPG.highway, EngineSize))
> CarData
    Origin MPG.city MPG.highway EngineSize
1  non-USA       25          31        1.8
2  non-USA       18          25        3.2
3  non-USA       20          26        2.8
4  non-USA       19          26        2.8
5  non-USA       22          30        3.5
6      USA       22          31        2.2
7      USA       19          28        3.8
8      USA       16          25        5.7
9      USA       19          27        3.8
10     USA       16          25        4.9
11     USA       16          25        4.6
12     USA       25          36        2.2
13     USA       25          34        2.2
14     USA       19          28        3.4
15     USA       21          29        2.2
16     USA       18          23        3.8
17     USA       15          20        4.3
18     USA       17          26        5.0
19     USA       17          25        5.7
20     USA       20          28        3.3
21     USA       23          28        3.0
22     USA       20          26        3.3
23     USA       29          33        1.5
24     USA       23          29        2.2
25     USA       22          27        2.5
26     USA       17          21        3.0
27     USA       21          27        2.5
28     USA       18          24        3.0
29     USA       29          33        1.5
30     USA       20          28        3.5
31     USA       31          33        1.3
32     USA       23          30        1.8
33     USA       22          27        2.3
34     USA       22          29        2.3
35     USA       24          30        2.0
36     USA       15          20        3.0
37     USA       21          30        3.0
38     USA       18          26        4.6
39 non-USA       46          50        1.0
40 non-USA       30          36        1.6
41 non-USA       24          31        2.3
42 non-USA       42          46        1.5
43 non-USA       24          31        2.2
44 non-USA       29          33        1.5
45 non-USA       22          29        1.8
46 non-USA       26          34        1.5
47 non-USA       20          27        2.0
48 non-USA       17          22        4.5
49 non-USA       18          24        3.0
50 non-USA       18          23        3.0
51     USA       17          26        3.8
52     USA       18          26        4.6
53 non-USA       29          37        1.6
54 non-USA       28          36        1.8
55 non-USA       26          34        2.5
56 non-USA       18          24        3.0
57 non-USA       17          25        1.3
58 non-USA       20          29        2.3
59 non-USA       19          25        3.2
60     USA       23          26        1.6
61     USA       19          26        3.8
62 non-USA       29          33        1.5
63 non-USA       18          24        3.0
64 non-USA       29          33        1.6
65 non-USA       24          30        2.4
66 non-USA       17          23        3.0
67 non-USA       21          26        3.0
68     USA       24          31        2.3
69     USA       23          31        2.2
70     USA       18          23        3.8
71     USA       19          28        3.8
72     USA       23          30        1.8
73     USA       31          41        1.6
74     USA       23          31        2.0
75     USA       19          28        3.4
76     USA       19          27        3.4
77     USA       19          28        3.8
78 non-USA       20          26        2.1
79     USA       28          38        1.9
80 non-USA       33          37        1.2
81 non-USA       25          30        1.8
82 non-USA       23          30        2.2
83 non-USA       39          43        1.3
84 non-USA       32          37        1.5
85 non-USA       25          32        2.2
86 non-USA       22          29        2.2
87 non-USA       18          22        2.4
88 non-USA       25          33        1.8
89 non-USA       17          21        2.5
90 non-USA       21          30        2.0
91 non-USA       18          25        2.8
92 non-USA       21          28        2.3
93 non-USA       20          28        2.4
> 
> sapply(CarData, summary)
$Origin
    USA non-USA 
     48      45 

$MPG.city
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  15.00   18.00   21.00   22.37   25.00   46.00 

$MPG.highway
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  20.00   26.00   28.00   29.09   31.00   50.00 

$EngineSize
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   1.800   2.400   2.668   3.300   5.700 
>
> attach(CarData)
> tapply(CarData$MPG.city, CarData$Origin, summary)
$USA
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  15.00   18.00   20.00   20.96   23.00   31.00 

$`non-USA`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  17.00   19.00   22.00   23.87   26.00   46.00 

> tapply(MPG.city, Origin, sd)
     USA  non-USA 
3.994455 6.672876 

> plot(MPG.city~Origin)

> t.test(MPG.city~Origin)

	Welch Two Sample t-test

data:  MPG.city by Origin
t = -2.5296, df = 71.024, p-value = 0.01364
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -5.2008385 -0.6158282
sample estimates:
    mean in group USA mean in group non-USA 
             20.95833              23.86667 

> 
> t.test(MPG.city~Origin, var.equal=TRUE)

	Two Sample t-test

data:  MPG.city by Origin
t = -2.5688, df = 91, p-value = 0.01183
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -5.1572298 -0.6594368
sample estimates:
    mean in group USA mean in group non-USA 
             20.95833              23.86667 

> 
> tapply(MPG.highway, Origin, summary)
$USA
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  20.00   26.00   28.00   28.15   30.00   41.00 

$`non-USA`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  21.00   25.00   30.00   30.09   33.00   50.00 

> 

> tapply(MPG.highway, Origin, sd)
     USA  non-USA 
4.151337 6.247990 
> plot(MPG.highway~Origin)

> t.test(MPG.highway~Origin)

	Welch Two Sample t-test

data:  MPG.highway by Origin
t = -1.7545, df = 75.802, p-value = 0.08339
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -4.1489029  0.2627918
sample estimates:
    mean in group USA mean in group non-USA 
             28.14583              30.08889 
> tapply(EngineSize, Origin, summary)
$USA
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.300   2.200   3.000   3.067   3.800   5.700 

$`non-USA`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   1.600   2.200   2.242   2.800   4.500 

> tapply(EngineSize, Origin, sd)
      USA   non-USA 
1.1353757 0.7171563 
> plot(EngineSize~Origin)
> 

> t.test(EngineSize~Origin)

	Welch Two Sample t-test

data:  EngineSize by Origin
t = 4.2135, df = 80.033, p-value = 6.55e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.4350602 1.2138287
sample estimates:
    mean in group USA mean in group non-USA 
             3.066667              2.242222 

> 
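
The Welch statistic reported for MPG.city can be cross-checked by hand from the group means, standard deviations, and counts printed above. The following is a minimal sketch, assuming those rounded summaries as inputs; the variable names (m_usa, v_usa, and so on) are illustrative and do not appear in the original transcript.

```r
# Welch t statistic recomputed from the group summaries printed above
# (MPG.city by Origin). All names here are illustrative.
m_usa <- 20.95833; s_usa <- 3.994455; n_usa <- 48
m_non <- 23.86667; s_non <- 6.672876; n_non <- 45

# Squared standard error contributed by each group
v_usa <- s_usa^2 / n_usa
v_non <- s_non^2 / n_non

# t statistic: difference in means divided by its standard error
t_stat <- (m_usa - m_non) / sqrt(v_usa + v_non)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_usa + v_non)^2 /
  (v_usa^2 / (n_usa - 1) + v_non^2 / (n_non - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df_welch)

round(c(t = t_stat, df = df_welch, p = p_value), 4)
```

Read against the transcript: at the 0.05 level the null hypothesis is rejected for MPG.city (p = 0.01364) and EngineSize (p = 6.55e-05), but not for MPG.highway (p = 0.08339).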

E.g. 2

  1. Load the Seatbelts data.
  2. Compare driver deaths (DriversKilled) before and after the seat belt law took effect.
    1. hypothesis
    2. null hypothesis
    3. test result
> sb <- as.data.frame(Seatbelts)
> attach(sb)
The following objects are masked from sb (pos = 3):

    drivers, DriversKilled, front, kms, law,
    PetrolPrice, rear, VanKilled

The following object is masked from package:MASS:

    drivers
>
> tapply(DriversKilled,law,summary)
$`0`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   79.0   108.0   121.0   125.9   140.0   198.0 

$`1`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   60.0    85.0    92.0   100.3   119.0   154.0 
>

> tapply(DriversKilled,law,sd)
       0        1 
24.26088 22.22860 
> t.test(DriversKilled~law)

	Welch Two Sample t-test

data:  DriversKilled by law
t = 5.1253, df = 29.609, p-value = 1.693e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 15.39892 35.81899
sample estimates:
mean in group 0 mean in group 1 
       125.8698        100.2609 
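
The masking messages printed by attach() above can be avoided by passing the data frame to t.test() directly. This is a sketch of that alternative, not the course's required style; the name res is an assumption introduced here.

```r
# Seatbelts ships with base R (package datasets) as a multivariate time series
sb <- as.data.frame(Seatbelts)

# Number of months without (0) and with (1) the law in force
table(sb$law)

# Same Welch test as in the transcript, but columns are looked up
# in `sb` through the data argument, so attach() is not needed
res <- t.test(DriversKilled ~ law, data = sb)

res$statistic   # t value
res$parameter   # Welch degrees of freedom
res$p.value     # two-sided p-value
```

Because nothing is attached, no objects on the search path are masked, and the test result is the same as the one shown above. With p = 1.693e-05, the null hypothesis of equal mean monthly driver deaths before and after the law is rejected.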

E.g. 3

  1. Describe the anorexia data.
  2. Extract only the FT (family treatment) group using the subset function, and compare Prewt and Postwt.
    1. State the hypotheses,
    2. run the test, and
    3. report the result.
> anorexia

. . . . 

> md = subset(anorexia, Treat=="FT")
> md
   Treat Prewt Postwt
56    FT  83.8   95.2
57    FT  83.3   94.3
58    FT  86.0   91.5
59    FT  82.5   91.9
60    FT  86.7  100.3
61    FT  79.6   76.7
62    FT  76.9   76.8
63    FT  94.2  101.6
64    FT  73.4   94.9
65    FT  80.5   75.2
66    FT  81.6   77.8
67    FT  82.1   95.5
68    FT  77.6   90.7
69    FT  83.5   92.5
70    FT  89.9   93.8
71    FT  86.0   91.7
72    FT  87.3   98.0

> t.test(md$Prewt, md$Postwt, paired=TRUE)

	Paired t-test

data:  md$Prewt and md$Postwt
t = -4.1849, df = 16, p-value = 0.0007003
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -10.94471  -3.58470
sample estimates:
mean of the differences 
              -7.264706 
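
A paired t-test is the same as a one-sample t-test on the within-subject differences. The sketch below illustrates this with the 17 FT rows copied from the listing above, so it runs without MASS; the names prewt, postwt, d, and t_hand are introduced here for illustration.

```r
# Prewt / Postwt for the 17 FT patients, copied from the listing above
prewt  <- c(83.8, 83.3, 86.0, 82.5, 86.7, 79.6, 76.9, 94.2, 73.4,
            80.5, 81.6, 82.1, 77.6, 83.5, 89.9, 86.0, 87.3)
postwt <- c(95.2, 94.3, 91.5, 91.9, 100.3, 76.7, 76.8, 101.6, 94.9,
            75.2, 77.8, 95.5, 90.7, 92.5, 93.8, 91.7, 98.0)

# Within-subject differences (before minus after)
d <- prewt - postwt

# Paired test and the equivalent one-sample test on the differences
paired  <- t.test(prewt, postwt, paired = TRUE)
one_smp <- t.test(d, mu = 0)

# The same statistic by hand: mean difference over its standard error
t_hand <- mean(d) / (sd(d) / sqrt(length(d)))

c(paired     = unname(paired$statistic),
  one_sample = unname(one_smp$statistic),
  by_hand    = t_hand)
```

All three values agree with the t reported above. Since p = 0.0007003, the null hypothesis of no change in weight is rejected; the mean difference of about -7.26 means Postwt is higher than Prewt on average.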

E.g. 4

A: 175, 168, 168, 190, 156, 181, 182, 175, 174, 179
B: 185, 169, 173, 173, 188, 186, 175, 174, 179, 180

Compare the means of the two groups.

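> a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
> a
 [1] 175 168 168 190 156 181 182 175 174 179
> b <- c(185, 169, 173, 173, 188, 186, 175, 174, 179, 180)
> b
 [1] 185 169 173 173 188 186 175 174 179 180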
> ab <- data.frame(a,b)
> ab
     a   b
1  175 185
2  168 169
3  168 173
4  190 173
5  156 188
6  181 186
7  182 175
8  175 174
9  174 179
10 179 180
> 

> summary(ab)
       a               b        
 Min.   :156.0   Min.   :169.0  
 1st Qu.:169.5   1st Qu.:173.2  
 Median :175.0   Median :177.0  
 Mean   :174.8   Mean   :178.2  
 3rd Qu.:180.5   3rd Qu.:183.8  
 Max.   :190.0   Max.   :188.0  

> ab_long <- stack(ab)   # avoid naming the object "abs", which shadows base::abs
> tapply(ab_long$values, ab_long$ind, summary)
$a
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  156.0   169.5   175.0   174.8   180.5   190.0 

$b
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  169.0   173.2   177.0   178.2   183.8   188.0 

> tapply(ab_long$values, ab_long$ind, sd)
       a        b 
9.342852 6.442912 
> 

> t.test(ab$a,ab$b)

	Welch Two Sample t-test

data:  ab$a and ab$b
t = -0.94737, df = 15.981, p-value = 0.3576
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -11.008795   4.208795
sample estimates:
mean of x mean of y 
    174.8     178.2 
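
The transcript stacks the data for the group summaries but runs the test on two separate vectors. As a sketch, the formula interface on the stacked data gives the same Welch test; the names ab_long, two_vec, and formula_form are assumptions introduced here.

```r
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(185, 169, 173, 173, 188, 186, 175, 174, 179, 180)

# Long format: one column of values, one factor naming the group
ab_long <- stack(data.frame(a, b))

# Two-vector form (as in the transcript) and formula form on the long data
two_vec      <- t.test(a, b)
formula_form <- t.test(values ~ ind, data = ab_long)

c(two_vector = unname(two_vec$statistic),
  formula    = unname(formula_form$statistic))
```

Both calls treat A and B as independent groups, which is how the exercise is worded. With p = 0.3576, the null hypothesis of equal means is not rejected.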

E.g. 5

Below are the bacteria counts (MPN/g) found in ice cream sampled from nine specific factories:

0.593 0.142 0.329 0.691 0.231 0.793 0.519 0.392 0.418

Can we say that the bacteria level in the ice cream is greater than 0.3 MPN/g, making it too dangerous to distribute?

> ir <- c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
> ir
[1] 0.593 0.142 0.329 0.691 0.231 0.793 0.519 0.392 0.418

> t.test(ir, mu=.3)

	One Sample t-test

data:  ir
t = 2.2051, df = 8, p-value = 0.05853
alternative hypothesis: true mean is not equal to 0.3
95 percent confidence interval:
 0.2928381 0.6200508
sample estimates:
mean of x 
0.4564444 

> 

> t.test(ir, alternative="greater", mu=.3)

	One Sample t-test

data:  ir
t = 2.2051, df = 8, p-value = 0.02927
alternative hypothesis: true mean is greater than 0.3
95 percent confidence interval:
 0.3245133       Inf
sample estimates:
mean of x 
0.4564444 

> 
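
The two p-values above come from the same t statistic: the one-sided ("greater") p-value is the upper-tail area, and the two-sided p-value is twice that. A minimal sketch of the calculation by hand follows; mu0, t_stat, p_two, and p_greater are names introduced here for illustration.

```r
ir  <- c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
mu0 <- 0.3          # hypothesised mean under the null
n   <- length(ir)

# One-sample t statistic: (sample mean - mu0) / standard error
t_stat <- (mean(ir) - mu0) / (sd(ir) / sqrt(n))

# Two-sided p-value: area in both tails beyond |t|
p_two <- 2 * pt(-abs(t_stat), df = n - 1)

# One-sided p-value for H1: mean > mu0 (upper tail only)
p_greater <- pt(t_stat, df = n - 1, lower.tail = FALSE)

round(c(t = t_stat, two_sided = p_two, greater = p_greater), 5)
```

The question is directional (is the mean greater than 0.3?), so the one-sided test matches it: p = 0.02927 is below 0.05 and the null hypothesis is rejected, whereas the two-sided p = 0.05853 is not below 0.05.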

E.g. 6

Below are the memory test results for a smoker group and a non-smoker group.
Non-smokers = 18,22,21,17,20,17,23,20,22,21
Smokers = 16,20,14,21,20,18,13,15,17,21
Can we say that smoking affects memory?

> nosmoke <- c(18,22,21,17,20,17,23,20,22,21)   # non-smokers
> smoke <- c(16,20,14,21,20,18,13,15,17,21)     # smokers

> sn <- data.frame(nosmoke, smoke)
> ss <- stack(sn)
> plot(ss$values~ss$ind)
> t.test(ss$values~ss$ind)

	Welch Two Sample t-test

data:  ss$values by ss$ind
t = 2.2573, df = 16.376, p-value = 0.03798
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.1628205 5.0371795
sample estimates:
mean in group nosmoke   mean in group smoke 
                 20.1                  17.5 

> 

> 

E.g. 7

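  1. Load the MASS package, then use the survey data to test a hypothesis about the relationship between smoking and amount of exercise.
  2. What happens if the exercise variable is regrouped into two groups, those who exercise frequently (freq) and those who exercise sometimes or never (none to some), and the hypothesis test is repeated?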
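
No answer is given for this exercise. One possible approach, offered only as a sketch, is a chi-squared test of independence on the Smoke and Exer columns of survey, since both are categorical. The object names (tbl, exer2, tbl2, res_all, res_two) are assumptions introduced here, and "NoneSome" is a made-up label for the merged group.

```r
library(MASS)   # provides the survey data

# Part 1: cross-tabulate smoking habit against exercise frequency
tbl <- table(survey$Smoke, survey$Exer)
tbl

# Chi-squared test of independence; R may warn that some expected
# counts are small, in which case fisher.test(tbl) is an alternative
res_all <- chisq.test(tbl)
res_all

# Part 2: collapse Exer into "Freq" versus everything else
exer2 <- ifelse(survey$Exer == "Freq", "Freq", "NoneSome")
tbl2  <- table(survey$Smoke, exer2)
res_two <- chisq.test(tbl2)
res_two
```

Comparing res_all$p.value with res_two$p.value shows whether merging the two less active groups changes the conclusion.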

E.g. 8

  1. In the data above, does the degree of smoking differ between the sexes?
  2. If the smoking variable is split into smokers/non-smokers, does smoking differ between the sexes?
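
No answer is given for this exercise either. A sketch along the same lines, assuming a chi-squared test of independence is acceptable: cross-tabulate Sex and Smoke, then repeat with Smoke reduced to two categories. The names tbl_sex, smoker, tbl_sex2, res_a, and res_b, and the labels "NonSmoker"/"Smoker", are introduced here for illustration.

```r
library(MASS)   # provides the survey data

# Part 1: Sex against the four smoking categories
tbl_sex <- table(survey$Sex, survey$Smoke)
res_a   <- chisq.test(tbl_sex)   # may warn about small expected counts
res_a

# Part 2: treat "Never" as non-smoker and every other level as smoker
smoker   <- ifelse(survey$Smoke == "Never", "NonSmoker", "Smoker")
tbl_sex2 <- table(survey$Sex, smoker)

# For a 2x2 table chisq.test applies Yates' continuity correction by default
res_b <- chisq.test(tbl_sex2)
res_b
```

Rows with a missing Sex or Smoke value are dropped by table(), so the two tables are built from the same respondents.
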
c/ma/2016/schedule/week09_answer.txt · Last modified: 2016/11/09 09:52 by hkimscil
