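Each submission must include answers to the prompts (such as stating the hypotheses), the R commands and their output, and an interpretation of the results.
Using the Cars93 data from the MASS package, compare city mileage, highway mileage, and engine size by Origin.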
> CarData <- subset(Cars93, select = c(Origin, MPG.city, MPG.highway, EngineSize))
> CarData
    Origin MPG.city MPG.highway EngineSize
1  non-USA       25          31        1.8
2  non-USA       18          25        3.2
3  non-USA       20          26        2.8
4  non-USA       19          26        2.8
5  non-USA       22          30        3.5
6      USA       22          31        2.2
7      USA       19          28        3.8
8      USA       16          25        5.7
9      USA       19          27        3.8
10     USA       16          25        4.9
11     USA       16          25        4.6
12     USA       25          36        2.2
13     USA       25          34        2.2
14     USA       19          28        3.4
15     USA       21          29        2.2
16     USA       18          23        3.8
17     USA       15          20        4.3
18     USA       17          26        5.0
19     USA       17          25        5.7
20     USA       20          28        3.3
21     USA       23          28        3.0
22     USA       20          26        3.3
23     USA       29          33        1.5
24     USA       23          29        2.2
25     USA       22          27        2.5
26     USA       17          21        3.0
27     USA       21          27        2.5
28     USA       18          24        3.0
29     USA       29          33        1.5
30     USA       20          28        3.5
31     USA       31          33        1.3
32     USA       23          30        1.8
33     USA       22          27        2.3
34     USA       22          29        2.3
35     USA       24          30        2.0
36     USA       15          20        3.0
37     USA       21          30        3.0
38     USA       18          26        4.6
39 non-USA       46          50        1.0
40 non-USA       30          36        1.6
41 non-USA       24          31        2.3
42 non-USA       42          46        1.5
43 non-USA       24          31        2.2
44 non-USA       29          33        1.5
45 non-USA       22          29        1.8
46 non-USA       26          34        1.5
47 non-USA       20          27        2.0
48 non-USA       17          22        4.5
49 non-USA       18          24        3.0
50 non-USA       18          23        3.0
51     USA       17          26        3.8
52     USA       18          26        4.6
53 non-USA       29          37        1.6
54 non-USA       28          36        1.8
55 non-USA       26          34        2.5
56 non-USA       18          24        3.0
57 non-USA       17          25        1.3
58 non-USA       20          29        2.3
59 non-USA       19          25        3.2
60     USA       23          26        1.6
61     USA       19          26        3.8
62 non-USA       29          33        1.5
63 non-USA       18          24        3.0
64 non-USA       29          33        1.6
65 non-USA       24          30        2.4
66 non-USA       17          23        3.0
67 non-USA       21          26        3.0
68     USA       24          31        2.3
69     USA       23          31        2.2
70     USA       18          23        3.8
71     USA       19          28        3.8
72     USA       23          30        1.8
73     USA       31          41        1.6
74     USA       23          31        2.0
75     USA       19          28        3.4
76     USA       19          27        3.4
77     USA       19          28        3.8
78 non-USA       20          26        2.1
79     USA       28          38        1.9
80 non-USA       33          37        1.2
81 non-USA       25          30        1.8
82 non-USA       23          30        2.2
83 non-USA       39          43        1.3
84 non-USA       32          37        1.5
85 non-USA       25          32        2.2
86 non-USA       22          29        2.2
87 non-USA       18          22        2.4
88 non-USA       25          33        1.8
89 non-USA       17          21        2.5
90 non-USA       21          30        2.0
91 non-USA       18          25        2.8
92 non-USA       21          28        2.3
93 non-USA       20          28        2.4
> sapply(CarData, summary)
$Origin
    USA non-USA
     48      45

$MPG.city
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  15.00   18.00   21.00   22.37   25.00   46.00

$MPG.highway
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  20.00   26.00   28.00   29.09   31.00   50.00

$EngineSize
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   1.800   2.400   2.668   3.300   5.700
> attach(CarData)
> tapply(CarData$MPG.city, CarData$Origin, summary)
$USA
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  15.00   18.00   20.00   20.96   23.00   31.00

$`non-USA`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  17.00   19.00   22.00   23.87   26.00   46.00

> tapply(MPG.city, Origin, sd)
     USA  non-USA
3.994455 6.672876
> plot(MPG.city~Origin)
> t.test(MPG.city~Origin)

    Welch Two Sample t-test

data:  MPG.city by Origin
t = -2.5296, df = 71.024, p-value = 0.01364
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -5.2008385 -0.6158282
sample estimates:
    mean in group USA mean in group non-USA
             20.95833              23.86667

> t.test(MPG.city~Origin, var.equal=TRUE)

    Two Sample t-test

data:  MPG.city by Origin
t = -2.5688, df = 91, p-value = 0.01183
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -5.1572298 -0.6594368
sample estimates:
    mean in group USA mean in group non-USA
             20.95833              23.86667
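The two calls above differ only in whether the group variances are assumed equal. One way to probe that assumption, offered here as a sketch and not as part of the original analysis, is an F test with `var.test()`. The F test is sensitive to non-normal data, so its result is best read as a rough guide; Welch's test (the `t.test()` default) is the safer choice when in doubt.

```r
library(MASS)  # provides the Cars93 data set

# F test of equal variances of city mileage between USA and non-USA cars.
# data = is used so that nothing has to be attach()ed.
vt <- var.test(MPG.city ~ Origin, data = Cars93)
print(vt)

# Ratio of the sample variances (USA / non-USA)
vt$estimate
```

A ratio far from 1 together with a small p-value would suggest that `var.equal=TRUE` is hard to justify for these data.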
> tapply(MPG.highway, Origin, summary)
$USA
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  20.00   26.00   28.00   28.15   30.00   41.00

$`non-USA`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  21.00   25.00   30.00   30.09   33.00   50.00

> tapply(MPG.highway, Origin, sd)
     USA  non-USA
4.151337 6.247990
> plot(MPG.highway~Origin)
> t.test(MPG.highway~Origin)

    Welch Two Sample t-test

data:  MPG.highway by Origin
t = -1.7545, df = 75.802, p-value = 0.08339
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -4.1489029  0.2627918
sample estimates:
    mean in group USA mean in group non-USA
             28.14583              30.08889
> tapply(EngineSize, Origin, summary)
$USA
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.300   2.200   3.000   3.067   3.800   5.700

$`non-USA`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   1.600   2.200   2.242   2.800   4.500

> tapply(EngineSize, Origin, sd)
      USA   non-USA
1.1353757 0.7171563
> plot(EngineSize~Origin)
> t.test(EngineSize~Origin)

    Welch Two Sample t-test

data:  EngineSize by Origin
t = 4.2135, df = 80.033, p-value = 6.55e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.4350602 1.2138287
sample estimates:
    mean in group USA mean in group non-USA
             3.066667              2.242222
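The three comparisons above repeat the same call with a different response variable. As a compact alternative, the following sketch loops over the variable names and collects the Welch p-values in one vector. The helper name `pvals` is introduced here for illustration only.

```r
library(MASS)  # provides the Cars93 data set

vars <- c("MPG.city", "MPG.highway", "EngineSize")

# Welch two-sample t-test of each variable by Origin; keep only the p-value
pvals <- sapply(vars, function(v) {
  t.test(Cars93[[v]] ~ Cars93$Origin)$p.value
})

round(pvals, 5)
```

The values should agree with the p-values printed in the transcripts above (0.01364, 0.08339, and 6.55e-05), which makes it easy to see at a glance that city mileage and engine size differ by origin at the 5% level while highway mileage does not.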
> sb <- as.data.frame(Seatbelts)
> attach(sb)
The following objects are masked from sb (pos = 3):

    drivers, DriversKilled, front, kms, law, PetrolPrice, rear, VanKilled

The following object is masked from package:MASS:

    drivers
> tapply(DriversKilled,law,summary)
$`0`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   79.0   108.0   121.0   125.9   140.0   198.0

$`1`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   60.0    85.0    92.0   100.3   119.0   154.0

> tapply(DriversKilled,law,sd)
       0        1
24.26088 22.22860
> t.test(DriversKilled~law)

    Welch Two Sample t-test

data:  DriversKilled by law
t = 5.1253, df = 29.609, p-value = 1.693e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 15.39892 35.81899
sample estimates:
mean in group 0 mean in group 1
       125.8698        100.2609
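The masking messages printed by `attach(sb)` show that attached names can shadow each other, which makes it easy to test the wrong column by accident. A minimal sketch of the same comparison without `attach()`, assuming only base R, passes the data frame through the `data =` argument of the formula interface:

```r
# Seatbelts ships with base R (datasets package) as a multivariate time series
sb <- as.data.frame(Seatbelts)

# data = keeps the variable lookup inside sb, so nothing is attached or masked
res <- t.test(DriversKilled ~ law, data = sb)

res$estimate   # group means for law = 0 and law = 1
res$statistic  # Welch t statistic
res$p.value
```

The result is the same test as in the transcript above; only the way the variables are found differs.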
> anorexia
. . . .
> md = subset(anorexia, Treat=="FT")
> md
   Treat Prewt Postwt
56    FT  83.8   95.2
57    FT  83.3   94.3
58    FT  86.0   91.5
59    FT  82.5   91.9
60    FT  86.7  100.3
61    FT  79.6   76.7
62    FT  76.9   76.8
63    FT  94.2  101.6
64    FT  73.4   94.9
65    FT  80.5   75.2
66    FT  81.6   77.8
67    FT  82.1   95.5
68    FT  77.6   90.7
69    FT  83.5   92.5
70    FT  89.9   93.8
71    FT  86.0   91.7
72    FT  87.3   98.0
> t.test(md$Prewt, md$Postwt, data=md, paired=TRUE)

    Paired t-test

data:  md$Prewt and md$Postwt
t = -4.1849, df = 16, p-value = 0.0007003
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -10.94471  -3.58470
sample estimates:
mean of the differences
              -7.264706
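A paired t-test is the same as a one-sample t-test on the within-subject differences. The sketch below illustrates this with the FT rows listed above, typed in by hand so that it runs without MASS; the variable names `prewt`, `postwt`, `paired`, and `onesamp` are chosen here for illustration.

```r
# FT rows of the anorexia data, copied from the listing above
prewt  <- c(83.8, 83.3, 86.0, 82.5, 86.7, 79.6, 76.9, 94.2, 73.4,
            80.5, 81.6, 82.1, 77.6, 83.5, 89.9, 86.0, 87.3)
postwt <- c(95.2, 94.3, 91.5, 91.9, 100.3, 76.7, 76.8, 101.6, 94.9,
            75.2, 77.8, 95.5, 90.7, 92.5, 93.8, 91.7, 98.0)

# Paired test on the two columns
paired <- t.test(prewt, postwt, paired = TRUE)

# One-sample test on the differences against a mean of zero
onesamp <- t.test(prewt - postwt, mu = 0)

# Both approaches give the same statistic and p-value
all.equal(unname(paired$statistic), unname(onesamp$statistic))
all.equal(paired$p.value, onesamp$p.value)
```

Because the difference is taken as Prewt minus Postwt, the negative mean difference corresponds to a weight gain after treatment.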
A: 175, 168, 168, 190, 156, 181, 182, 175, 174, 179
B: 185, 169, 173, 173, 188, 186, 175, 174, 179, 180
Compare the means of the two groups.
> a
 [1] 175 168 168 190 156 181 182 175 174 179
> b
 [1] 185 169 173 173 188 186 175 174 179 180
> ab <- data.frame(a,b)
> ab
     a   b
1  175 185
2  168 169
3  168 173
4  190 173
5  156 188
6  181 186
7  182 175
8  175 174
9  174 179
10 179 180
> summary(ab)
       a               b
 Min.   :156.0   Min.   :169.0
 1st Qu.:169.5   1st Qu.:173.2
 Median :175.0   Median :177.0
 Mean   :174.8   Mean   :178.2
 3rd Qu.:180.5   3rd Qu.:183.8
 Max.   :190.0   Max.   :188.0
> abs <- stack(ab)
> tapply(abs$values, abs$ind, summary)
$a
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  156.0   169.5   175.0   174.8   180.5   190.0

$b
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  169.0   173.2   177.0   178.2   183.8   188.0

> tapply(abs$values, abs$ind, sd)
       a        b
9.342852 6.442912
> t.test(ab$a,ab$b)

    Welch Two Sample t-test

data:  ab$a and ab$b
t = -0.94737, df = 15.981, p-value = 0.3576
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -11.008795   4.208795
sample estimates:
mean of x mean of y
    174.8     178.2
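To see where the Welch statistic and its fractional degrees of freedom come from, the calculation can be reproduced by hand. This is a sketch using the standard Welch-Satterthwaite formulas; the names `va`, `vb`, `t_stat`, `df`, and `p_value` are introduced here for illustration.

```r
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(185, 169, 173, 173, 188, 186, 175, 174, 179, 180)

# Squared standard error of each group mean
va <- var(a) / length(a)
vb <- var(b) / length(b)

# Welch t statistic: difference in means over its standard error
t_stat <- (mean(a) - mean(b)) / sqrt(va + vb)

# Welch-Satterthwaite approximation to the degrees of freedom
df <- (va + vb)^2 / (va^2 / (length(a) - 1) + vb^2 / (length(b) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

c(t = t_stat, df = df, p = p_value)
```

The three numbers should agree with the `t`, `df`, and `p-value` reported by `t.test(ab$a,ab$b)` above.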
Below are the bacteria counts (MPN/g) found in ice cream sampled from nine particular factories:
0.593 0.142 0.329 0.691 0.231 0.793 0.519 0.392 0.418
Can we conclude that the bacteria level in the ice cream exceeds 0.3 MPN/g, making it unsafe to distribute?
> ir <- c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
> ir
[1] 0.593 0.142 0.329 0.691 0.231 0.793 0.519 0.392 0.418
> t.test(ir, mu=.3)

    One Sample t-test

data:  ir
t = 2.2051, df = 8, p-value = 0.05853
alternative hypothesis: true mean is not equal to 0.3
95 percent confidence interval:
 0.2928381 0.6200508
sample estimates:
mean of x
0.4564444

> t.test(ir, alternative="greater", mu=.3)

    One Sample t-test

data:  ir
t = 2.2051, df = 8, p-value = 0.02927
alternative hypothesis: true mean is greater than 0.3
95 percent confidence interval:
 0.3245133       Inf
sample estimates:
mean of x
0.4564444
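The question asks whether the mean is greater than 0.3, so the one-sided test is the one that matches the hypothesis. The sketch below recomputes the statistic and both p-values by hand to show how they relate; the names `mu0`, `t_stat`, `p_greater`, and `p_two_sided` are introduced here for illustration.

```r
ir  <- c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
mu0 <- 0.3          # hypothesized mean under H0
n   <- length(ir)

# One-sample t statistic: (sample mean - mu0) / standard error
t_stat <- (mean(ir) - mu0) / (sd(ir) / sqrt(n))

# One-sided p-value for H1: mean > mu0 (upper tail)
p_greater <- pt(t_stat, df = n - 1, lower.tail = FALSE)

# Two-sided p-value: both tails, computed from |t|
p_two_sided <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

c(t = t_stat, one_sided = p_greater, two_sided = p_two_sided)
```

Since the statistic is positive, the one-sided p-value is exactly half the two-sided one, which is why the same data are significant at the 5% level for "greater than 0.3" but not for "different from 0.3".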
Below are the results of a memory test for a group of smokers and a group of nonsmokers.
Nonsmokers = 18,22,21,17,20,17,23,20,22,21
Smokers = 16,20,14,21,20,18,13,15,17,21
Can we conclude that smoking affects memory?
> nosmoke <- c(18,22,21,17,20,17,23,20,22,21)
> smoke <- c(16,20,14,21,20,18,13,15,17,21)
> sn <- data.frame(nosmoke, smoke)
> ss <- stack(sn)
> plot(ss$values~ss$ind)
> t.test(ss$values~ss$ind)

    Welch Two Sample t-test

data:  ss$values by ss$ind
t = 2.2573, df = 16.376, p-value = 0.03798
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.1628205 5.0371795
sample estimates:
mean in group nosmoke   mean in group smoke
                 20.1                  17.5