Null Hypotheses, Alternative Hypotheses, and p-Values
====== 9.1. Summarizing Your Data ======
library(MASS) # to include Cars93 data
> summary(Cars93$Manufacturer)
Acura Audi BMW Buick
2 2 1 4
Cadillac Chevrolet Chrylser Chrysler
2 8 1 2
Dodge Eagle Ford Geo
6 2 8 2
Honda Hyundai Infiniti Lexus
3 4 1 2
Lincoln Mazda Mercedes-Benz Mercury
2 5 2 2
Mitsubishi Nissan Oldsmobile Plymouth
2 4 4 1
Pontiac Saab Saturn Subaru
5 1 1 3
Suzuki Toyota Volkswagen Volvo
1 4 4 2
> summary(Cars93$MPG.city)
Min. 1st Qu. Median Mean 3rd Qu. Max.
15.00 18.00 21.00 22.37 25.00 46.00
> summary(Cars93$MPG.highway)
Min. 1st Qu. Median Mean 3rd Qu. Max.
20.00 26.00 28.00 29.09 31.00 50.00
city county state pop
Chicago Cook IL 2853114
Kenosha Kenosha WI 90352
Aurora Kane IL 171782
Elgin Kane IL 94487
Gary Lake(IN) IN 102746
Joliet Kendall IL 106221
Naperville DuPage IL 147779
Arlington Heights Cook IL 76031
Bolingbrook Will IL 70834
Cicero Cook IL 72616
Evanston Cook IL 74239
Hammond Lake(IN) IN 83048
Palatine Cook IL 67232
Schaumburg Cook IL 75386
Skokie Cook IL 63348
Waukegan Lake(IL) IL 91452
suburbs <- read.csv("suburbs.csv", header=TRUE, sep="\t")
# or read directly from the wiki export of the data above:
suburbs <- read.csv("http://commres.net/wiki/_export/code/r/general_statistics?codeblock=1", header=TRUE, sep="\t")
> summary(suburbs)
X city county
Min. : 1.00 Arlington Heights: 1 Cook :7
1st Qu.: 4.75 Aurora : 1 Kane :2
Median : 8.50 Bolingbrook : 1 Lake(IN):2
Mean : 8.50 Chicago : 1 DuPage :1
3rd Qu.:12.25 Cicero : 1 Kendall :1
Max. :16.00 Elgin : 1 Kenosha :1
(Other) :10 (Other) :2
state pop
IL:13 Min. : 63348
IN: 2 1st Qu.: 73833
WI: 1 Median : 86700
Mean : 265042
3rd Qu.: 103615
Max. :2853114
====== Calculating Relative Frequencies ======
> mean(Cars93$MPG.city > 14) # equals 1 (100%): summary(Cars93$MPG.city) above shows min = 15, so every value exceeds 14
[1] 1
x <- Cars93$MPG.city # city mileage data
mean(abs(x-mean(x)) > 2*sd(x)) # fraction of observations more than two SDs from the mean
[1] 0.03225806
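The recipe above works because mean() of a logical vector is the proportion of TRUE values. A minimal sketch with made-up numbers:

```r
# mean() of a logical vector gives the proportion of TRUEs,
# because TRUE/FALSE are coerced to 1/0 in arithmetic
v <- c(10, 20, 30, 40)
v > 15                  # FALSE TRUE TRUE TRUE
sum(v > 15) / length(v) # 0.75
mean(v > 15)            # the same: 0.75
```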
====== Tabulating Factors and Creating Contingency Tables ======
> table(Cars93$Manufacturer, Cars93$Cylinders)
3 4 5 6 8 rotary
Acura 0 1 0 1 0 0
Audi 0 0 0 2 0 0
BMW 0 1 0 0 0 0
Buick 0 1 0 3 0 0
Cadillac 0 0 0 0 2 0
Chevrolet 0 3 0 3 2 0
Chrylser 0 0 0 1 0 0
Chrysler 0 1 0 1 0 0
Dodge 0 4 0 2 0 0
Eagle 0 1 0 1 0 0
Ford 0 5 0 2 1 0
Geo 1 1 0 0 0 0
Honda 0 3 0 0 0 0
Hyundai 0 4 0 0 0 0
Infiniti 0 0 0 0 1 0
Lexus 0 0 0 2 0 0
Lincoln 0 0 0 1 1 0
Mazda 0 3 0 1 0 1
Mercedes-Benz 0 1 0 1 0 0
Mercury 0 1 0 1 0 0
Mitsubishi 0 1 0 1 0 0
Nissan 0 2 0 2 0 0
Oldsmobile 0 2 0 2 0 0
Plymouth 0 1 0 0 0 0
Pontiac 0 2 0 3 0 0
Saab 0 1 0 0 0 0
Saturn 0 1 0 0 0 0
Subaru 1 2 0 0 0 0
Suzuki 1 0 0 0 0 0
Toyota 0 4 0 0 0 0
Volkswagen 0 2 1 1 0 0
Volvo 0 1 1 0 0 0
>
> attach(suburbs)
> table(city,state)
state
city IL IN WI
Arlington Heights 1 0 0
Aurora 1 0 0
Bolingbrook 1 0 0
Chicago 1 0 0
Cicero 1 0 0
Elgin 1 0 0
Evanston 1 0 0
Gary 0 1 0
Hammond 0 1 0
Joliet 1 0 0
Kenosha 0 0 1
Naperville 1 0 0
Palatine 1 0 0
Schaumburg 1 0 0
Skokie 1 0 0
Waukegan 1 0 0
>
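Raw counts are often easier to read as proportions. A short sketch, using Cars93 from MASS again (prop.table and addmargins are standard companions to table):

```r
library(MASS)                 # for Cars93
tbl <- table(Cars93$Origin, Cars93$Type)
prop.table(tbl)               # cell proportions; all cells sum to 1
prop.table(tbl, margin = 1)   # row proportions; each row sums to 1
addmargins(tbl)               # append row and column totals
```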
====== Testing Categorical Variables for Independence ======
In the survey data frame (also in the MASS package), the Smoke column records each student's smoking habit, while the Exer column records their exercise level. The allowed values in Smoke are "Heavy", "Regul" (regularly), "Occas" (occasionally), and "Never". As for Exer, they are "Freq" (frequently), "Some", and "None".
We can tally the students' smoking habits against their exercise levels with the table function in R. The result is called the contingency table of the two variables.
> library(MASS) # load the MASS package
> tbl = table(survey$Smoke, survey$Exer)
> tbl # the contingency table
Freq None Some
Heavy 7 1 3
Never 87 18 84
Occas 12 3 4
Regul 9 1 7
> summary(tbl)
Number of cases in table: 236
Number of factors: 2
Test for independence of all factors:
Chisq = 5.489, df = 6, p-value = 0.4828
Chi-squared approximation may be incorrect
> chisq.test(tbl)
Pearson's Chi-squared test
data: tbl
X-squared = 5.4885, df = 6, p-value = 0.4828
> library(MASS)
> cardata <- data.frame(Cars93$Origin, Cars93$Type)
> cardata
Cars93.Origin Cars93.Type
1 non-USA Small
2 non-USA Midsize
3 non-USA Compact
4 non-USA Midsize
5 non-USA Midsize
6 USA Midsize
7 USA Large
8 USA Large
9 USA Midsize
10 USA Large
11 USA Midsize
12 USA Compact
13 USA Compact
14 USA Sporty
15 USA Midsize
16 USA Van
17 USA Van
18 USA Large
19 USA Sporty
20 USA Large
21 USA Compact
22 USA Large
23 USA Small
24 USA Small
25 USA Compact
26 USA Van
27 USA Midsize
28 USA Sporty
29 USA Small
30 USA Large
31 USA Small
32 USA Small
33 USA Compact
34 USA Sporty
35 USA Sporty
36 USA Van
37 USA Midsize
38 USA Large
39 non-USA Small
40 non-USA Sporty
41 non-USA Sporty
42 non-USA Small
43 non-USA Compact
44 non-USA Small
45 non-USA Small
46 non-USA Sporty
47 non-USA Midsize
48 non-USA Midsize
49 non-USA Midsize
50 non-USA Midsize
51 USA Midsize
52 USA Large
53 non-USA Small
54 non-USA Small
55 non-USA Compact
56 non-USA Van
57 non-USA Sporty
58 non-USA Compact
59 non-USA Midsize
60 USA Sporty
61 USA Midsize
62 non-USA Small
63 non-USA Midsize
64 non-USA Small
65 non-USA Compact
66 non-USA Van
67 non-USA Midsize
68 USA Compact
69 USA Midsize
70 USA Van
71 USA Large
72 USA Sporty
73 USA Small
74 USA Compact
75 USA Sporty
76 USA Midsize
77 USA Large
78 non-USA Compact
79 USA Small
80 non-USA Small
81 non-USA Small
82 non-USA Compact
83 non-USA Small
84 non-USA Small
85 non-USA Sporty
86 non-USA Midsize
87 non-USA Van
88 non-USA Small
89 non-USA Van
90 non-USA Compact
91 non-USA Sporty
92 non-USA Compact
93 non-USA Midsize
> cartbl <- table(cardata)
> cartbl
Cars93.Type
Cars93.Origin Compact Large Midsize Small Sporty Van
USA 7 11 10 7 8 5
non-USA 9 0 12 14 6 4
> summary(cartbl)
Number of cases in table: 93
Number of factors: 2
Test for independence of all factors:
Chisq = 14.08, df = 5, p-value = 0.01511
Chi-squared approximation may be incorrect
> chisq.test(cartbl)
Pearson's Chi-squared test
data: cartbl
X-squared = 14.08, df = 5, p-value = 0.01511
Warning message:
In chisq.test(cartbl) : Chi-squared approximation may be incorrect
>
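The warning appears because some expected cell counts are small (the usual rule of thumb is below 5). The expected counts are stored in the object chisq.test returns, so you can inspect them directly:

```r
library(MASS)                          # for Cars93
cartbl <- table(Cars93$Origin, Cars93$Type)
res <- suppressWarnings(chisq.test(cartbl))
res$expected            # expected counts under independence
any(res$expected < 5)   # TRUE: some cells are small, hence the warning
```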
====== Calculating Quantiles (and Quartiles) of a Dataset ======
Data [[http://www.r-tutor.com/elementary-statistics/quantitative-data|faithful]]
> duration = faithful$eruptions # the eruption durations
> quantile(duration) # apply the quantile function
0% 25% 50% 75% 100%
1.6000 2.1627 4.0000 4.4543 5.1000
> quantile(faithful$eruptions, c(.025,.975))
2.5% 97.5%
1.750000 4.907425
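quantile() accepts any probabilities between 0 and 1; in particular the 50% quantile matches the median:

```r
duration <- faithful$eruptions
quantile(duration, 0.5)             # the 50% quantile ...
median(duration)                    # ... matches the median
quantile(duration, c(0.05, 0.95))   # any probabilities can be requested
```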
====== Inverting a Quantile ======
> dur <- faithful$eruptions
> dur > mean(dur)
[1] TRUE FALSE FALSE FALSE TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE TRUE FALSE
[15] TRUE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE TRUE TRUE FALSE TRUE
[29] TRUE TRUE TRUE TRUE FALSE TRUE TRUE FALSE FALSE TRUE FALSE TRUE TRUE FALSE
[43] TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE
[57] TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE FALSE TRUE
[71] TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
[85] TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE
[99] FALSE TRUE FALSE TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE FALSE
[113] TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE FALSE TRUE TRUE
[127] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
[141] TRUE FALSE TRUE TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE FALSE TRUE
[155] TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE FALSE TRUE
[169] FALSE TRUE FALSE FALSE TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE
[183] TRUE TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE
[197] TRUE TRUE FALSE TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE TRUE FALSE TRUE
[211] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
[225] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE FALSE FALSE TRUE
[239] TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
[253] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE TRUE FALSE FALSE
[267] TRUE TRUE FALSE TRUE FALSE TRUE
> mean(dur > mean(dur))
[1] 0.6176471
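The same inversion can be done with ecdf(), which returns the empirical cumulative distribution function as an R function; a sketch:

```r
dur <- faithful$eruptions
m <- mean(dur)
F <- ecdf(dur)   # empirical CDF: F(v) = fraction of observations <= v
F(m)             # fraction at or below the mean
1 - F(m)         # fraction above the mean, same as mean(dur > m)
```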
====== Converting Data to Z-Scores ======
> scale(dur)
> (dur - mean(dur)) / sd(dur)
[1] 0.09831763 -1.47873278 -0.13561152 -1.05555759
[5] 0.91575542 -0.52987412 1.06207065 0.09831763
[9] -1.34731192 0.75542196 -1.44982019 0.37605373
[13] 0.62400110 -1.52253974 1.06207065 -1.15718973
[17] -1.52253974 1.14968457 -1.65396061 0.66780805
[21] -1.47873278 -1.52253974 -0.03310324 -0.36866452
[25] 0.91575542 0.09831763 -1.33241755 0.52149282
[29] 0.31735241 0.82814151 0.71161501 0.85793024
[33] -0.10582279 0.47768586 0.30245804 -1.28861060
[37] -1.42003146 1.17859716 -1.44982019 1.13479020
[41] 0.75542196 -1.40601324 0.94554415 -1.52253974
[45] 0.91575542 -0.14962974 0.30245804 -1.21589105
[49] 1.00336933 -1.30350496 1.14968457 1.07608888
[53] -1.44982019 1.17859716 -1.53743411 1.22240411
[57] 0.20082590 -1.59525929 0.94554415 0.72650937
[61] -1.09936455 0.88684283 -1.52253974 1.14968457
[65] -1.46383842 0.79922892 0.59508851 1.06207065
[69] -1.24480364 1.06207065 0.47768586 -1.33241755
[73] 0.88684283 0.44877327 -1.31839933 1.38361371
[77] -1.28861060 0.94554415 0.34626500 0.09831763
[81] 0.56529978 0.74052760 0.53638718 -0.74890890
[85] 0.50747459 1.26621107 0.40496632 0.90173720
[89] -1.15718973 0.44877327 -1.12827714 0.74052760
[93] -1.42003146 1.16457893 -1.44982019 0.71161501
[97] 1.03315806 0.22973849 -1.42003146 1.23729848
[101] -0.88032977 0.77031633 -1.21589105 0.88684283
[105] 0.49258023 -1.42003146 1.06207065 -1.49362715
[109] 1.19349152 0.17103717 1.09098325 -1.04066323
[113] 1.23729848 0.81412328 -1.56634670 1.00336933
[117] -1.02576886 0.97445674 -1.46383842 0.81412328
[121] -0.76292713 0.50747459 0.66780805 -1.33241755
[125] 0.97445674 0.24463286 -1.37622451 0.88684283
[129] -1.06957582 1.01826370 -1.42003146 0.59508851
[133] -0.60259367 0.74052760 -1.44982019 0.78433455
[137] -1.40601324 1.26621107 -1.27459237 0.21484413
[141] 0.65291369 -1.09936455 0.91575542 1.16457893
[145] 0.74052760 -1.31839933 1.00336933 -1.28861060
[149] 1.41252630 -1.47873278 1.35382498 0.44877327
[153] -0.95304931 0.97445674 0.06940504 0.44877327
[157] 0.88684283 0.52149282 -1.47873278 0.41986068
[161] -1.12827714 0.58019414 -1.30350496 0.30245804
[165] 0.01070371 0.95956238 -0.98196191 1.32491239
[169] -1.36220628 0.98935111 -1.37622451 -1.23078541
[173] 0.95956238 -0.13561152 0.59508851 0.74052760
[177] 0.88684283 -0.93815495 0.44877327 0.59508851
[181] -1.40601324 0.95956238 0.66780805 0.24463286
[185] -1.27459237 0.82814151 0.52149282 -1.44982019
[189] 0.81412328 -1.14317150 1.14968457 -1.44982019
[193] 1.14968457 0.53638718 0.41898454 0.65291369
[197] 0.01070371 0.76944019 -1.08447018 1.03315806
[201] -1.21589105 0.75542196 0.56529978 -1.42003146
[205] 0.97445674 -1.49362715 0.77031633 0.31735241
[209] -1.36220628 0.88684283 -0.96794368 1.06207065
[213] -1.42003146 0.30245804 -0.06201583 0.65291369
[217] -0.95304931 1.14968457 -1.30350496 0.58019414
[221] -1.42003146 0.68270242 -1.52253974 0.87194847
[225] 0.44877327 0.55128155 0.52149282 0.68270242
[229] 0.37605373 0.93064979 0.52149282 -0.93815495
[233] 0.60910673 -1.11338277 0.84303588 -1.40601324
[237] -1.43492583 0.69672064 0.40496632 -1.01175064
[241] 0.58019414 -0.99685627 1.26621107 -0.51497976
[245] 0.95956238 0.30245804 -1.23078541 0.77031633
[249] -1.18697846 0.75542196 -1.12827714 0.84303588
[253] 0.06940504 0.88684283 0.58019414 0.28843981
[257] 0.37605373 0.84303588 -1.30350496 0.69672064
[261] 1.12077198 0.91575542 -1.43492583 0.66780805
[265] -1.31839933 -1.08447018 1.10587761 0.55128155
[269] -1.17208409 0.81412328 -1.46383842 0.85793024
>
> zdur <- (dur - mean(dur)) / sd(dur)
> mean(zdur)
[1] 8.972251e-17
> round(mean(zdur))
[1] 0
> round(sd(zdur))
[1] 1
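As the section shows, scale() and the manual formula give the same z-scores; a quick check (scale() returns a one-column matrix, so drop the dimensions before comparing):

```r
dur <- faithful$eruptions
zdur <- (dur - mean(dur)) / sd(dur)
# scale() centers by the mean and divides by the sample SD
all.equal(as.vector(scale(dur)), zdur)   # TRUE
```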
====== Testing the Mean of a Sample (t Test) ======
> x <- rnorm(50, mean=100, sd=15) # note: without set.seed(), each run produces different values
> x
[1] 131.29017 97.35285 68.55689 119.24865 114.97441
[6] 110.92271 87.44801 107.84821 96.40073 94.05540
[11] 103.92445 108.29920 103.30896 90.64378 101.93417
[16] 104.98465 104.78447 101.35980 132.19438 93.98066
[21] 66.58195 88.89819 99.72429 67.95182 72.04780
[26] 59.89571 110.21253 93.68151 94.66022 109.50416
[31] 79.13363 120.83159 84.41475 89.10295 112.79365
[36] 97.52189 106.10858 69.67159 99.79406 91.11620
[41] 112.55720 86.77234 75.10422 122.06707 70.24902
[46] 101.42973 106.64096 76.63938 67.97055 93.60273
> t.test(x, mu=95)
One Sample t-test
data: x
t = 0.40834, df = 49, p-value = 0.6848
alternative hypothesis: true mean is not equal to 95
95 percent confidence interval:
91.0636 100.9441
sample estimates:
mean of x
96.00386
(The sample mean below differs from the previous output because x was re-generated by rnorm() without a fixed seed.)
> t.test(x, mu=100)
One Sample t-test
data: x
t = 0.3372, df = 49, p-value = 0.7374
alternative hypothesis: true mean is not equal to 100
95 percent confidence interval:
97.16167 103.98297
sample estimates:
mean of x
100.5723
Hypothesis: my students, who were taught with my method of learning, will differ from the population whose mean is 60.
a = c(65, 78, 88, 55, 48, 95, 66, 57, 79, 81)
> t.test(a, mu=60)
One Sample t-test
data: a
t = 2.3079, df = 9, p-value = 0.0464
alternative hypothesis: true mean is not equal to 60
95 percent confidence interval:
60.22187 82.17813
sample estimates:
mean of x
71.2
qt(0.975, 9)
[1] 2.262157
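The t statistic that t.test reports can be reproduced by hand, which makes the comparison with the critical value qt(0.975, 9) explicit; a sketch:

```r
a <- c(65, 78, 88, 55, 48, 95, 66, 57, 79, 81)
n <- length(a)
se <- sd(a) / sqrt(n)            # standard error of the mean
tval <- (mean(a) - 60) / se      # t statistic against mu = 60
tval
2 * pt(-abs(tval), df = n - 1)   # two-tailed p-value
abs(tval) > qt(0.975, n - 1)     # TRUE: reject H0 at alpha = .05
```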
====== Forming a Confidence Interval for a Mean ======
> set.seed(1024)
> x <- rnorm(50, mean=100, sd=15)
> s <- sd(x)
> m <- mean(x)
> n <- length(x)
> n
[1] 50
> m
[1] 96.00386
> s
[1] 17.38321
> SE <- s / sqrt(n)
> SE
[1] 2.458358
## qt(prob, df): what t value corresponds to a z-score of about 2?
> qtv <- qt(.975, df=n-1)
> qtv
[1] 2.009575
## qtv is the 95% critical value, close to 2
## the confidence interval is then
> E <- qtv*SE
> E
[1] 4.940254
> m + c(-E, E)
[1] 91.0636 100.9441
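Whatever the sample, the manual interval should agree with what t.test reports; a quick self-check:

```r
set.seed(1024)
x <- rnorm(50, mean = 100, sd = 15)
n <- length(x)
E <- qt(0.975, df = n - 1) * sd(x) / sqrt(n)
manual <- mean(x) + c(-E, E)
builtin <- as.vector(t.test(x)$conf.int)   # strip the conf.level attribute
all.equal(manual, builtin)                 # TRUE
```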
(Note: x was re-drawn from rnorm() before the following calls, so its mean is now 98.83 rather than the 96.00 computed above.)
> t.test(x, mu=98)
One Sample t-test
data: x
t = 0.37089, df = 49, p-value = 0.7123
alternative hypothesis: true mean is not equal to 98
95 percent confidence interval:
94.32303 103.34143
sample estimates:
mean of x
98.83223
> t.test(x, mu=100)
One Sample t-test
data: x
t = -0.52043, df = 49, p-value = 0.6051
alternative hypothesis: true mean is not equal to 100
95 percent confidence interval:
94.32303 103.34143
sample estimates:
mean of x
98.83223
> t.test(x, mu=95)
One Sample t-test
data: x
t = 1.7079, df = 49, p-value = 0.09399
alternative hypothesis: true mean is not equal to 95
95 percent confidence interval:
94.32303 103.34143
sample estimates:
mean of x
98.83223
>
====== Testing for Normality ======
> shapiro.test(x)
Shapiro-Wilk normality test
data: x
W = 0.97415, p-value = 0.3386
The large p-value suggests the underlying population could be normally distributed. A small p-value, by contrast, would indicate that the sample is unlikely to have come from a normal population.
Note for the two-sample tests below: equal variances assumed -> var.equal=TRUE (Student's t-test); equal variances not assumed -> var.equal=FALSE (Welch's t-test, the default).
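For contrast, a sketch with a sample drawn from a clearly non-normal (exponential) population; the data and seed here are made up for illustration:

```r
set.seed(1)
y <- rexp(200)    # strongly right-skewed, far from normal
shapiro.test(y)   # p-value well below .05 -> normality is rejected
```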
====== Comparing the Means of Two Samples ======
> mtcars
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
>
> mtcars$mpg
[1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
[31] 15.0 21.4
> mtcars$am # 0 = auto 1 = manual
[1] 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1
>
L = mtcars$am == 0
mpg.auto = mtcars[L,]$mpg
mpg.auto # automatic transmission mileage
[1] 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4 14.7 21.5 15.5 15.2
[18] 13.3 19.2
mpg.manual = mtcars[!L,]$mpg
mpg.manual # manual transmission mileage
[1] 21.0 21.0 22.8 32.4 30.4 33.9 27.3 26.0 30.4 15.8 19.7 15.0 21.4
t.test(mpg.auto, mpg.manual)
Welch Two Sample t-test
data: mpg.auto and mpg.manual
t = -3.7671, df = 18.332, p-value = 0.001374
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-11.280194 -3.209684
sample estimates:
mean of x mean of y
17.14737 24.39231
Or, using the formula interface:
> t.test(mtcars$mpg~mtcars$am)
Welch Two Sample t-test
data: mtcars$mpg by mtcars$am
t = -3.7671, df = 18.332, p-value = 0.001374
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-11.280194 -3.209684
sample estimates:
mean in group 0 mean in group 1
17.14737 24.39231
Another example:
> a = c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
> b = c(185, 169, 173, 173, 188, 186, 175, 174, 179, 180)
> t.test(a,b, var.equal=TRUE, paired=FALSE)
Two Sample t-test
data: a and b
t = -0.9474, df = 18, p-value = 0.356
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-10.93994 4.13994
sample estimates:
mean of x mean of y
174.8 178.2
> qt(0.975, 18)
[1] 2.100922
> var.test(a,b)
F test to compare two variances
data: a and b
F = 2.1028, num df = 9, denom df = 9, p-value = 0.2834
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.5223017 8.4657950
sample estimates:
ratio of variances
2.102784
> qf(0.95, 9, 9)
[1] 3.178893
The tabulated critical value of F for alpha = 0.05 with 9 numerator and 9 denominator degrees of freedom, obtained with qf(p, df.num, df.den), is 3.178893. Since the observed F = 2.1028 is smaller, the hypothesis of equal variances cannot be rejected.
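The F statistic above is simply the ratio of the two sample variances, which can be checked by hand:

```r
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(185, 169, 173, 173, 188, 186, 175, 174, 179, 180)
Fval <- var(a) / var(b)   # ratio of sample variances, the statistic var.test reports
Fval
Fval > qf(0.95, 9, 9)     # FALSE: cannot reject equal variances at alpha = .05
```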
===== e.g., =====
> sleep
> extra group ID
> 1 0.7 1 1
> 2 -1.6 1 2
> 3 -0.2 1 3
> 4 -1.2 1 4
> 5 -0.1 1 5
> 6 3.4 1 6
> 7 3.7 1 7
> 8 0.8 1 8
> 9 0.0 1 9
> 10 2.0 1 10
> 11 1.9 2 1
> 12 0.8 2 2
> 13 1.1 2 3
> 14 0.1 2 4
> 15 -0.1 2 5
> 16 4.4 2 6
> 17 5.5 2 7
> 18 1.6 2 8
> 19 4.6 2 9
> 20 3.4 2 10
> sleep_wide <- data.frame(
ID=1:10,
group1=sleep$extra[1:10],
group2=sleep$extra[11:20]
)
sleep_wide
> ID group1 group2
> 1 1 0.7 1.9
> 2 2 -1.6 0.8
> 3 3 -0.2 1.1
> 4 4 -1.2 0.1
> 5 5 -0.1 -0.1
> 6 6 3.4 4.4
> 7 7 3.7 5.5
> 8 8 0.8 1.6
> 9 9 0.0 4.6
> 10 10 2.0 3.4
Ignore the ID variable for convenience.
# Welch t-test
t.test(extra ~ group, sleep)
>
> Welch Two Sample t-test
>
> data: extra by group
> t = -1.8608, df = 17.776, p-value = 0.07939
> alternative hypothesis: true difference in means is not equal to 0
> 95 percent confidence interval:
> -3.3654832 0.2054832
> sample estimates:
> mean in group 1 mean in group 2
> 0.75 2.33
# Same for wide data (two separate vectors)
> t.test(sleep_wide$group1, sleep_wide$group2)
By default, t.test does not assume equal variances, so it performs Welch's two-sample t-test; in this case, df = 17.776. To use Student's t-test, which assumes equal variance between the two groups, **set var.equal=TRUE**; then df = 18 (n1 + n2 - 2).
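The fractional df = 17.776 comes from the Welch-Satterthwaite approximation, which can be computed by hand from the two group variances; a sketch:

```r
g1 <- sleep$extra[sleep$group == 1]
g2 <- sleep$extra[sleep$group == 2]
v1 <- var(g1) / length(g1)
v2 <- var(g2) / length(g2)
# Welch-Satterthwaite approximation for the degrees of freedom
df_welch <- (v1 + v2)^2 /
  (v1^2 / (length(g1) - 1) + v2^2 / (length(g2) - 1))
df_welch   # ~17.78, the df reported by the Welch test
```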
# Student t-test
> t.test(extra ~ group, sleep, var.equal=TRUE)
>
> Two Sample t-test
>
> data: extra by group
> t = -1.8608, df = 18, p-value = 0.07919
> alternative hypothesis: true difference in means is not equal to 0
> 95 percent confidence interval:
> -3.363874 0.203874
> sample estimates:
> mean in group 1 mean in group 2
> 0.75 2.33
# Same for wide data (two separate vectors)
> t.test(sleep_wide$group1, sleep_wide$group2, var.equal=TRUE)
__Paired-sample t-test__
You can also compare paired data, using a paired-sample t-test. You might have observations before and after a treatment, or of two matched subjects with different treatments.
# Sort by group then ID
> sleep <- sleep[order(sleep$group, sleep$ID), ]
# Paired t-test
> t.test(extra ~ group, sleep, paired=TRUE)
Paired t-test
data: extra by group
t = -4.0621, df = 9, p-value = 0.002833
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.4598858 -0.7001142
sample estimates:
mean of the differences
-1.58
# Same for wide data (two separate vectors)
> t.test(sleep_wide$group1, sleep_wide$group2, paired=TRUE)
Paired t-test
data: sleep_wide$group1 and sleep_wide$group2
t = -4.0621, df = 9, p-value = 0.002833
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.4598858 -0.7001142
sample estimates:
mean of the differences
-1.58
The paired t-test is equivalent to testing whether the difference between each pair of observations has a population mean of 0. (See below for comparing a single group to a population mean.)
> t.test(sleep_wide$group1 - sleep_wide$group2, mu=0)
One Sample t-test
data: sleep_wide$group1 - sleep_wide$group2
t = -4.0621, df = 9, p-value = 0.002833
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-2.4598858 -0.7001142
sample estimates:
mean of x
-1.58
__Comparing a group against an expected population mean: one-sample t-test__
Suppose that you want to test whether the data in column extra is drawn from a population whose true mean is 0. In this case, the group and ID columns are ignored.
t.test(sleep$extra, mu=0)
>
> One Sample t-test
>
> data: sleep$extra
> t = 3.413, df = 19, p-value = 0.002918
> alternative hypothesis: true mean is not equal to 0
> 95 percent confidence interval:
> 0.5955845 2.4844155
> sample estimates:
> mean of x
> 1.54
====== Paired t-test ======
This applies to repeated measures: each row of immer records the barley yield of the same location and variety in two years, Y1 and Y2.
> library(MASS) # load the MASS package
> head(immer)
Loc Var Y1 Y2
1 UF M 81.0 80.7
2 UF S 105.4 82.3
3 UF V 119.7 80.4
4 UF T 109.7 87.2
5 UF P 98.3 84.2
6 W M 146.6 100.4
> t.test(immer$Y1, immer$Y2, paired=TRUE)
Paired t-test
data: immer$Y1 and immer$Y2
t = 3.324, df = 29, p-value = 0.002413
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
6.121954 25.704713
sample estimates:
mean of the differences
15.91333
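As in the sleep example, the paired test is equivalent to a one-sample t-test on the per-row differences:

```r
library(MASS)              # for the immer barley data
d <- immer$Y1 - immer$Y2   # per-pair (per-row) differences
t.test(d, mu = 0)          # same t, df, and p-value as the paired test above
mean(d)                    # 15.91333, the mean of the differences
```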