Multicollinearity
Multicollinearity check in R
required libraries:
- corrplot
- mctest (provides omcdiag and imcdiag)
- ppcor (provides pcor)
> cps <- read.csv("http://commres.net/wiki/_media/cps_85_wages.csv", header = T, sep = "\t")
> str(cps)
'data.frame': 534 obs. of 11 variables:
$ education : int 8 9 12 12 12 13 10 12 16 12 ...
$ south : int 0 0 0 0 0 0 1 0 0 0 ...
$ sex : int 1 1 0 0 0 0 0 0 0 0 ...
$ experience: int 21 42 1 4 17 9 27 9 11 9 ...
$ union : int 0 0 0 0 0 1 0 0 0 0 ...
$ wage : num 5.1 4.95 6.67 4 7.5 ...
$ age : int 35 57 19 22 35 28 43 27 33 27 ...
$ race : int 2 3 3 3 3 3 3 3 3 3 ...
$ occupation: int 6 6 6 6 6 6 6 6 6 6 ...
$ sector : int 1 1 1 0 0 0 0 0 1 0 ...
$ marr : int 1 1 0 0 1 0 0 0 1 0 ...
> head(cps)
education south sex experience union wage age race occupation sector marr
1 8 0 1 21 0 5.10 35 2 6 1 1
2 9 0 1 42 0 4.95 57 3 6 1 1
3 12 0 0 1 0 6.67 19 3 6 1 0
4 12 0 0 4 0 4.00 22 3 6 0 0
5 12 0 0 17 0 7.50 35 3 6 0 1
6 13 0 0 9 1 13.07 28 3 6 0 0
> set.seed(1)
> x1 <- rnorm(25)
> x2 <- rnorm(25, x1)
> y <- x1-x2 + rnorm(25)
> pairs( cbind(y,x1,x2) )
> cor( cbind(y,x1,x2) )
y x1 x2
y 1.00000000 0.08089276 -0.2575073
x1 0.08089276 1.00000000 0.7872474
x2 -0.25750727 0.78724740 1.0000000
> summary(lm(y~x1))
Call:
lm(formula = y ~ x1)
Residuals:
Min 1Q Median 3Q Max
-2.3178 -0.9417 0.1974 0.7032 2.6812
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1145 0.2687 0.426 0.674
x1 0.1106 0.2841 0.389 0.701
Residual standard error: 1.322 on 23 degrees of freedom
Multiple R-squared: 0.006544, Adjusted R-squared: -0.03665
F-statistic: 0.1515 on 1 and 23 DF, p-value: 0.7007
> summary(lm(y~x2))
Call:
lm(formula = y ~ x2)
Residuals:
Min 1Q Median 3Q Max
-1.88739 -0.93086 0.06246 0.58728 2.94566
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1920 0.2604 0.737 0.469
x2 -0.2927 0.2290 -1.278 0.214
Residual standard error: 1.282 on 23 degrees of freedom
Multiple R-squared: 0.06631, Adjusted R-squared: 0.02571
F-statistic: 1.633 on 1 and 23 DF, p-value: 0.214
> summary(lm(y~x1+x2))
Call:
lm(formula = y ~ x1 + x2)
Residuals:
Min 1Q Median 3Q Max
-1.94803 -0.92496 -0.03868 0.42155 2.17441
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1541 0.2347 0.657 0.5181
x1 1.0194 0.4016 2.539 0.0187 *
x2 -0.9602 0.3340 -2.875 0.0088 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.153 on 22 degrees of freedom
Multiple R-squared: 0.2779, Adjusted R-squared: 0.2122
F-statistic: 4.232 on 2 and 22 DF, p-value: 0.02785
> cor(x1,x2)
[1] 0.7872474
> cor.test(x1,x2)
Pearson's product-moment correlation
data: x1 and x2
t = 6.1227, df = 23, p-value = 3.026e-06
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.5691639 0.9018451
sample estimates:
cor
0.7872474
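In the simulated example above, neither predictor is significant on its own, yet both are significant (with opposite signs and larger standard errors) once they enter the model together. That pattern goes with the correlation of about 0.79 between x1 and x2. The following is a minimal sketch, assuming the same simulated data, of how that correlation translates into a variance inflation factor (VIF). The names `r2_aux`, `vif_x1` and `vif_from_cor` are illustrative and do not appear in the original session.

```r
# Recreate the simulated predictors from the session above
set.seed(1)
x1 <- rnorm(25)
x2 <- rnorm(25, x1)          # x2 is centred on x1, so the two are correlated
y  <- x1 - x2 + rnorm(25)

# Auxiliary regression: how well is x1 explained by the other predictor?
r2_aux <- summary(lm(x1 ~ x2))$r.squared

# Variance inflation factor for x1: 1 / (1 - R^2 of the auxiliary regression)
vif_x1 <- 1 / (1 - r2_aux)

# With only two predictors the auxiliary R^2 is the squared correlation,
# so the same number follows directly from cor(x1, x2)
vif_from_cor <- 1 / (1 - cor(x1, x2)^2)

print(c(vif_x1 = vif_x1, vif_from_cor = vif_from_cor))
```

A VIF of 1 means no inflation. Values above 1 indicate how much the sampling variance of a slope grows compared with the case of uncorrelated predictors.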
> # The same data can also be read from the wiki's code-block export.
> cps <- read.csv("http://commres.net/wiki/_export/code/r/data?codeblock=7", header = T, sep = "\t")
> # On the first read the name of the first column came back garbled (most likely a byte-order mark in front of "education").
> # Assigning a single name does not repair it; the other ten column names become NA:
> colnames(cps) <- c("education")
> head(cps)
education NA NA NA NA NA NA NA NA NA NA
1 8 0 1 21 0 5.10 35 2 6 1 1
2 9 0 1 42 0 4.95 57 3 6 1 1
3 12 0 0 1 0 6.67 19 3 6 1 0
4 12 0 0 4 0 4.00 22 3 6 0 0
5 12 0 0 17 0 7.50 35 3 6 0 1
6 13 0 0 9 1 13.07 28 3 6 0 0
> # Reading the file again restores all eleven column names:
> cps <- read.csv("http://commres.net/wiki/_export/code/r/data?codeblock=7", header = T, sep = "\t")
> head(cps)
education south sex experience union wage age race occupation sector marr
1 8 0 1 21 0 5.10 35 2 6 1 1
2 9 0 1 42 0 4.95 57 3 6 1 1
3 12 0 0 1 0 6.67 19 3 6 1 0
4 12 0 0 4 0 4.00 22 3 6 0 0
5 12 0 0 17 0 7.50 35 3 6 0 1
6 13 0 0 9 1 13.07 28 3 6 0 0
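> fit_model1 = lm(log(data1$Wage) ~., data = cps)
Error in eval(predvars, data, env) : object 'data1' not found
> fit_model1 = lm(log(cps$Wage) ~., data = cps)
Error in log(cps$Wage) : non-numeric argument to mathematical function
> # Both attempts fail: there is no object called data1, and column names are case-sensitive
> # (cps$Wage is NULL; the column is called wage).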
> str(cps)
'data.frame': 534 obs. of 11 variables:
$ education : int 8 9 12 12 12 13 10 12 16 12 ...
$ south : int 0 0 0 0 0 0 1 0 0 0 ...
$ sex : int 1 1 0 0 0 0 0 0 0 0 ...
$ experience: int 21 42 1 4 17 9 27 9 11 9 ...
$ union : int 0 0 0 0 0 1 0 0 0 0 ...
$ wage : num 5.1 4.95 6.67 4 7.5 ...
$ age : int 35 57 19 22 35 28 43 27 33 27 ...
$ race : int 2 3 3 3 3 3 3 3 3 3 ...
$ occupation: int 6 6 6 6 6 6 6 6 6 6 ...
$ sector : int 1 1 1 0 0 0 0 0 1 0 ...
$ marr : int 1 1 0 0 1 0 0 0 1 0 ...
> log(cps$wage)
[1] 1.6292405 1.5993876 1.8976199 1.3862944 2.0149030
[6] 2.5703195 1.4929041 2.9688748 2.5862591 2.1690537
[11] 2.4292177 2.4423470 1.8718022 1.8325815 2.9947318
[16] 1.9878743 2.0794415 3.1000923 1.2947272 3.0228609
[21] 1.7422190 1.9459101 1.3217558 1.5040774 2.2575877
[26] 1.7491999 2.2364453 1.8718022 1.2089603 1.5581446
[31] 2.1860513 1.3862944 1.5475625 1.6094379 2.2246236
[36] 2.3674361 2.0294632 2.3025851 2.0149030 2.5014360
[41] 1.2089603 2.3978953 2.4849066 1.5789787 1.4586150
[46] 1.7917595 2.7080502 1.5789787 2.1972246 1.8500284
[51] 2.2137539 2.3978953 1.5040774 1.5686159 1.3862944
[56] 1.7047481 2.1282317 1.9095425 2.3025851 1.6094379
[61] 1.8718022 2.3749058 1.9459101 2.4362415 1.3862944
[66] 2.1972246 2.5649494 2.5030740 1.8373700 1.9095425
[71] 1.2089603 2.7725887 1.6582281 1.2527630 1.4398351
[76] 1.0986123 1.3862944 2.3025851 1.6094379 2.7725887
[81] 2.6376277 2.5847520 1.8082888 1.3217558 2.1972246
[86] 2.2460147 1.7047481 2.1894164 1.8325815 2.2772673
[91] 1.9065751 2.0515563 1.0473190 1.2089603 2.9947318
[96] 2.1400662 2.2772673 2.7080502 2.0794415 2.4203681
[101] 2.6390573 2.3025851 1.8718022 2.2854389 2.9177707
[106] 2.5257286 3.2580965 2.6390573 2.3513753 2.3978953
[111] 2.5233258 2.5257286 2.7080502 1.7917595 2.2512918
[116] 1.6094379 1.3217558 2.5313130 1.9286187 1.7047481
[121] 1.9459101 1.5040774 1.8718022 2.4849066 1.6094379
[126] 1.8718022 1.9169226 2.1690537 1.3217558 1.5040774
[131] 1.7917595 1.7047481 2.5649494 1.7316555 1.5686159
[136] 1.9459101 1.6582281 1.2089603 2.1400662 1.7917595
[141] 1.9095425 2.1849270 2.6539459 2.3776926 2.1860513
[146] 2.0149030 1.5040774 2.4203681 2.5989791 1.7917595
[151] 1.5303947 2.3589654 1.6094379 2.1041342 1.8325815
[156] 2.1400662 3.2180755 2.8124102 1.8325815 1.5151272
[161] 2.4203681 3.0563569 2.5376572 2.0149030 2.3272777
[166] 1.2089603 2.5989791 1.5769147 3.2691886 1.8840347
[171] 3.7954892 2.7080502 2.4203681 1.9459101 2.3025851
[176] 2.6762155 2.9957323 3.1135153 1.2919837 2.3627390
[181] 3.2180755 1.7917595 2.9444390 2.5802168 3.1135153
[186] 2.7080502 1.9286187 2.4714836 2.7813007 2.6354795
[191] 2.5771819 1.6677068 1.5040774 2.3025851 2.3025851
[196] 2.3025851 2.2375131 1.7578579 2.8825636 0.0000000
[201] 2.1747517 2.1972246 2.8992214 2.0554050 2.3627390
[206] 1.5040774 2.8478121 2.3513753 2.2213750 2.7080502
[211] 3.1135153 1.5151272 2.1972246 2.5900171 2.7080502
[216] 2.0149030 1.4469190 2.5257286 1.6351057 1.2089603
[221] 2.4078456 1.3454724 1.8562980 1.7155981 2.3025851
[226] 1.7316555 2.4423470 1.2527630 1.2089603 1.5581446
[231] 2.9947318 1.2527630 1.3862944 1.9459101 1.8325815
[236] 1.5040774 2.6595600 1.6094379 2.6210388 2.6181255
[241] 2.0149030 1.3350011 1.6094379 2.2428351 1.7047481
[246] 1.3217558 1.2527630 1.7578579 2.4849066 1.6094379
[251] 2.1690537 2.3025851 2.1400662 2.1552445 2.1972246
[256] 1.7047481 2.4078456 2.3025851 1.6486586 2.0794415
[261] 1.2697605 1.6486586 2.4570214 2.4265711 2.0149030
[266] 1.7047481 1.6094379 2.0476928 1.6582281 2.1972246
[271] 2.2669579 1.6505799 1.9459101 2.4981519 1.6582281
[276] 2.3340838 1.2089603 2.0412203 2.2159373 2.1317968
[281] 1.3862944 1.4182774 1.0986123 1.4469190 2.0188950
[286] 2.3542283 1.6094379 2.7100482 2.4203681 1.8325815
[291] 1.2527630 1.9242487 2.5257286 2.4849066 1.7917595
[296] 2.2512918 1.4109870 2.3446863 1.6094379 2.0399208
[301] 1.7047481 1.8562980 2.5257286 1.8325815 2.0794415
[306] 2.2617631 2.2082744 2.0149030 1.6094379 1.9459101
[311] 1.2669476 2.1400662 1.5040774 2.0643279 1.6582281
[316] 1.6094379 2.2332350 2.3513753 2.0149030 2.2512918
[321] 2.2617631 1.7698546 2.3997118 1.6094379 1.7263317
[326] 2.5257286 2.3804716 1.6863990 1.9459101 1.5238800
[331] 1.7917595 2.4604432 1.7263317 1.7047481 1.5789787
[336] 1.9095425 1.4469190 1.7491999 1.2527630 1.2089603
[341] 2.3627390 2.0794415 1.5581446 2.1400662 2.1804175
[346] 2.0794415 1.7917595 1.9657128 1.2237754 1.7917595
[351] 1.3217558 2.1849270 1.4701758 2.5726122 1.4701758
[356] 1.2527630 1.3350011 1.6601310 1.2089603 2.7887081
[361] 1.4469190 1.5040774 2.0794415 1.3862944 2.0744290
[366] 1.3862944 1.4231083 1.7833912 1.2809338 2.1690537
[371] 1.2237754 1.4539530 1.6770966 1.6094379 2.0347056
[376] 1.9373018 2.0149030 1.2809338 0.5596158 1.2383742
[381] 2.2648832 2.1388890 2.1961128 1.2947272 1.2527630
[386] 1.2325603 1.7047481 1.9358598 1.2556160 1.3217558
[391] 1.4279160 2.2586332 2.6858046 2.5257286 1.7047481
[396] 1.6389967 2.0794415 1.7630170 1.2089603 1.9459101
[401] 2.3025851 2.0794415 1.9286187 1.7137979 2.0149030
[406] 2.1894164 2.1972246 1.2527630 1.7526721 3.2188758
[411] 1.9242487 1.8718022 1.3217558 1.2527630 1.5040774
[416] 0.6981347 1.4279160 2.5649494 1.3812818 2.0149030
[421] 2.5741378 1.3862944 1.3737156 2.5649494 2.1972246
[426] 1.5151272 2.2512918 1.5040774 2.1690537 2.3025851
[431] 2.8903718 3.2180755 2.4890647 3.0910425 2.1690537
[436] 3.1000923 2.8478121 1.7917595 2.0869136 2.2235419
[441] 2.4849066 2.3617970 1.7422190 2.3025851 2.8622009
[446] 2.7080502 2.0515563 2.0541237 2.3025851 3.2180755
[451] 2.3302003 2.7080502 2.4849066 2.3589654 1.7664417
[456] 2.4176979 2.1471002 2.6311692 1.7422190 2.7593768
[461] 2.0149030 2.4203681 1.8164521 2.5989791 1.8325815
[466] 1.8718022 2.4849066 2.1400662 2.0794415 1.7491999
[471] 2.7555697 2.2884862 2.6034302 1.6863990 1.8325815
[476] 1.7047481 1.6094379 1.8325815 1.7491999 3.0204249
[481] 1.6094379 1.9459101 2.8903718 2.4849066 3.0155349
[486] 3.1000923 2.7985001 2.1552445 2.9642416 2.6390573
[491] 2.3025851 2.7694588 2.9957323 2.3025851 3.2180755
[496] 2.4203681 3.1280755 2.3223877 2.3025851 2.6390573
[501] 2.5257286 1.7561323 3.2180755 1.4701758 2.4203681
[506] 1.8976199 2.0794415 2.8992214 2.4849066 2.1849270
[511] 2.2512918 2.6137395 2.4849066 2.7080502 2.5392370
[516] 1.9987736 2.7447035 2.0082140 1.8325815 1.8325815
[521] 2.2375131 3.1135153 2.0149030 1.9459101 1.7491999
[526] 2.0373166 2.5257286 2.7725887 2.4672517 2.4300984
[531] 1.8082888 3.1463051 2.9897142 2.7330680
> lm1 = lm(log(cps$wage) ~., data = cps)
> summary(lm1)
Call:
lm(formula = log(cps$wage) ~ ., data = cps)
Residuals:
Min 1Q Median 3Q Max
-2.16246 -0.29163 -0.00469 0.29981 1.98248
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.078596 0.687514 1.569 0.117291
education 0.179366 0.110756 1.619 0.105949
south -0.102360 0.042823 -2.390 0.017187 *
sex -0.221997 0.039907 -5.563 4.24e-08 ***
experience 0.095822 0.110799 0.865 0.387531
union 0.200483 0.052475 3.821 0.000149 ***
age -0.085444 0.110730 -0.772 0.440671
race 0.050406 0.028531 1.767 0.077865 .
occupation -0.007417 0.013109 -0.566 0.571761
sector 0.091458 0.038736 2.361 0.018589 *
marr 0.076611 0.041931 1.827 0.068259 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4398 on 523 degrees of freedom
Multiple R-squared: 0.3185, Adjusted R-squared: 0.3054
F-statistic: 24.44 on 10 and 523 DF, p-value: < 2.2e-16
> plot(lm1)
Hit <Return> to see next plot:
Hit <Return> to see next plot:
Hit <Return> to see next plot:
Hit <Return> to see next plot:
Warning messages:
1: not plotting observations with leverage one:
444
2: not plotting observations with leverage one:
444
> library(corrplot)
corrplot 0.84 loaded
> cps.cor = cor(cps)
> corrplot.mixed(cps.cor, lower.col = “black”, number.cex = .7)
Error: unexpected input in "corrplot.mixed(cps.cor, lower.col = ?
> # The curly quotes around black are not valid R syntax; the call works with straight quotes:
> corrplot.mixed(cps.cor, lower.col = "black", number.cex = .7)
> install.packages("mctest")
> library(mctest)
> omcdiag(cps[,c(1:5,7:11)], cps$wage)
Call:
omcdiag(x = cps[, c(1:5, 7:11)], y = cps$wage)
Overall Multicollinearity Diagnostics
                       MC Results detection
Determinant |X'X|:         0.0001         1
Farrar Chi-Square:      4833.5751         1
Red Indicator:             0.1983         0
Sum of Lambda Inverse: 10068.8439         1
Theil's Method:            1.2263         1
Condition Number:        739.7337         1
1 --> COLLINEARITY is detected by the test
0 --> COLLINEARITY is not detected by the test
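Several of the overall diagnostics reported by omcdiag are functions of the correlation matrix of the predictors. The following is a rough sketch of three of them, using simulated data rather than the CPS file. All variable values are hypothetical, and mctest may scale the design matrix differently, so the numbers from this sketch are not expected to match the package output.

```r
# Simulated predictors (hypothetical data, not the CPS file):
# 'age' is almost an exact linear function of 'education' and 'experience'
set.seed(42)
n          <- 200
education  <- round(rnorm(n, mean = 13, sd = 2.5))
experience <- round(runif(n, min = 0, max = 40))
age        <- education + experience + 6 + rnorm(n, sd = 0.1)
union      <- rbinom(n, size = 1, prob = 0.2)
X <- cbind(education, experience, age, union)

R_x <- cor(X)   # correlation matrix of the predictors

# Determinant: 1 for uncorrelated predictors, close to 0 under collinearity
det_R <- det(R_x)

# Eigenvalues of the correlation matrix
lambda <- eigen(R_x, symmetric = TRUE)$values

# Sum of inverse eigenvalues (equals the sum of the VIFs)
sum_inv_lambda <- sum(1 / lambda)

# One common definition of the condition number
cond_number <- sqrt(max(lambda) / min(lambda))

print(round(c(det = det_R, sum_inv_lambda = sum_inv_lambda, cond = cond_number), 4))
```

A determinant near zero, a large sum of inverse eigenvalues and a large condition number all point the same way: at least one predictor is nearly a linear combination of the others.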
> imcdiag(cps[,c(-6)],cps$wage)
Call:
imcdiag(x = cps[, c(-6)], y = cps$wage)
All Individual Multicollinearity Diagnostics Result
                 VIF    TOL          Wi          Fi Leamer      CVIF Klein
education   231.1956 0.0043  13402.4982  15106.5849 0.0658  236.4725     1
south         1.0468 0.9553      2.7264      3.0731 0.9774    1.0707     0
sex           1.0916 0.9161      5.3351      6.0135 0.9571    1.1165     0
experience 5184.0939 0.0002 301771.2445 340140.5368 0.0139 5302.4188     1
union         1.1209 0.8922      7.0368      7.9315 0.9445    1.1464     0
age        4645.6650 0.0002 270422.7164 304806.1391 0.0147 4751.7005     1
race          1.0371 0.9642      2.1622      2.4372 0.9819    1.0608     0
occupation    1.2982 0.7703     17.3637     19.5715 0.8777    1.3279     0
sector        1.1987 0.8343     11.5670     13.0378 0.9134    1.2260     0
marr          1.0961 0.9123      5.5969      6.3085 0.9551    1.1211     0
1 --> COLLINEARITY is detected by the test
0 --> COLLINEARITY is not detected by the test
education , south , experience , age , race , occupation , sector , marr , coefficient(s) are non-significant may be due to multicollinearity
R-square of y on all x: 0.2805
* use method argument to check which regressors may be the reason of collinearity
===================================
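The VIF and TOL columns above can be reproduced with auxiliary regressions: each predictor is regressed on all the others, and VIF = 1 / (1 - R²) of that regression, with TOL = 1 / VIF. The following is a small sketch on simulated data. The data frame `dat` and its values are hypothetical; it only mimics the situation in the CPS data, where age is close to education + experience + 6.

```r
# Simulated predictors (hypothetical values, not the CPS file)
set.seed(42)
n <- 200
dat <- data.frame(
  education  = round(rnorm(n, mean = 13, sd = 2.5)),
  experience = round(runif(n, min = 0, max = 40)),
  union      = rbinom(n, size = 1, prob = 0.2)
)
# age is nearly a linear combination of education and experience
dat$age <- dat$education + dat$experience + 6 + rnorm(n, sd = 0.1)

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
# predictor j on all the other predictors
vif_by_hand <- sapply(names(dat), function(v) {
  aux <- lm(reformulate(setdiff(names(dat), v), response = v), data = dat)
  1 / (1 - summary(aux)$r.squared)
})

tol <- 1 / vif_by_hand   # tolerance is the reciprocal of the VIF

print(round(cbind(VIF = vif_by_hand, TOL = tol), 4))
```

In this sketch education, experience and age get very large VIFs while union stays near 1, which is the same qualitative picture as in the imcdiag table.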
> library(ppcor) # pcor() is provided by the ppcor package
> pcor(cps[,c(-6)], method = "pearson")$estimate
education south sex experience union age race occupation
education 1.000000000 -0.031750193 0.051510483 -0.99756187 -0.007479144 0.99726160 0.017230877 0.029436911
south -0.031750193 1.000000000 -0.030152499 -0.02231360 -0.097548621 0.02152507 -0.111197596 0.008430595
sex 0.051510483 -0.030152499 1.000000000 0.05497703 -0.120087577 -0.05369785 0.020017315 -0.142750864
experience -0.997561873 -0.022313605 0.054977034 1.00000000 -0.010244447 0.99987574 0.010888486 0.042058560
union -0.007479144 -0.097548621 -0.120087577 -0.01024445 1.000000000 0.01223890 -0.107706183 0.212996388
age 0.997261601 0.021525073 -0.053697851 0.99987574 0.012238897 1.00000000 -0.010803310 -0.044140293
race 0.017230877 -0.111197596 0.020017315 0.01088849 -0.107706183 -0.01080331 1.000000000 0.057539374
occupation 0.029436911 0.008430595 -0.142750864 0.04205856 0.212996388 -0.04414029 0.057539374 1.000000000
sector -0.021253493 -0.021518760 -0.112146760 -0.01326166 -0.013531482 0.01456575 0.006412099 0.314746868
marr -0.040302967 0.030418218 0.004163264 -0.04097664 0.068918496 0.04509033 0.055645964 -0.018580965
sector marr
education -0.021253493 -0.040302967
south -0.021518760 0.030418218
sex -0.112146760 0.004163264
experience -0.013261665 -0.040976643
union -0.013531482 0.068918496
age 0.014565751 0.045090327
race 0.006412099 0.055645964
occupation 0.314746868 -0.018580965
sector 1.000000000 0.036495494
marr 0.036495494 1.000000000
> round(pcor(cps[,c(-6)], method = "pearson")$estimate,4)
education south sex experience union age race occupation sector marr
education 1.0000 -0.0318 0.0515 -0.9976 -0.0075 0.9973 0.0172 0.0294 -0.0213 -0.0403
south -0.0318 1.0000 -0.0302 -0.0223 -0.0975 0.0215 -0.1112 0.0084 -0.0215 0.0304
sex 0.0515 -0.0302 1.0000 0.0550 -0.1201 -0.0537 0.0200 -0.1428 -0.1121 0.0042
experience -0.9976 -0.0223 0.0550 1.0000 -0.0102 0.9999 0.0109 0.0421 -0.0133 -0.0410
union -0.0075 -0.0975 -0.1201 -0.0102 1.0000 0.0122 -0.1077 0.2130 -0.0135 0.0689
age 0.9973 0.0215 -0.0537 0.9999 0.0122 1.0000 -0.0108 -0.0441 0.0146 0.0451
race 0.0172 -0.1112 0.0200 0.0109 -0.1077 -0.0108 1.0000 0.0575 0.0064 0.0556
occupation 0.0294 0.0084 -0.1428 0.0421 0.2130 -0.0441 0.0575 1.0000 0.3147 -0.0186
sector -0.0213 -0.0215 -0.1121 -0.0133 -0.0135 0.0146 0.0064 0.3147 1.0000 0.0365
marr -0.0403 0.0304 0.0042 -0.0410 0.0689 0.0451 0.0556 -0.0186 0.0365 1.0000
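A partial correlation is the correlation between two variables after the linear effect of all the other variables has been removed. It can be read off the inverse of the correlation matrix. The following is a minimal sketch with three simulated variables; the names `x`, `y`, `z` and the data are hypothetical, and the sketch uses base R only rather than ppcor.

```r
# x and y are both driven by z, so they correlate strongly,
# but little of that correlation remains once z is held fixed
set.seed(7)
n <- 300
z <- rnorm(n)
x <- z + rnorm(n, sd = 0.5)
y <- z + rnorm(n, sd = 0.5)
dat <- cbind(x, y, z)

# Partial correlations from the inverse correlation matrix:
# pcor_ij = -P_ij / sqrt(P_ii * P_jj), with P = solve(cor(dat))
P  <- solve(cor(dat))
pc <- -P / sqrt(outer(diag(P), diag(P)))
diag(pc) <- 1

# Cross-check for the (x, y) pair: correlate the residuals after removing z
res_x <- resid(lm(x ~ z))
res_y <- resid(lm(y ~ z))

print(round(cor(dat), 4))          # ordinary correlations
print(round(pc, 4))                # partial correlations
print(round(cor(res_x, res_y), 4)) # same as pc["x", "y"]
```

Partial correlations close to ±1, such as those among education, experience and age in the matrix above, show that these variables stay almost perfectly linearly related even after the other predictors are accounted for.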
> lm2 = lm(log(cps$wage) ~ . -age , data = cps)
> summary(lm2)
Call:
lm(formula = log(cps$wage) ~ . - age, data = cps)
Residuals:
Min 1Q Median 3Q Max
-2.16044 -0.29073 -0.00505 0.29994 1.97997
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.562676 0.160116 3.514 0.000479 ***
education 0.094135 0.008188 11.497 < 2e-16 ***
south -0.103071 0.042796 -2.408 0.016367 *
sex -0.220344 0.039834 -5.532 5.02e-08 ***
experience 0.010335 0.001746 5.919 5.86e-09 ***
union 0.199987 0.052450 3.813 0.000154 ***
race 0.050643 0.028519 1.776 0.076345 .
occupation -0.006971 0.013091 -0.532 0.594619
sector 0.091022 0.038717 2.351 0.019094 *
marr 0.075152 0.041872 1.795 0.073263 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4397 on 524 degrees of freedom
Multiple R-squared: 0.3177, Adjusted R-squared: 0.306
F-statistic: 27.11 on 9 and 524 DF, p-value: < 2.2e-16
> anova(lm1, lm2)
Analysis of Variance Table
Model 1: log(cps$wage) ~ education + south + sex + experience + union +
age + race + occupation + sector + marr
Model 2: log(cps$wage) ~ (education + south + sex + experience + union +
age + race + occupation + sector + marr) - age
Res.Df RSS Df Sum of Sq F Pr(>F)
1 523 101.17
2 524 101.28 -1 -0.11518 0.5954 0.4407
> anova(lm2, lm1)
Analysis of Variance Table
Model 1: log(cps$wage) ~ (education + south + sex + experience + union +
age + race + occupation + sector + marr) - age
Model 2: log(cps$wage) ~ education + south + sex + experience + union +
age + race + occupation + sector + marr
Res.Df RSS Df Sum of Sq F Pr(>F)
1 524 101.28
2 523 101.17 1 0.11518 0.5954 0.4407
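The F statistic in the anova table compares the residual sums of squares of the two nested models: F = ((RSS_reduced - RSS_full) / q) / (RSS_full / df_full), where q is the number of dropped terms. With the numbers above, 0.11518 / (101.17 / 523) is about 0.595, which matches the table. The following is a sketch of the same calculation on simulated data; the variables and model names are hypothetical.

```r
# Simulated data: x2 is nearly redundant once x1 is in the model
set.seed(3)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
y  <- 1 + 2 * x1 + rnorm(n)

full    <- lm(y ~ x1 + x2)
reduced <- lm(y ~ x1)

rss_full    <- sum(resid(full)^2)
rss_reduced <- sum(resid(reduced)^2)
q           <- df.residual(reduced) - df.residual(full)  # number of dropped terms

# Partial F statistic and its p-value
f_stat <- ((rss_reduced - rss_full) / q) / (rss_full / df.residual(full))
p_val  <- pf(f_stat, q, df.residual(full), lower.tail = FALSE)

print(c(F = f_stat, p = p_val))
print(anova(reduced, full))   # reports the same F and p-value
```

A large p-value, as for age in the session above, means that dropping the term does not significantly worsen the fit.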
> corrplot.mixed(cps.cor, lower.col = "black", number.cex = 1)
> lm3 = lm(log(cps$wage) ~ . -age -education , data = cps)
> summary(lm3)
Call:
lm(formula = log(cps$wage) ~ . - age - education, data = cps)
Residuals:
Min 1Q Median 3Q Max
-2.35385 -0.34226 -0.02236 0.30725 1.84988
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.974436 0.114881 17.187 < 2e-16 ***
south -0.171508 0.047380 -3.620 0.000323 ***
sex -0.232997 0.044517 -5.234 2.40e-07 ***
experience 0.003031 0.001818 1.667 0.096145 .
union 0.238528 0.058518 4.076 5.29e-05 ***
race 0.079273 0.031761 2.496 0.012870 *
occupation -0.036678 0.014348 -2.556 0.010858 *
sector 0.050525 0.043105 1.172 0.241675
marr 0.105542 0.046719 2.259 0.024286 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4915 on 525 degrees of freedom
Multiple R-squared: 0.1456, Adjusted R-squared: 0.1326
F-statistic: 11.18 on 8 and 525 DF, p-value: 1.177e-14
> lm3 = lm(log(cps$wage) ~ . -age -experience , data = cps)
> summary(lm3)
Call:
lm(formula = log(cps$wage) ~ . - age - experience, data = cps)
Residuals:
Min 1Q Median 3Q Max
-2.1519 -0.3309 0.0034 0.3028 1.8315
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.95654 0.15028 6.365 4.26e-10 ***
education 0.07650 0.00787 9.721 < 2e-16 ***
south -0.11579 0.04411 -2.625 0.008912 **
sex -0.20108 0.04097 -4.908 1.23e-06 ***
union 0.23924 0.05369 4.456 1.02e-05 ***
race 0.05157 0.02943 1.752 0.080287 .
occupation -0.01719 0.01339 -1.283 0.199910
sector 0.10996 0.03982 2.762 0.005953 **
marr 0.13980 0.04171 3.352 0.000861 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4537 on 525 degrees of freedom
Multiple R-squared: 0.2721, Adjusted R-squared: 0.261
F-statistic: 24.53 on 8 and 525 DF, p-value: < 2.2e-16
> # sex is coded 0/1, so asking for the levels "male" and "female" matches nothing and every value becomes NA:
> factor(cps$sex, levels = c("male", "female"))
[1] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
[ remaining output omitted: all 534 values are <NA> ]
Levels: male female
> factor(cps$sex)
[1] 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0
[53] 1 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 0 0 0 1 1 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0
[105] 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0
[157] 0 0 0 1 1 0 1 0 1 0 0 0 0 1 1 0 1 1 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 1 1 0 1 1 0 0 1 0 0 0 0 0 0 1 1 1 0 1
[209] 1 0 0 1 0 0 0 0 1 0 1 1 0 1 1 0 0 1 0 1 1 1 0 1 1 0 1 0 1 1 0 0 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 1 1 1
[261] 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 0 1 1 0 0 1 1 1 0 1 1 1 1 0 1 0 1 0 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 1 1 0
[313] 1 0 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 0 1 1 0 0 1 1 0 1 1 1 1 0 1 1 0 0 1 0 1 1
[365] 1 0 0 0 1 0 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 0 0 0 1 1 1 1 0 1 0 0 1 0 1 1 1 1 1 1 0 0 1 1 0 0 0 0 1 0 0 0
[417] 1 0 1 1 1 0 1 0 0 1 1 0 1 0 0 0 1 0 0 0 0 1 0 0 1 1 0 1 0 0 1 0 0 1 1 1 1 0 0 0 1 1 0 0 1 1 1 0 1 1 1 1
[469] 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 0 0 0 0 1 0 1 1 1
[521] 0 0 0 0 1 1 0 0 1 0 1 1 0 0
Levels: 0 1
> cps <- factor(cps$sex)   # mistake: this overwrites the whole data frame with a single factor
> str(cps)
Factor w/ 2 levels "0","1": 2 2 1 1 1 1 1 1 1 1 ...
> cps <- read.csv("http://commres.net/wiki/_media/cps_85_wages.csv", header = T, sep = "\t")
> str(cps)
'data.frame': 534 obs. of 11 variables:
$ education : int 8 9 12 12 12 13 10 12 16 12 ...
$ south : int 0 0 0 0 0 0 1 0 0 0 ...
$ sex : int 1 1 0 0 0 0 0 0 0 0 ...
$ experience: int 21 42 1 4 17 9 27 9 11 9 ...
$ union : int 0 0 0 0 0 1 0 0 0 0 ...
$ wage : num 5.1 4.95 6.67 4 7.5 ...
$ age : int 35 57 19 22 35 28 43 27 33 27 ...
$ race : int 2 3 3 3 3 3 3 3 3 3 ...
$ occupation: int 6 6 6 6 6 6 6 6 6 6 ...
$ sector : int 1 1 1 0 0 0 0 0 1 0 ...
$ marr : int 1 1 0 0 1 0 0 0 1 0 ...
> cps$sex <- factor(cps$sex)
> cps$union <- factor(cps$union)
> cps$race <- factor(cps$race)
> cps$sector <- factor(cps$sector)
> cps$occupation <- factor(cps$occupation)
> cps$marr <- factor(cps$marr)
> str(cps)
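The column-by-column factor() calls above can also be written as a single step. The following is a minimal sketch, not the author's code: `toy` and `cat_cols` are hypothetical names, and the toy values are the first six rows of the matching columns shown by head(cps). Assigning back into the selected columns leaves the data frame itself intact, unlike assigning a factor to `cps` as a whole.

```r
# Toy stand-in for a few columns of cps (first six rows from head(cps)).
toy <- data.frame(
  wage  = c(5.10, 4.95, 6.67, 4.00, 7.50, 13.07),
  sex   = c(1L, 1L, 0L, 0L, 0L, 0L),
  union = c(0L, 0L, 0L, 0L, 0L, 1L),
  race  = c(2L, 3L, 3L, 3L, 3L, 3L)
)

# Columns to treat as categorical (hypothetical helper vector).
cat_cols <- c("sex", "union", "race")

# Convert only those columns; the rest of the data frame is untouched.
toy[cat_cols] <- lapply(toy[cat_cols], factor)

str(toy)
```

Note that cor() and the numeric diagnostics further down expect numeric columns, so a conversion like this is only appropriate for the regression step.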
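> # Note: the summary below lists one coefficient per variable (sex, race, occupation, ...),
> # so it appears to come from a fit made while those columns were still integers, not factors.
> lm1 = lm(log(cps$wage) ~., data = cps)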
> summary(lm1)
Call:
lm(formula = log(cps$wage) ~ ., data = cps)
Residuals:
     Min       1Q   Median       3Q      Max
-2.16246 -0.29163 -0.00469  0.29981  1.98248
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.078596   0.687514   1.569 0.117291
education    0.179366   0.110756   1.619 0.105949
south       -0.102360   0.042823  -2.390 0.017187 *
sex         -0.221997   0.039907  -5.563 4.24e-08 ***
experience   0.095822   0.110799   0.865 0.387531
union        0.200483   0.052475   3.821 0.000149 ***
age         -0.085444   0.110730  -0.772 0.440671
race         0.050406   0.028531   1.767 0.077865 .
occupation  -0.007417   0.013109  -0.566 0.571761
sector       0.091458   0.038736   2.361 0.018589 *
marr         0.076611   0.041931   1.827 0.068259 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4398 on 523 degrees of freedom
Multiple R-squared: 0.3185, Adjusted R-squared: 0.3054
F-statistic: 24.44 on 10 and 523 DF, p-value: < 2.2e-16
> plot(lm1)
> library(corrplot)
> cps.cor = cor(cps)
> corrplot.mixed(cps.cor, lower.col = "black")
> install.packages("mctest")
> library(mctest)
> omcdiag(cps[,c(-6)], cps$wage) # or "omcdiag(cps[,c(1:5,7:11)], cps$wage)" will work as well.
Call:
omcdiag(x = cps[, c(-6)], y = cps$wage)
Overall Multicollinearity Diagnostics
                       MC Results detection
Determinant |X'X|:         0.0001         1
Farrar Chi-Square:      4833.5751         1
Red Indicator:             0.1983         0
Sum of Lambda Inverse: 10068.8439         1
Theil's Method:            1.2263         1
Condition Number:        739.7337         1
1 --> COLLINEARITY is detected by the test
0 --> COLLINEARITY is not detected by the test
>
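Some of the overall measures reported above can be approximated by hand from the correlation matrix of the predictors. The sketch below uses simulated data, not the CPS file: the variable names mirror the transcript, but the construction `experience = age - education - 6 + noise` is an assumption chosen to imitate a near-linear dependency. The exact scaling used inside mctest may differ, so the numbers are not expected to match the omcdiag output.

```r
set.seed(1)
n <- 200

# Simulated predictors (assumed, for illustration only).
education  <- round(rnorm(n, mean = 13, sd = 2.5))
age        <- round(runif(n, min = 18, max = 64))
experience <- age - education - 6 + rnorm(n, sd = 0.1)  # nearly determined by the other two
south      <- rbinom(n, size = 1, prob = 0.3)

X <- cbind(education, age, experience, south)
R <- cor(X)

# Determinant of the correlation matrix: values close to 0 point to collinearity.
det_R <- det(R)

# Eigenvalue-based measures of the same matrix.
ev             <- eigen(R, symmetric = TRUE)$values
cond_number    <- sqrt(max(ev) / min(ev))   # one common definition of the condition number
sum_lambda_inv <- sum(1 / ev)               # equals the sum of the VIFs

det_R
cond_number
sum_lambda_inv
```

A tiny determinant together with a large condition number suggests that at least one predictor is close to a linear combination of the others, which is the pattern flagged in the output above.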
> imcdiag(cps[,c(-6)],cps$wage)
Call:
imcdiag(x = cps[, c(-6)], y = cps$wage)
All Individual Multicollinearity Diagnostics Result
                 VIF    TOL          Wi          Fi Leamer      CVIF Klein
education   231.1956 0.0043  13402.4982  15106.5849 0.0658  236.4725     1
south         1.0468 0.9553      2.7264      3.0731 0.9774    1.0707     0
sex           1.0916 0.9161      5.3351      6.0135 0.9571    1.1165     0
experience 5184.0939 0.0002 301771.2445 340140.5368 0.0139 5302.4188     1
union         1.1209 0.8922      7.0368      7.9315 0.9445    1.1464     0
age        4645.6650 0.0002 270422.7164 304806.1391 0.0147 4751.7005     1
race          1.0371 0.9642      2.1622      2.4372 0.9819    1.0608     0
occupation    1.2982 0.7703     17.3637     19.5715 0.8777    1.3279     0
sector        1.1987 0.8343     11.5670     13.0378 0.9134    1.2260     0
marr          1.0961 0.9123      5.5969      6.3085 0.9551    1.1211     0
1 --> COLLINEARITY is detected by the test
0 --> COLLINEARITY is not detected by the test
education , south , experience , age , race , occupation , sector , marr , coefficient(s) are non-significant may be due to multicollinearity
R-square of y on all x: 0.2805
* use method argument to check which regressors may be the reason of collinearity
===================================
>
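The VIF and TOL columns above follow the usual definitions VIF_j = 1 / (1 - R_j^2) and TOL_j = 1 / VIF_j, where R_j^2 comes from regressing predictor j on all the other predictors. The sketch below computes both by hand on simulated data; the data-generating step (experience built from age and education plus small noise) and the name `vif_by_hand` are assumptions for illustration, not part of the original analysis.

```r
set.seed(1)
n <- 200

# Simulated predictors (assumed, for illustration only).
education  <- round(rnorm(n, mean = 13, sd = 2.5))
age        <- round(runif(n, min = 18, max = 64))
experience <- age - education - 6 + rnorm(n, sd = 0.1)
south      <- rbinom(n, size = 1, prob = 0.3)

X <- data.frame(education, age, experience, south)

# For each predictor, regress it on the remaining predictors
# and turn the auxiliary R-squared into a VIF.
vif_by_hand <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  aux    <- lm(reformulate(others, response = v), data = X)
  1 / (1 - summary(aux)$r.squared)
})

tol <- 1 / vif_by_hand

round(cbind(VIF = vif_by_hand, TOL = tol), 4)
```

In this simulation the three linked variables get very large VIFs while `south` stays near 1, the same qualitative picture as education, experience and age in the table above.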
> library(ppcor)   # pcor() comes from the ppcor package
> round(pcor(cps[,c(-6)], method = "pearson")$estimate,4)
           education   south     sex experience   union     age    race occupation  sector    marr
education     1.0000 -0.0318  0.0515    -0.9976 -0.0075  0.9973  0.0172     0.0294 -0.0213 -0.0403
south        -0.0318  1.0000 -0.0302    -0.0223 -0.0975  0.0215 -0.1112     0.0084 -0.0215  0.0304
sex           0.0515 -0.0302  1.0000     0.0550 -0.1201 -0.0537  0.0200    -0.1428 -0.1121  0.0042
experience   -0.9976 -0.0223  0.0550     1.0000 -0.0102  0.9999  0.0109     0.0421 -0.0133 -0.0410
union        -0.0075 -0.0975 -0.1201    -0.0102  1.0000  0.0122 -0.1077     0.2130 -0.0135  0.0689
age           0.9973  0.0215 -0.0537     0.9999  0.0122  1.0000 -0.0108    -0.0441  0.0146  0.0451
race          0.0172 -0.1112  0.0200     0.0109 -0.1077 -0.0108  1.0000     0.0575  0.0064  0.0556
occupation    0.0294  0.0084 -0.1428     0.0421  0.2130 -0.0441  0.0575     1.0000  0.3147 -0.0186
sector       -0.0213 -0.0215 -0.1121    -0.0133 -0.0135  0.0146  0.0064     0.3147  1.0000  0.0365
marr         -0.0403  0.0304  0.0042    -0.0410  0.0689  0.0451  0.0556    -0.0186  0.0365  1.0000
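A partial correlation matrix like the one above can be obtained from the inverse of the ordinary correlation matrix: with P = R^(-1), the partial correlation between variables i and j given all the others is -P_ij / sqrt(P_ii * P_jj). The sketch below shows this on simulated data and cross-checks one entry against the residual-based definition. The simulated variables and the name `pc` are assumptions for illustration; this is not a reproduction of the pcor() call on the CPS data.

```r
set.seed(1)
n <- 200

# Simulated predictors (assumed, for illustration only).
education  <- round(rnorm(n, mean = 13, sd = 2.5))
age        <- round(runif(n, min = 18, max = 64))
experience <- age - education - 6 + rnorm(n, sd = 0.1)
south      <- rbinom(n, size = 1, prob = 0.3)

X <- cbind(education, age, experience, south)

# Partial correlations from the inverse correlation matrix.
P  <- solve(cor(X))
pc <- -P / sqrt(outer(diag(P), diag(P)))
diag(pc) <- 1

round(pc, 4)

# Cross-check for one pair: correlate the residuals of education and age
# after removing the linear effect of the remaining variables.
r_edu <- resid(lm(education ~ experience + south))
r_age <- resid(lm(age ~ experience + south))
check <- cor(r_edu, r_age)
```

Partial correlations near +1 or -1 among education, age and experience, with everything else near 0, indicate that these three variables carry almost the same information once the others are held fixed.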
> lm2 = lm(log(cps$wage) ~ . -age , data = cps)
> summary(lm2)
Call:
lm(formula = log(cps$wage) ~ . - age, data = cps)
Residuals:
     Min       1Q   Median       3Q      Max
-2.16044 -0.29073 -0.00505  0.29994  1.97997
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.562676   0.160116   3.514 0.000479 ***
education    0.094135   0.008188  11.497  < 2e-16 ***
south       -0.103071   0.042796  -2.408 0.016367 *
sex         -0.220344   0.039834  -5.532 5.02e-08 ***
experience   0.010335   0.001746   5.919 5.86e-09 ***
union        0.199987   0.052450   3.813 0.000154 ***
race         0.050643   0.028519   1.776 0.076345 .
occupation  -0.006971   0.013091  -0.532 0.594619
sector       0.091022   0.038717   2.351 0.019094 *
marr         0.075152   0.041872   1.795 0.073263 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4397 on 524 degrees of freedom
Multiple R-squared: 0.3177, Adjusted R-squared: 0.306
F-statistic: 27.11 on 9 and 524 DF, p-value: < 2.2e-16
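The effect of dropping one of the collinear predictors, as done with `- age` above, can be illustrated with a small simulation. This is a sketch under stated assumptions: the data are simulated, the true coefficients (0.09 for education, 0.01 for experience) are arbitrary values picked for the example, and `se()` is a hypothetical helper that extracts standard errors from a fitted model.

```r
set.seed(1)
n <- 200

# Simulated predictors with a near-linear dependency (assumed).
education  <- round(rnorm(n, mean = 13, sd = 2.5))
age        <- round(runif(n, min = 18, max = 64))
experience <- age - education - 6 + rnorm(n, sd = 0.1)

# Simulated response; coefficients are arbitrary illustration values.
log_wage <- 0.5 + 0.09 * education + 0.01 * experience + rnorm(n, sd = 0.4)

d <- data.frame(log_wage, education, age, experience)

full    <- lm(log_wage ~ ., data = d)        # all three linked predictors
reduced <- lm(log_wage ~ . - age, data = d)  # age dropped

# Hypothetical helper: standard errors of the coefficients.
se <- function(m) coef(summary(m))[, "Std. Error"]

se_full    <- se(full)["education"]
se_reduced <- se(reduced)["education"]

c(full = unname(se_full), reduced = unname(se_reduced))
```

The standard error of education shrinks sharply once age is removed, while the fit itself barely changes. The same pattern is visible when the lm2 output above is compared with lm1: the R-squared values are almost identical, but education and experience become clearly significant.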
> summary(lm1)
Call:
lm(formula = log(cps$wage) ~ ., data = cps)
Residuals:
     Min       1Q   Median       3Q      Max
-2.16246 -0.29163 -0.00469  0.29981  1.98248
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.078596   0.687514   1.569 0.117291
education    0.179366   0.110756   1.619 0.105949
south       -0.102360   0.042823  -2.390 0.017187 *
sex         -0.221997   0.039907  -5.563 4.24e-08 ***
experience   0.095822   0.110799   0.865 0.387531
union        0.200483   0.052475   3.821 0.000149 ***
age         -0.085444   0.110730  -0.772 0.440671
race         0.050406   0.028531   1.767 0.077865 .
occupation  -0.007417   0.013109  -0.566 0.571761
sector       0.091458   0.038736   2.361 0.018589 *
marr         0.076611   0.041931   1.827 0.068259 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4398 on 523 degrees of freedom
Multiple R-squared: 0.3185, Adjusted R-squared: 0.3054
F-statistic: 24.44 on 10 and 523 DF, p-value: < 2.2e-16
>
>
multicolinearity.1545759460.txt.gz · Last modified: by hkimscil





