Multicollinearity check in R

Required libraries:

  • corrplot
  • mctest
    • omcdiag
    • imcdiag
  • ppcor
    • pcor
> cps <- read.csv("http://commres.net/wiki/_media/cps_85_wages.csv", header = TRUE, sep = "\t")
> str(cps)
'data.frame':	534 obs. of  11 variables:
 $ education : int  8 9 12 12 12 13 10 12 16 12 ...
 $ south     : int  0 0 0 0 0 0 1 0 0 0 ...
 $ sex       : int  1 1 0 0 0 0 0 0 0 0 ...
 $ experience: int  21 42 1 4 17 9 27 9 11 9 ...
 $ union     : int  0 0 0 0 0 1 0 0 0 0 ...
 $ wage      : num  5.1 4.95 6.67 4 7.5 ...
 $ age       : int  35 57 19 22 35 28 43 27 33 27 ...
 $ race      : int  2 3 3 3 3 3 3 3 3 3 ...
 $ occupation: int  6 6 6 6 6 6 6 6 6 6 ...
 $ sector    : int  1 1 1 0 0 0 0 0 1 0 ...
 $ marr      : int  1 1 0 0 1 0 0 0 1 0 ...
> head(cps)
  education south sex experience union  wage age race occupation sector marr
1         8     0   1         21     0  5.10  35    2          6      1    1
2         9     0   1         42     0  4.95  57    3          6      1    1
3        12     0   0          1     0  6.67  19    3          6      1    0
4        12     0   0          4     0  4.00  22    3          6      0    0
5        12     0   0         17     0  7.50  35    3          6      0    1
6        13     0   0          9     1 13.07  28    3          6      0    0
> lm1 = lm(log(cps$wage) ~., data = cps)
> summary(lm1)

Call:
lm(formula = log(cps$wage) ~ ., data = cps)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.16246 -0.29163 -0.00469  0.29981  1.98248 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.078596   0.687514   1.569 0.117291    
education    0.179366   0.110756   1.619 0.105949    
south       -0.102360   0.042823  -2.390 0.017187 *  
sex         -0.221997   0.039907  -5.563 4.24e-08 ***
experience   0.095822   0.110799   0.865 0.387531    
union        0.200483   0.052475   3.821 0.000149 ***
age         -0.085444   0.110730  -0.772 0.440671    
race         0.050406   0.028531   1.767 0.077865 .  
occupation  -0.007417   0.013109  -0.566 0.571761    
sector       0.091458   0.038736   2.361 0.018589 *  
marr         0.076611   0.041931   1.827 0.068259 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4398 on 523 degrees of freedom
Multiple R-squared:  0.3185,	Adjusted R-squared:  0.3054 
F-statistic: 24.44 on 10 and 523 DF,  p-value: < 2.2e-16
> plot(lm1)


> library(corrplot)
> cps.cor = cor(cps)
> corrplot.mixed(cps.cor, lower.col = "black")
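The plot gives a visual overview of the correlation matrix. To list the strongly correlated pairs as text instead, something like the following minimal sketch could be used. The helper name `high_cor_pairs` and the 0.9 cutoff are illustrative assumptions, not part of corrplot, and the built-in `mtcars` data stands in for `cps` so the block runs without a download.

```r
# Hypothetical helper: list variable pairs whose |correlation| >= cutoff.
high_cor_pairs <- function(cor_mat, cutoff = 0.9) {
  # Look only at the upper triangle so each pair is reported once
  # and the diagonal (always 1) is skipped.
  idx <- which(upper.tri(cor_mat) & abs(cor_mat) >= cutoff, arr.ind = TRUE)
  data.frame(
    var1 = rownames(cor_mat)[idx[, "row"]],
    var2 = colnames(cor_mat)[idx[, "col"]],
    r    = cor_mat[idx],
    stringsAsFactors = FALSE
  )
}

# mtcars is used here only as a stand-in data set.
pairs_found <- high_cor_pairs(cor(mtcars), cutoff = 0.9)
print(pairs_found)
```

With the data from this page the call would be `high_cor_pairs(cps.cor)`; a lower cutoff may be needed, since pairwise correlations can look moderate even when three variables are jointly collinear.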

> install.packages("mctest")
> library(mctest)
> omcdiag(cps[,c(-6)], cps$wage) # or "omcdiag(cps[,c(1:5,7:11)], cps$wage)" will work as well.

Call:
omcdiag(x = cps[, c(-6)], y = cps$wage)


Overall Multicollinearity Diagnostics

                       MC Results detection
Determinant |X'X|:         0.0001         1
Farrar Chi-Square:      4833.5751         1
Red Indicator:             0.1983         0
Sum of Lambda Inverse: 10068.8439         1
Theil's Method:            1.2263         1
Condition Number:        739.7337         1

1 --> COLLINEARITY is detected by the test 
0 --> COLLINEARITY is not detected by the test

> 
> imcdiag(cps[,c(-6)],cps$wage) 

Call:
imcdiag(x = cps[, c(-6)], y = cps$wage)


All Individual Multicollinearity Diagnostics Result

                 VIF    TOL          Wi          Fi Leamer      CVIF Klein
education   231.1956 0.0043  13402.4982  15106.5849 0.0658  236.4725     1
south         1.0468 0.9553      2.7264      3.0731 0.9774    1.0707     0
sex           1.0916 0.9161      5.3351      6.0135 0.9571    1.1165     0
experience 5184.0939 0.0002 301771.2445 340140.5368 0.0139 5302.4188     1
union         1.1209 0.8922      7.0368      7.9315 0.9445    1.1464     0
age        4645.6650 0.0002 270422.7164 304806.1391 0.0147 4751.7005     1
race          1.0371 0.9642      2.1622      2.4372 0.9819    1.0608     0
occupation    1.2982 0.7703     17.3637     19.5715 0.8777    1.3279     0
sector        1.1987 0.8343     11.5670     13.0378 0.9134    1.2260     0
marr          1.0961 0.9123      5.5969      6.3085 0.9551    1.1211     0

1 --> COLLINEARITY is detected by the test 
0 --> COLLINEARITY is not detected by the test

education , south , experience , age , race , occupation , sector , marr , coefficient(s) are non-significant may be due to multicollinearity

R-square of y on all x: 0.2805 

* use method argument to check which regressors may be the reason of collinearity
===================================
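The VIF and TOL columns above can be cross-checked with base R alone: regress each predictor on all the others, take that regression's R-squared, and compute VIF = 1 / (1 - R-squared); TOL is the reciprocal of VIF. The sketch below is not mctest's implementation. The helper name `vif_by_hand` is hypothetical, and the data are synthetic, built so that `age` is almost `education + experience + 6`.

```r
# Hypothetical helper: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from
# regressing predictor j on all the other predictors.
vif_by_hand <- function(x) {
  x <- as.data.frame(x)
  sapply(names(x), function(v) {
    f  <- reformulate(setdiff(names(x), v), response = v)
    r2 <- summary(lm(f, data = x))$r.squared
    1 / (1 - r2)
  })
}

# Synthetic stand-in for cps (an assumption, not the real data).
set.seed(1)
n <- 200
education  <- sample(8:18, n, replace = TRUE)
experience <- sample(0:40, n, replace = TRUE)
age        <- education + experience + 6 + rnorm(n, sd = 0.5)
south      <- rbinom(n, 1, 0.3)
toy <- data.frame(education, experience, age, south)

vif <- vif_by_hand(toy)
print(round(vif, 2))
print(round(1 / vif, 4))  # tolerance (TOL) is the reciprocal of VIF
```

On the page's data the equivalent call would be `vif_by_hand(cps[, -6])`, which drops the `wage` column just as the `imcdiag` call does.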
> 
> library(ppcor) # provides pcor()
> round(pcor(cps[,c(-6)], method = "pearson")$estimate,4)
           education   south     sex experience   union     age    race occupation  sector    marr
education     1.0000 -0.0318  0.0515    -0.9976 -0.0075  0.9973  0.0172     0.0294 -0.0213 -0.0403
south        -0.0318  1.0000 -0.0302    -0.0223 -0.0975  0.0215 -0.1112     0.0084 -0.0215  0.0304
sex           0.0515 -0.0302  1.0000     0.0550 -0.1201 -0.0537  0.0200    -0.1428 -0.1121  0.0042
experience   -0.9976 -0.0223  0.0550     1.0000 -0.0102  0.9999  0.0109     0.0421 -0.0133 -0.0410
union        -0.0075 -0.0975 -0.1201    -0.0102  1.0000  0.0122 -0.1077     0.2130 -0.0135  0.0689
age           0.9973  0.0215 -0.0537     0.9999  0.0122  1.0000 -0.0108    -0.0441  0.0146  0.0451
race          0.0172 -0.1112  0.0200     0.0109 -0.1077 -0.0108  1.0000     0.0575  0.0064  0.0556
occupation    0.0294  0.0084 -0.1428     0.0421  0.2130 -0.0441  0.0575     1.0000  0.3147 -0.0186
sector       -0.0213 -0.0215 -0.1121    -0.0133 -0.0135  0.0146  0.0064     0.3147  1.0000  0.0365
marr         -0.0403  0.0304  0.0042    -0.0410  0.0689  0.0451  0.0556    -0.0186  0.0365  1.0000
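The partial correlations above are near ±1 only among education, experience and age. For Pearson correlations, the same kind of matrix can be derived from the inverse of the correlation matrix P, using -P[i,j] / sqrt(P[i,i] * P[j,j]). The following base-R sketch illustrates that formula on synthetic data; the helper name `partial_cor` and the toy variables are assumptions for illustration, not the `pcor` implementation.

```r
# Hypothetical helper: partial correlation of each pair of columns,
# controlling for all remaining columns, via the inverse correlation matrix.
partial_cor <- function(x) {
  r  <- cor(x)
  p  <- solve(r)                              # inverse of the correlation matrix
  pc <- -p / sqrt(outer(diag(p), diag(p)))    # -P_ij / sqrt(P_ii * P_jj)
  diag(pc) <- 1
  dimnames(pc) <- dimnames(r)
  pc
}

# Synthetic stand-in: age is almost education + experience + 6.
set.seed(1)
n <- 200
education  <- sample(8:18, n, replace = TRUE)
experience <- sample(0:40, n, replace = TRUE)
age        <- education + experience + 6 + rnorm(n, sd = 0.5)
toy <- data.frame(education, experience, age)

pc <- partial_cor(toy)
print(round(pc, 4))
```

The sign pattern in the toy output mirrors the table above: education and experience are strongly negatively related once age is held fixed, while each is strongly positively related to age.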
> lm2 = lm(log(cps$wage) ~ . -age , data = cps)
> summary(lm2)

Call:
lm(formula = log(cps$wage) ~ . - age, data = cps)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.16044 -0.29073 -0.00505  0.29994  1.97997 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.562676   0.160116   3.514 0.000479 ***
education    0.094135   0.008188  11.497  < 2e-16 ***
south       -0.103071   0.042796  -2.408 0.016367 *  
sex         -0.220344   0.039834  -5.532 5.02e-08 ***
experience   0.010335   0.001746   5.919 5.86e-09 ***
union        0.199987   0.052450   3.813 0.000154 ***
race         0.050643   0.028519   1.776 0.076345 .  
occupation  -0.006971   0.013091  -0.532 0.594619    
sector       0.091022   0.038717   2.351 0.019094 *  
marr         0.075152   0.041872   1.795 0.073263 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4397 on 524 degrees of freedom
Multiple R-squared:  0.3177,	Adjusted R-squared:  0.306 
F-statistic: 27.11 on 9 and 524 DF,  p-value: < 2.2e-16
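Dropping `age` shrinks the standard errors of `education` and `experience` by more than an order of magnitude. One plausible reason is visible in the rows printed by `head(cps)`: in each of them `experience` equals `age - education - 6`. Whether this holds for all 534 rows is not shown on this page, so the sketch below only checks the six printed rows and leaves the full-data check as a commented suggestion.

```r
# The six rows shown by head(cps) above; only the relevant columns are copied.
head_rows <- data.frame(
  education  = c(8, 9, 12, 12, 12, 13),
  experience = c(21, 42, 1, 4, 17, 9),
  age        = c(35, 57, 19, 22, 35, 28)
)

# Does experience equal age - education - 6 in every row?
identity_holds <- with(head_rows, experience == age - education - 6)
print(identity_holds)
print(all(identity_holds))

# On the full data the share of rows satisfying the identity would be
# (assuming cps is loaded as above):
# mean(with(cps, experience == age - education - 6))
```

If most rows satisfy the identity, any one of the three variables is almost a linear combination of the other two, which is consistent with the very large VIF values reported by `imcdiag`.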

> summary(lm1)

Call:
lm(formula = log(cps$wage) ~ ., data = cps)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.16246 -0.29163 -0.00469  0.29981  1.98248 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.078596   0.687514   1.569 0.117291    
education    0.179366   0.110756   1.619 0.105949    
south       -0.102360   0.042823  -2.390 0.017187 *  
sex         -0.221997   0.039907  -5.563 4.24e-08 ***
experience   0.095822   0.110799   0.865 0.387531    
union        0.200483   0.052475   3.821 0.000149 ***
age         -0.085444   0.110730  -0.772 0.440671    
race         0.050406   0.028531   1.767 0.077865 .  
occupation  -0.007417   0.013109  -0.566 0.571761    
sector       0.091458   0.038736   2.361 0.018589 *  
marr         0.076611   0.041931   1.827 0.068259 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4398 on 523 degrees of freedom
Multiple R-squared:  0.3185,	Adjusted R-squared:  0.3054 
F-statistic: 24.44 on 10 and 523 DF,  p-value: < 2.2e-16

multicolinearity.1545759489.txt.gz · Last modified: 2018/12/26 02:38 by hkimscil
