multicolinearity

Differences

This shows you the differences between two versions of the page.

Old: multicolinearity [2018/12/26 02:41] hkimscil
New: multicolinearity [2018/12/26 02:49] (current) – [regression test with factors] hkimscil
Line 24: Line 24:
  $ sector    : int  1 1 1 0 0 0 0 0 1 0 ...
  $ marr      : int  1 1 0 0 1 0 0 0 1 0 ...
 +> head(cps)
  > head(cps)
    education south sex experience union  wage age race occupation sector marr
Line 34: Line 34:
 6        13                9     1 13.07  28    3          6      0    0
 </code> </code>
- 
-<code> 
-> cps$sex <- factor(cps$sex) 
-> cps$union <- factor(cps$union) 
-> cps$race <- factor(cps$race) 
-> cps$sector <- factor(cps$sector) 
-> cps$occupation <- factor(cps$occupation) 
-> cps$marr <- factor(cps$marr) 
-> str(cps) 
-'data.frame': 534 obs. of  11 variables: 
- $ education : int  8 9 12 12 12 13 10 12 16 12 ... 
- $ south     : int  0 0 0 0 0 0 1 0 0 0 ... 
- $ sex       : Factor w/ 2 levels "0","1": 2 2 1 1 1 1 1 1 1 1 ... 
- $ experience: int  21 42 1 4 17 9 27 9 11 9 ... 
- $ union     : Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 1 1 1 ... 
- $ wage      : num  5.1 4.95 6.67 4 7.5 ... 
- $ age       : int  35 57 19 22 35 28 43 27 33 27 ... 
- $ race      : Factor w/ 3 levels "1","2","3": 2 3 3 3 3 3 3 3 3 3 ... 
- $ occupation: Factor w/ 6 levels "1","2","3","4",..: 6 6 6 6 6 6 6 6 6 6 ... 
- $ sector    : Factor w/ 3 levels "0","1","2": 2 2 2 1 1 1 1 1 2 1 ... 
- $ marr      : Factor w/ 2 levels "0","1": 2 2 1 1 2 1 1 1 2 1 ... 
-> </code> 
- 
  
 <code> <code>
Line 67: Line 44:
 Residuals:
      Min       1Q   Median       3Q      Max 
--2.36167 -0.27926  0.00049  0.27957  1.79838 
+-2.16246 -0.29163 -0.00469  0.29981  1.98248 
 
 Coefficients:
-            Estimate Std. Error t value Pr(>|t|)    
-(Intercept)  1.54415    0.66955   2.306 0.021491 *  
-education    0.12525    0.10865   1.153 0.249530    
-south       -0.09290    0.04197  -2.214 0.027291 *  
-sex1        -0.21812    0.04193  -5.202 2.85e-07 ***
-experience   0.06799    0.10814   0.629 0.529813    
-union1       0.21178    0.05126   4.132 4.20e-05 ***
-age         -0.05858    0.10806  -0.542 0.587963    
-race2       -0.03345    0.09912  -0.338 0.735876    
-race3        0.07973    0.05743   1.388 0.165636    
-occupation2 -0.36440    0.09156  -3.980 7.88e-05 ***
-occupation3 -0.20964    0.07624  -2.750 0.006171 ** 
-occupation4 -0.38345    0.08105  -4.731 2.89e-06 ***
-occupation5 -0.05278    0.07287  -0.724 0.469223    
-occupation6 -0.26555    0.08002  -3.318 0.000969 ***
-sector1      0.11532    0.05491   2.100 0.036186 *  
-sector2      0.09296    0.09658   0.962 0.336262    
-marr1        0.06335    0.04111   1.541 0.123899    
+             Estimate Std. Error t value Pr(>|t|)    
+(Intercept)  1.078596   0.687514   1.569 0.117291    
+education    0.179366   0.110756   1.619 0.105949    
+south       -0.102360   0.042823  -2.390 0.017187 *  
+sex         -0.221997   0.039907  -5.563 4.24e-08 ***
+experience   0.095822   0.110799   0.865 0.387531    
+union        0.200483   0.052475   3.821 0.000149 ***
+age         -0.085444   0.110730  -0.772 0.440671    
+race         0.050406   0.028531   1.767 0.077865 .  
+occupation  -0.007417   0.013109  -0.566 0.571761    
+sector       0.091458   0.038736   2.361 0.018589 *  
+marr         0.076611   0.041931   1.827 0.068259 .  
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
-Residual standard error: 0.4281 on 517 degrees of freedom
-Multiple R-squared:  0.3617, Adjusted R-squared:  0.342 
-F-statistic: 18.31 on 16 and 517 DF,  p-value: < 2.2e-16
+Residual standard error: 0.4398 on 523 degrees of freedom
+Multiple R-squared:  0.3185, Adjusted R-squared:  0.3054 
+F-statistic: 24.44 on 10 and 523 DF,  p-value: < 2.2e-16
 </code> </code>
  
Line 248: Line 217:
 > </code> > </code>
  
 +====== regression test with factors ======
 <code> <code>
 +> cps$sex <- factor(cps$sex)
 +> cps$union <- factor(cps$union)
 +> cps$race <- factor(cps$race)
 +> cps$sector <- factor(cps$sector)
 +> cps$occupation <- factor(cps$occupation)
 +> cps$marr <- factor(cps$marr)
 +> str(cps)
 +'data.frame': 534 obs. of  11 variables:
 + $ education : int  8 9 12 12 12 13 10 12 16 12 ...
 + $ south     : int  0 0 0 0 0 0 1 0 0 0 ...
 + $ sex       : Factor w/ 2 levels "0","1": 2 2 1 1 1 1 1 1 1 1 ...
 + $ experience: int  21 42 1 4 17 9 27 9 11 9 ...
 + $ union     : Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 1 1 1 ...
 + $ wage      : num  5.1 4.95 6.67 4 7.5 ...
 + $ age       : int  35 57 19 22 35 28 43 27 33 27 ...
 + $ race      : Factor w/ 3 levels "1","2","3": 2 3 3 3 3 3 3 3 3 3 ...
 + $ occupation: Factor w/ 6 levels "1","2","3","4",..: 6 6 6 6 6 6 6 6 6 6 ...
 + $ sector    : Factor w/ 3 levels "0","1","2": 2 2 2 1 1 1 1 1 2 1 ...
 + $ marr      : Factor w/ 2 levels "0","1": 2 2 1 1 2 1 1 1 2 1 ...
 </code> </code>
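
The six conversions above repeat the same call on different columns. As a minimal sketch, they could be done in one step with `lapply()`. The data frame `toy` and its values are made up for illustration and are not the `cps` data; only the column names echo it.

```r
# Toy stand-in for cps; the values are made up for illustration.
toy <- data.frame(
  wage   = c(5.10, 4.95, 6.67, 4.00),
  sex    = c(1L, 1L, 0L, 0L),
  union  = c(0L, 0L, 0L, 1L),
  sector = c(1L, 2L, 0L, 0L)
)

# Columns that hold category codes rather than quantities.
cat_cols <- c("sex", "union", "sector")

# lapply() returns a list of factors, which is assigned back
# into the same columns of the data frame.
toy[cat_cols] <- lapply(toy[cat_cols], factor)

str(toy)
```

Once a coded column is a factor, `lm()` estimates one coefficient per non-reference level, which is why `occupation` (6 levels) shows up as `occupation2` through `occupation6` in the fits on this page.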
  
 <code> <code>
 +> lm4 = lm(log(cps$wage) ~ . -age, data = cps)
 +> summary(lm4)
 +
 +Call:
 +lm(formula = log(cps$wage) ~ . - age, data = cps)
 +
 +Residuals:
 +     Min       1Q   Median       3Q      Max 
 +-2.36103 -0.28080  0.00362  0.27793  1.79594 
 +
 +Coefficients:
 +             Estimate Std. Error t value Pr(>|t|)    
 +(Intercept)  1.194821   0.181804   6.572 1.21e-10 ***
 +education    0.066603   0.010060   6.621 8.96e-11 ***
 +south       -0.093384   0.041931  -2.227  0.02637 *  
 +sex1        -0.216934   0.041844  -5.184 3.11e-07 ***
 +experience   0.009371   0.001725   5.431 8.63e-08 ***
 +union1       0.211506   0.051218   4.129 4.24e-05 ***
 +race2       -0.033928   0.099051  -0.343  0.73209    
 +race3        0.079851   0.057392   1.391  0.16472    
 +occupation2 -0.364444   0.091500  -3.983 7.78e-05 ***
 +occupation3 -0.210295   0.076175  -2.761  0.00597 ** 
 +occupation4 -0.383882   0.080990  -4.740 2.77e-06 ***
 +occupation5 -0.050664   0.072717  -0.697  0.48628    
 +occupation6 -0.265348   0.079969  -3.318  0.00097 ***
 +sector1      0.114857   0.054862   2.094  0.03678 *  
 +sector2      0.093138   0.096514   0.965  0.33499    
 +marr1        0.062211   0.041025   1.516  0.13002    
 +---
 +Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 +
 +Residual standard error: 0.4278 on 518 degrees of freedom
 +Multiple R-squared:  0.3614, Adjusted R-squared:  0.3429 
 +F-statistic: 19.54 on 15 and 518 DF,  p-value: < 2.2e-16
 +
 +
 +
 </code> </code>
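
With `age` in the model, the standard errors of `education` and `experience` are about 0.11; after dropping `age` they fall to about 0.010 and 0.002. That pattern is a typical symptom of collinearity among predictors. One common way to quantify it is the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. The sketch below uses simulated data in base R only; the relationship `age = education + experience + 6 + noise` is an assumption of the simulation, not something verified for `cps`, and `vif_one` is a hypothetical helper name.

```r
set.seed(1)
n <- 200
education  <- sample(8:18, n, replace = TRUE)
experience <- sample(0:40, n, replace = TRUE)
# Assumed relationship, used for the simulation only.
age <- education + experience + 6 + rnorm(n, sd = 0.5)
sim <- data.frame(education, experience, age)

# VIF for one predictor: regress it on the remaining predictors,
# then compute 1 / (1 - R^2).
vif_one <- function(name, data) {
  others <- setdiff(names(data), name)
  fit <- lm(reformulate(others, response = name), data = data)
  1 / (1 - summary(fit)$r.squared)
}

# All three predictors together, then the same check without age.
vif_with_age    <- sapply(names(sim), vif_one, data = sim)
vif_without_age <- sapply(c("education", "experience"), vif_one,
                          data = sim[c("education", "experience")])

print(vif_with_age)
print(vif_without_age)
```

In this simulation the VIFs are large while `age` is present and close to 1 once it is removed, which mirrors the drop in standard errors between the fit with `age` and `lm4`.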
  
-<code>
+<code>> lm5 = lm(log(cps$wage) ~ . -age -race, data = cps)
 +> summary(lm5) 
 + 
 +Call: 
 +lm(formula = log(cps$wage) ~ . - age - race, data = cps) 
 + 
 +Residuals: 
 +     Min       1Q   Median       3Q      Max  
 +-2.34366 -0.28169 -0.00017  0.29179  1.81158  
 + 
 +Coefficients: 
 +             Estimate Std. Error t value Pr(>|t|)     
 +(Intercept)  1.224289   0.172070   7.115 3.73e-12 *** 
 +education    0.068838   0.009912   6.945 1.14e-11 *** 
 +south       -0.102588   0.041668  -2.462 0.014139 *   
 +sex1        -0.213602   0.041842  -5.105 4.65e-07 *** 
 +experience   0.009494   0.001723   5.510 5.65e-08 *** 
 +union1       0.202720   0.051009   3.974 8.06e-05 *** 
 +occupation2 -0.355381   0.091448  -3.886 0.000115 *** 
 +occupation3 -0.209820   0.076149  -2.755 0.006068 **  
 +occupation4 -0.385680   0.080855  -4.770 2.40e-06 *** 
 +occupation5 -0.047694   0.072746  -0.656 0.512351     
 +occupation6 -0.254277   0.079781  -3.187 0.001523 **  
 +sector1      0.111458   0.054845   2.032 0.042636 *   
 +sector2      0.099777   0.096481   1.034 0.301541     
 +marr1        0.065464   0.041036   1.595 0.111257     
 +--- 
 +Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
 + 
 +Residual standard error: 0.4283 on 520 degrees of freedom 
 +Multiple R-squared:  0.3573, Adjusted R-squared:  0.3412  
 +F-statistic: 22.24 on 13 and 520 DF,  p-value: < 2.2e-16 
 + 
 +
 </code> </code>
  
 +<code>> lm6 = lm(log(cps$wage) ~ . -age -race -occupation -marr -sector, data = cps)
 +> summary(lm6)
  
 +Call:
 +lm(formula = log(cps$wage) ~ . - age - race - occupation - marr - 
 +    sector, data = cps)
 +
 +Residuals:
 +     Min       1Q   Median       3Q      Max 
 +-2.13809 -0.28681 -0.00078  0.29376  1.96678 
 +
 +Coefficients:
 +             Estimate Std. Error t value Pr(>|t|)    
 +(Intercept)  0.731792   0.122217   5.988 3.94e-09 ***
 +education    0.094096   0.007942  11.848  < 2e-16 ***
 +south       -0.111761   0.042857  -2.608 0.009372 ** 
 +sex1        -0.231978   0.039202  -5.918 5.88e-09 ***
 +experience   0.011548   0.001680   6.875 1.75e-11 ***
 +union1       0.198360   0.051243   3.871 0.000122 ***
 +---
 +Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 +
 +Residual standard error: 0.4433 on 528 degrees of freedom
 +Multiple R-squared:  0.3011, Adjusted R-squared:  0.2944 
 +F-statistic: 45.49 on 5 and 528 DF,  p-value: < 2.2e-16
 +
 +> </code>
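
The fits `lm4`, `lm5` and `lm6` are nested: each drops terms from the previous one. A partial F test is one way to check whether the dropped terms add explanatory power. The sketch below runs `anova()` on simulated data; the data frame `sim`, the factor `noise_var` and the coefficients used to generate `log_wage` are all hypothetical.

```r
set.seed(42)
n <- 300
sim <- data.frame(
  education  = rnorm(n, mean = 13, sd = 2.5),
  experience = rnorm(n, mean = 18, sd = 10),
  # A three-level factor generated independently of the response.
  noise_var  = factor(sample(1:3, n, replace = TRUE))
)
sim$log_wage <- 0.7 + 0.09 * sim$education + 0.01 * sim$experience +
  rnorm(n, sd = 0.4)

# Full model, and a reduced model that drops the factor.
full    <- lm(log_wage ~ education + experience + noise_var, data = sim)
reduced <- update(full, . ~ . - noise_var)

# Partial F test: does noise_var improve the fit beyond chance?
cmp <- anova(reduced, full)
print(cmp)
```

For the models on this page the analogous call would presumably be `anova(lm5, lm4)` for the `race` terms, since both are fitted to the same 534 rows with the same response.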
multicolinearity.1545759664.txt.gz · Last modified: 2018/12/26 02:41 by hkimscil
