r:linear_regression [2019/06/13 10:15] (current) by hkimscil — previous revision: 2018/06/15 08:02 [Multiple Regression]
Does the model fit the data well?
  * **Plot the residuals** and check the regression diagnostics.
    * see [[https://drsimonj.svbtle.com/visualising-residuals|visualizing residuals]]
Does the data satisfy the assumptions behind linear regression?
  * Check whether the diagnostics confirm that a linear model is reasonable for your data.
</WRAP>
  
<WRAP info>
What about the beta coefficient?

<blockquote>In R we demonstrate the use of the lm.beta() function in the QuantPsyc package (due to Thomas D. Fletcher of State Farm). The function is short and sweet, and takes a linear model object as argument:</blockquote>

<code>> lm.beta(mod)
EngineSize 
-0.7100032 
> cor(MPG.city,EngineSize)
[1] -0.7100032
</code>
</WRAP>
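In simple regression the standardized (beta) coefficient equals the Pearson correlation, which is why lm.beta(mod) and cor(MPG.city,EngineSize) print the same value. A minimal numerical sketch of that identity (Python, with synthetic stand-in data, since Cars93 is an R dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)                  # stand-in predictor (e.g., EngineSize)
y = -0.7 * x + rng.normal(size=200)       # stand-in response (e.g., MPG.city)

b = np.polyfit(x, y, 1)[0]                # raw OLS slope
beta = b * x.std(ddof=1) / y.std(ddof=1)  # standardized (beta) coefficient
r = np.corrcoef(x, y)[0, 1]               # Pearson correlation

# With one predictor, beta and r are the same number (up to rounding)
assert abs(beta - r) < 1e-9
```

This holds only for a single predictor; with several correlated predictors the betas diverge from the zero-order correlations, as shown later in the page.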
  
  
====== Multiple Regression ======
Regression output table:
| anova(m)  | ANOVA table  |
| coefficients(m) = coef(m)  | Model coefficients  |
F-statistic: 54.83 on 2 and 90 DF,  p-value: 2.674e-16
</code>
<WRAP box help>Questions: 
  * What is R<sup>2</sup>?
  * How many cars are involved in this test? (cf. df = 90)
    * df (90) + number of estimated parameters (2 predictors + the intercept = 3) = 93
    * check 'str(Cars93)'
  * If I eliminate the R<sup>2</sup> from the above output, can you still identify what it is?
</WRAP>
<WRAP box info>The last question: 
  * If I eliminate the R<sup>2</sup> from the above output, can you still identify what it is?
  * To answer the question, use the regression output table:

R<sup>2</sup> = SS<sub>reg</sub>/SS<sub>total</sub> = 1 - SS<sub>res</sub>/SS<sub>total</sub>

<code>> anova(lm.model)
Analysis of Variance Table

Response: Cars93$MPG.city
                  Df Sum Sq Mean Sq F value  Pr(>F)    
Cars93$EngineSize  1   1465    1465  100.65 2.4e-16 ***
Cars93$Price       1    131     131    9.01  0.0035 ** 
Residuals         90   1310      15                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> sstotal <- 1465+131+1310
> ssreg <- 1465+131
> ssreg/sstotal
[1] 0.54921

> # or
> 1-(deviance(lm.model)/sstotal)
[1] 0.54932
</code>
</WRAP>
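The R<sup>2</sup> arithmetic from the ANOVA table can be checked by hand; a quick sketch (Python) using the sums of squares printed above:

```python
# Sequential sums of squares from anova(lm.model)
ss_engine, ss_price, ss_res = 1465, 131, 1310

ss_total = ss_engine + ss_price + ss_res  # 2906
ss_reg = ss_engine + ss_price             # 1596

r_squared = ss_reg / ss_total
print(round(r_squared, 5))                # 0.54921, matching ssreg/sstotal in R
assert round(r_squared, 4) == 0.5492
```

The slightly different 0.54932 from the deviance() route reflects rounding: anova() prints the sums of squares to the nearest integer, while deviance() uses the exact residual SS.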
  
Regression formula:
  * $\hat{Y} = \widehat{\text{MPG.city}}$
  
<WRAP box info>In the meantime, 
<code>> lm.beta(lm.model) 
Cars93$EngineSize      Cars93$Price 
       -0.5517121        -0.2649553 
> cor(MPG.city,EngineSize) 
[1] -0.7100032 
> cor(EngineSize,Price) 
[1] 0.5974254 
> cor(MPG.city,Price) 
[1] -0.5945622 
> 
</code> 
Or . . . . 
<code>> temp <- subset(Cars93, select=c(MPG.city,EngineSize,Price)) 
> temp 
. . . . 

> cor(temp) 
             MPG.city EngineSize      Price 
MPG.city    1.0000000 -0.7100032 -0.5945622 
EngineSize -0.7100032  1.0000000  0.5974254 
Price      -0.5945622  0.5974254  1.0000000 
> 
</code> 
Beta coefficients are not equal to correlations among variables. 
</WRAP>

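In fact, standardized coefficients can be recovered from the correlation matrix alone: they solve R<sub>xx</sub>·β = r<sub>xy</sub>, where R<sub>xx</sub> holds the predictor intercorrelations and r<sub>xy</sub> the predictor–outcome correlations. A numerical check (Python) using the correlations from cor(temp):

```python
import numpy as np

# Predictor intercorrelation (EngineSize, Price) and their
# correlations with MPG.city, taken from the cor(temp) output
R_xx = np.array([[1.0,       0.5974254],
                 [0.5974254, 1.0      ]])
r_xy = np.array([-0.7100032, -0.5945622])

beta = np.linalg.solve(R_xx, r_xy)
print(beta)   # approx. [-0.5517, -0.2650], matching lm.beta(lm.model)

assert abs(beta[0] - (-0.5517121)) < 5e-4
assert abs(beta[1] - (-0.2649553)) < 5e-4
```

Because the two predictors are themselves correlated (r = 0.597), each beta is smaller in magnitude than the corresponding zero-order correlation, which is exactly the point made above.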
<code>plot(lm.model$residuals)
abline(h=0, col="red")
</code>
  
<code>anova(lm.model)
---
Signif. codes:  
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
</code>

Why use anova with lm output (lm.model in this case)?
  
<code>coef(lm.model)
  
  
Predict the fall enrollment (ROLL) using
  * the unemployment rate (UNEM) and
  * the number of spring high school graduates (HGRAD)
  
<code>
</code>
  
<code>
y = -8255.7511 + 698.2681*UNEM + 0.9423*HGRAD
</code>
  
Q: What is the expected fall enrollment (ROLL) given this year's unemployment rate (UNEM) of 9% and spring high school graduating class (HGRAD) of 100,000?
**92258** students.
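Plugging the given values into the fitted equation (a quick check in Python, using the rounded coefficients printed above):

```python
unem, hgrad = 9, 100_000   # 9% unemployment, 100,000 graduates

roll = -8255.7511 + 698.2681 * unem + 0.9423 * hgrad
print(roll)   # about 92258 students

assert 92258 < roll < 92259
```

The tiny fractional remainder comes from using the coefficients as printed; R's predict() on the fitted model uses their full precision.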
  
The relationship between enrollment and unemployment, high school graduates, and income. That is,
  * dv = ROLL (Enrollment)
  * iv = UNEM, HGRAD, INC

<code>threePredictorModel <- lm(ROLL ~ UNEM + HGRAD + INC, data)</code>
  
<code>threePredictorModel
F-statistic: 211.5 on 3 and 25 DF,  p-value: < 2.2e-16
  
</code>
  
How to get **beta coefficients**((beta weights, beta values)) for predictor variables?
  
How to compare each model (with incremental IVs):
<code>
anova(onePredictorModel, twoPredictorModel, threePredictorModel)
  
Analysis of Variance Table
# Import data (simulated data for this example)
myData <- read.csv('http://static.lib.virginia.edu/statlab/materials/data/hierarchicalRegressionData.csv')
# or
# myData <- read.csv('http://commres.net/wiki/_media/r/hierarchicalregressiondata.csv')

# Build models to check whether adding each variable is worthwhile
m0 <- lm(happiness ~ 1, data=myData)  # to obtain Total SS
m1 <- lm(happiness ~ age + gender, data=myData)  # Model 1
Residuals 99 240.84  2.4327  
</code>
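Why does the intercept-only model m0 deliver the Total SS? With no predictors, the least-squares fitted value for every case is the mean of y, so the residual SS of m0 is Σ(y − ȳ)², which is the definition of SS<sub>Total</sub>. A sketch of that equivalence (Python, simulated y standing in for happiness):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=5, scale=1.5, size=100)  # stand-in for happiness scores

ss_total = ((y - y.mean()) ** 2).sum()      # definition of Total SS

# "Fitting" an intercept-only model: the least-squares intercept is the mean
intercept = y.mean()
ss_res = ((y - intercept) ** 2).sum()       # residual SS of the null model

assert abs(ss_total - ss_res) < 1e-9        # identical quantities
```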
  
<code>summary(m1)
Multiple R-squared:  0.02855, Adjusted R-squared:  0.008515 
F-statistic: 1.425 on 2 and 97 DF,  p-value: 0.2455
</code>

<code>
summary(m2)
  
Multiple R-squared:  0.1311, Adjusted R-squared:  0.1039 
F-statistic: 4.828 on 3 and 96 DF,  p-value: 0.003573
</code>

<code>
summary(m3)
  
  
</code>

<code>
> lm.beta(m3)
        age  genderMale     friends        pets 
-0.14098154 -0.04484095  0.28909280  0.27446786 
</code>

<code>
anova(m0,m1,m2,m3)
Analysis of Variance Table

Model 1: happiness ~ 1
Model 2: happiness ~ age + gender
Model 3: happiness ~ age + gender + friends
Model 4: happiness ~ age + gender + friends + pets
  Res.Df    RSS Df Sum of Sq       F    Pr(>F)    
1     99 240.84                                   
2     97 233.97  2    6.8748  1.6883 0.1903349    
3     96 209.27  1   24.6957 12.1293 0.0007521 ***
4     95 193.42  1   15.8461  7.7828 0.0063739 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
</code>

  * Model 0: SS<sub>Total</sub> = 240.84 (no predictors)
  * Model 1: SS<sub>Residual</sub> = 233.97 (after adding age and gender)
  * Model 2: SS<sub>Residual</sub> = 209.27, 
    * SS<sub>Difference</sub> = 233.97 - 209.27 = 24.696, 
    * F(1,96) = 12.1293, p value = 0.0007521 (after adding friends)
  * Model 3: SS<sub>Residual</sub> = 193.42, 
    * SS<sub>Difference</sub> = 209.27 - 193.42 = 15.846, 
    * F(1,95) = 7.7828, p value = 0.0063739 (after adding pets)
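The F values in the comparison table can be reproduced from the sums of squares alone: each F is the SS difference per added df, divided by the residual mean square of the fullest model (193.42/95 ≈ 2.036). A check (Python) against the anova() output above:

```python
mse_full = 193.42 / 95          # residual mean square of the fullest model, m3

f_friends = (24.6957 / 1) / mse_full   # F for adding friends
f_pets    = (15.8461 / 1) / mse_full   # F for adding pets

print(round(f_friends, 2), round(f_pets, 2))   # 12.13 and 7.78

assert abs(f_friends - 12.1293) < 0.01
assert abs(f_pets - 7.7828) < 0.01
```

This is why the denominator is the same for every row: R's anova() for nested models tests each SS difference against the error variance of the largest model in the list.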
{{https://data.library.virginia.edu/files/Park.png}}