r:linear_regression [2019/06/13 10:15] (current) hkimscil
Does the model fit the data well?
  * **Plot the residuals** and check the regression diagnostics.
    * see [[https://drsimonj.svbtle.com/visualising-residuals|visualizing residuals]]
Does the data satisfy the assumptions behind linear regression?
  * Check whether the diagnostics confirm that a linear model is reasonable for your data.
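A minimal sketch of the residual-diagnostics step above, assuming the MASS package's Cars93 data that is used later on this page:

<code>library(MASS)                       # for the Cars93 data used on this page
m <- lm(MPG.city ~ EngineSize, data=Cars93)
par(mfrow=c(2,2))                   # 2x2 grid for the four diagnostic plots
plot(m)                             # residuals vs fitted, normal Q-Q, scale-location, leverage
par(mfrow=c(1,1))
</code>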
</WRAP>
  
<WRAP info>
What about the beta (standardized) coefficient?

<blockquote>In R we demonstrate the use of the lm.beta() function in the QuantPsyc package (due to Thomas D. Fletcher of State Farm). The function is short and sweet, and takes a linear model object as argument:</blockquote>

<code>> lm.beta(mod)
EngineSize 
-0.7100032 
> cor(MPG.city,EngineSize)
[1] -0.7100032
</code>
Note that in a simple regression (one predictor), the beta coefficient equals the correlation between the two variables.
</WRAP>
  
  
====== Multiple Regression ======
Regression output table:
| anova(m)  | ANOVA table  |
| coefficients(m) = coef(m)  | Model coefficients  |
F-statistic: 54.83 on 2 and 90 DF,  p-value: 2.674e-16
</code>
<WRAP box help>Questions: 
  * What is R<sup>2</sup>?
  * How many cars are involved in this test? (cf. df = 90)
    * residual df (90) + number of estimated parameters (3: the intercept and two slopes) = 93
    * check 'str(Cars93)'
  * If I eliminate the R<sup>2</sup> from the above output, can you still identify what it is?
</WRAP>
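The case count can also be checked directly. A sketch, assuming lm.model was fit on Cars93 as in the output above:

<code>> nrow(Cars93)           # number of cars in the data set
[1] 93
> df.residual(lm.model)  # 93 cases - 3 estimated parameters
[1] 90
</code>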
<WRAP box info>The last question: 
  * If I eliminate the R<sup>2</sup> from the above output, can you still identify what it is?
  * To answer the question, use the ANOVA table from the regression output:

R<sup>2</sup> = SS<sub>reg</sub>/SS<sub>total</sub> = 1 - SS<sub>res</sub>/SS<sub>total</sub>

<code>> anova(lm.model)
Analysis of Variance Table

Response: Cars93$MPG.city
                  Df Sum Sq Mean Sq F value  Pr(>F)    
Cars93$EngineSize  1   1465    1465  100.65 2.4e-16 ***
Cars93$Price       1    131     131    9.01  0.0035 ** 
Residuals         90   1310      15                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> sstotal <- 1465+131+1310
> ssreg <- 1465+131
> ssreg/sstotal
[1] 0.54921

> # or
> 1-(deviance(lm.model)/sstotal)
[1] 0.54932
</code>
The two values differ slightly only because the sums of squares printed in the ANOVA table are rounded.
</WRAP>
  
Regression formula:
  * $\hat{Y} = -2.99 * EngineSize - 0.15 * Price + 33.35$
  * $\hat{Y} = \widehat{\text{MPG.city}}$
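The fitted equation can be applied by hand or with predict(). A sketch, using a hypothetical car (EngineSize = 2.0, Price = 20) and assuming the model is refit as lm(MPG.city ~ EngineSize + Price, data=Cars93):

<code>> -2.99*2.0 - 0.15*20 + 33.35    # by hand
[1] 24.37
> m <- lm(MPG.city ~ EngineSize + Price, data=Cars93)
> predict(m, newdata=data.frame(EngineSize=2.0, Price=20))
</code>
The small difference between the two results comes from the rounded coefficients used in the by-hand version.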

<WRAP box info>In the meantime,
<code>> lm.beta(lm.model)
Cars93$EngineSize      Cars93$Price 
       -0.5517121        -0.2649553 
> cor(MPG.city,EngineSize)
[1] -0.7100032
> cor(EngineSize,Price)
[1] 0.5974254
> cor(MPG.city,Price)
[1] -0.5945622
</code>
Or:
<code>> temp <- subset(Cars93, select=c(MPG.city,EngineSize,Price))
> temp
. . . . 

> cor(temp)
             MPG.city EngineSize      Price
MPG.city    1.0000000 -0.7100032 -0.5945622
EngineSize -0.7100032  1.0000000  0.5974254
Price      -0.5945622  0.5974254  1.0000000
</code>
In multiple regression, the beta coefficients are not equal to the simple correlations among the variables, because the predictors are correlated with each other. 
</WRAP>
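The same standardized coefficients can also be recovered by refitting the model on z-scored variables. A sketch (assumes the MASS package is loaded for Cars93):

<code>> z <- data.frame(scale(subset(Cars93, select=c(MPG.city, EngineSize, Price))))
> coef(lm(MPG.city ~ EngineSize + Price, data=z))
</code>
The two slopes should match the lm.beta() output above (and the intercept is essentially zero, since all variables are centered).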
  
<code>plot(lm.model$residuals)
Residuals 99 240.84  2.4327  
</code>
  
<code>summary(m1)
Multiple R-squared:  0.02855, Adjusted R-squared:  0.008515 
F-statistic: 1.425 on 2 and 97 DF,  p-value: 0.2455
</code>
<code>summary(m2)
  
Multiple R-squared:  0.1311, Adjusted R-squared:  0.1039 
F-statistic: 4.828 on 3 and 96 DF,  p-value: 0.003573
</code>

<code>summary(m3)
  
  
</code>

<code>> lm.beta(m3)
        age  genderMale     friends        pets 
-0.14098154 -0.04484095  0.28909280  0.27446786 
</code>

<code>anova(m0,m1,m2,m3)  # model comparison
Analysis of Variance Table

Model 1: happiness ~ 1
Model 2: happiness ~ age + gender
Model 3: happiness ~ age + gender + friends
Model 4: happiness ~ age + gender + friends + pets
  Res.Df    RSS Df Sum of Sq       F    Pr(>F)    
1     99 240.84                                   
2     97 233.97  2    6.8748  1.6883 0.1903349    
3     96 209.27  1   24.6957 12.1293 0.0007521 ***
4     95 193.42  1   15.8461  7.7828 0.0063739 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
</code>

Note that anova() labels these four models 1–4, while the list below counts the same models 0–3.
  * Model 0: SS<sub>Total</sub> = 240.84 (no predictors)
  * Model 1: SS<sub>Residual</sub> = 233.97 (after adding age and gender)
  * Model 2: SS<sub>Residual</sub> = 209.27, 
    * SS<sub>Difference</sub> = 233.97 - 209.27 = 24.696, 
    * F(1,95) = 12.1293, p value = 0.0007521 (after adding friends)
  * Model 3: SS<sub>Residual</sub> = 193.42, 
    * SS<sub>Difference</sub> = 209.27 - 193.42 = 15.846, 
    * F(1,95) = 7.7828, p value = 0.0063739 (after adding pets)
  * anova() uses the residual mean square of the largest model (193.42/95) as the denominator of every F test, so both tests have 95 denominator df.

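The F values in the table can be reproduced by hand from the sums of squares; a sketch for the test of adding friends:

<code>> ss.diff <- 233.97 - 209.27     # RSS gain from adding friends
> mse.full <- 193.42 / 95        # residual mean square of the largest model
> ss.diff / mse.full             # about 12.13, cf. 12.1293 in the table
</code>
The tiny discrepancy comes from using the rounded RSS values printed in the table.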
  
{{https://data.library.virginia.edu/files/Park.png}}