r:linear_regression

Does the model fit the data well?
  * **Plot the residuals** and check the regression diagnostics.
    * See [[https://drsimonj.svbtle.com/visualising-residuals|visualizing residuals]]
Does the data satisfy the assumptions behind linear regression?
  * Check whether the diagnostics confirm that a linear model is reasonable for your data.
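As a minimal sketch of these checks (using R's built-in mtcars data rather than the examples on this page), the standard residual diagnostics look like this:

```r
# Fit a simple model on the built-in mtcars data (illustration only)
m <- lm(mpg ~ wt, data = mtcars)

# Residuals vs. fitted values: look for curvature or funnel shapes
plot(fitted(m), residuals(m))
abline(h = 0, col = "red")

# The four standard diagnostic plots: residuals vs. fitted, normal Q-Q,
# scale-location, and residuals vs. leverage
par(mfrow = c(2, 2))
plot(m)
```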
</WRAP>
  
<WRAP info>
What about the beta coefficient?

<blockquote>In R we demonstrate the use of the lm.beta() function in the QuantPsyc package (due to Thomas D. Fletcher of State Farm). The function is short and sweet, and takes a linear model object as argument:</blockquote>

<code>> lm.beta(mod)
EngineSize 
-0.7100032 
> cor(MPG.city,EngineSize)
[1] -0.7100032
</code>
</WRAP>
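If QuantPsyc is not installed, the same standardized coefficient can be computed by hand; with a single predictor it equals the Pearson correlation, which is exactly what the output above shows. A sketch using the built-in mtcars data (not the Cars93 data quoted above):

```r
# Simple regression on built-in mtcars data (illustration only)
m <- lm(mpg ~ wt, data = mtcars)

# Standardized (beta) coefficient: b * sd(x) / sd(y)
b    <- coef(m)["wt"]
beta <- unname(b * sd(mtcars$wt) / sd(mtcars$mpg))

beta                        # the beta weight
cor(mtcars$mpg, mtcars$wt)  # identical with a single predictor
```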
  
  
====== Multiple Regression ======
Regression output table:
| anova(m)  | ANOVA table  |
| coefficients(m) = coef(m)  | Model coefficients  |
<code>lm.model <- lm(Cars93$MPG.city ~ Cars93$EngineSize + Cars93$Price)
summary(lm.model)
</code>
<code>
Call:
lm(formula = Cars93$MPG.city ~ Cars93$EngineSize + Cars93$Price)
F-statistic: 54.83 on 2 and 90 DF,  p-value: 2.674e-16
</code>
<WRAP box help>Questions: 
  * What is R<sup>2</sup>?
  * How many cars are involved in this test? (cf. df = 90)
  * If I eliminate the R<sup>2</sup> from the above output, can you still identify what it is?
</WRAP>
<WRAP box info>The last question: 
  * If I eliminate the R<sup>2</sup> from the above output, can you still identify what it is?
  * To answer the question, use the regression output table:
  
R<sup>2</sup> = SS<sub>reg</sub>/SS<sub>total</sub> = 1 - SS<sub>res</sub>/SS<sub>total</sub>

<code>> anova(lm.model)
Analysis of Variance Table

Response: Cars93$MPG.city
                  Df Sum Sq Mean Sq F value  Pr(>F)    
Cars93$EngineSize  1   1465    1465  100.65 2.4e-16 ***
Cars93$Price       1    131     131    9.01  0.0035 ** 
Residuals         90   1310      15                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> sstotal = 1465+131+1310
> ssreg <- 1465+131
> ssreg/sstotal
[1] 0.54921
> 
> # or
> 1-(deviance(lm.model)/sstotal)
[1] 0.54932
</code>

</WRAP>
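The same SS decomposition can be verified on any lm fit; a minimal sketch with the built-in mtcars data (a different model than the one above):

```r
# Two-predictor model on built-in mtcars data (illustration only)
m <- lm(mpg ~ wt + hp, data = mtcars)
a <- anova(m)

# Total SS = regression SS (the per-predictor rows) + residual SS
ss.total <- sum(a[["Sum Sq"]])
ss.res   <- a["Residuals", "Sum Sq"]
r2 <- 1 - ss.res / ss.total

r2                    # computed from the ANOVA table
summary(m)$r.squared  # same value reported by summary()
```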
 + 
Regression formula: 
  * $\hat{Y} = -2.99 * EngineSize - 0.15 * Price + 33.35$
  * $\hat{Y} = \widehat{\text{MPG.city}}$
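The fitted equation can also be used for prediction via predict(); a sketch, assuming the Cars93 data comes from the MASS package (which ships with R) and using a hypothetical new car:

```r
library(MASS)  # Cars93 comes with the recommended MASS package

lm.model <- lm(MPG.city ~ EngineSize + Price, data = Cars93)
round(coef(lm.model), 2)  # roughly 33.35, -2.99, -0.15 as in the formula

# Predicted MPG.city for a hypothetical 3.0-litre car priced at $20k
# (Price in Cars93 is in $1000s)
new.car <- data.frame(EngineSize = 3.0, Price = 20)
predict(lm.model, newdata = new.car)
```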
 + 
<WRAP box info>In the meantime, 
<code>> lm.beta(lm.model)
Cars93$EngineSize      Cars93$Price 
       -0.5517121        -0.2649553 
> cor(MPG.city,EngineSize)
[1] -0.7100032
> cor(EngineSize,Price)
[1] 0.5974254
> cor(MPG.city,Price)
[1] -0.5945622
> 
</code>
Or . . . . 
<code>> temp <- subset(Cars93, select=c(MPG.city,EngineSize,Price))
> temp
. . . . 

> cor(temp)
             MPG.city EngineSize      Price
MPG.city    1.0000000 -0.7100032 -0.5945622
EngineSize -0.7100032  1.0000000  0.5974254
Price      -0.5945622  0.5974254  1.0000000
> 
</code>
Beta coefficients are not equal to correlations among variables. 
</WRAP>
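That point can be checked without QuantPsyc by z-scoring all variables and refitting: the slopes of the standardized fit are the beta weights, and once predictors are correlated they no longer match the zero-order correlations. A sketch with the built-in mtcars data:

```r
# Standardize outcome and predictors, then refit (built-in mtcars data)
z <- data.frame(scale(mtcars[, c("mpg", "wt", "hp")]))
m.std <- lm(mpg ~ wt + hp, data = z)

coef(m.std)[-1]             # beta weights
cor(mtcars$mpg, mtcars$wt)  # zero-order correlations: different numbers
cor(mtcars$mpg, mtcars$hp)
```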
 + 
<code>plot(lm.model$residuals)
abline(h=0, col="red")
</code>
  
<code>anova(lm.model)
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
</code>

Why use anova() with lm output (lm.model in this case)?
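One answer, sketched with the built-in mtcars data: anova() on a single lm fit partitions the regression SS sequentially (Type I), predictor by predictor, in the order they appear in the formula; it therefore shows how much each added term contributes, and the per-term SS change if the order changes.

```r
# anova() on one lm fit gives sequential (Type I) sums of squares:
# each predictor is credited with the SS it adds after the terms before it.
m1 <- lm(mpg ~ wt + hp, data = mtcars)
m2 <- lm(mpg ~ hp + wt, data = mtcars)

anova(m1)  # SS for wt computed first, then hp given wt
anova(m2)  # reversed order: the per-predictor SS differ
```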
  
<code>coef(lm.model)
  
  
Predict the fall enrollment (ROLL) using 
  * the unemployment rate (UNEM) and 
  * the number of spring high school graduates (HGRAD)
  
<code>
</code>
  
<code>
y = -8255.7511 + 698.2681*UNEM + 0.9423*HGRAD 
</code>
  
Q: what is the expected fall enrollment (ROLL) given this year's unemployment rate (UNEM) of 9% and spring high school graduating class (HGRAD) of 100,000?
**92258** students.
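The arithmetic behind that answer can be checked directly:

```r
# Plug the hypothetical values into the fitted equation above
unem  <- 9        # unemployment rate (%)
hgrad <- 100000   # spring high school graduates

roll <- -8255.7511 + 698.2681 * unem + 0.9423 * hgrad
roll  # 92258.66, i.e. about 92258 students
```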
  
The relationship between Enrollment and Unemployment, high school graduates, and Income; that is,
  * dv = ROLL (Enrollment)
  * iv = UNEM, HGRAD, INC

<code>threePredictorModel <- lm(ROLL ~ UNEM + HGRAD + INC, data)</code>
  
<code>threePredictorModel
F-statistic: 211.5 on 3 and 25 DF,  p-value: < 2.2e-16
  
</code>
  
How to get **beta coefficients**((beta weights, beta values)) for predictor variables?
  
How to compare each model (with incremental IVs)
<code>
anova(onePredictorModel, twoPredictorModel, threePredictorModel)
  
Analysis of Variance Table
# Import data (simulated data for this example)
myData <- read.csv('http://static.lib.virginia.edu/statlab/materials/data/hierarchicalRegressionData.csv')
# or
# myData <- read.csv('http://commres.net/wiki/_media/r/hierarchicalregressiondata.csv')

# Build models to check whether adding variables is worthwhile
m0 <- lm(happiness ~ 1, data=myData)  # to obtain Total SS
m1 <- lm(happiness ~ age + gender, data=myData)  # Model 1
Residuals 99 240.84  2.4327  
 </code> </code>
  
<code>summary(m1)
Multiple R-squared:  0.02855, Adjusted R-squared:  0.008515 
F-statistic: 1.425 on 2 and 97 DF,  p-value: 0.2455
</code>
<code>
summary(m2)
  
Multiple R-squared:  0.1311, Adjusted R-squared:  0.1039 
F-statistic: 4.828 on 3 and 96 DF,  p-value: 0.003573
</code>

<code>
summary(m3)
  
  
</code>

<code>
> lm.beta(m3)
        age  genderMale     friends        pets 
-0.14098154 -0.04484095  0.28909280  0.27446786 
</code>
 +
<code>
anova(m0,m1,m2,m3)
Analysis of Variance Table

Model 1: happiness ~ 1
Model 2: happiness ~ age + gender
Model 3: happiness ~ age + gender + friends
Model 4: happiness ~ age + gender + friends + pets
  Res.Df    RSS Df Sum of Sq       F    Pr(>F)    
1     99 240.84                                   
2     97 233.97  2    6.8748  1.6883 0.1903349    
3     96 209.27  1   24.6957 12.1293 0.0007521 ***
4     95 193.42  1   15.8461  7.7828 0.0063739 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
</code>
 +
  * Model 0: SS<sub>Total</sub> = 240.84 (no predictors)
  * Model 1: SS<sub>Residual</sub> = 233.97 (after adding age and gender)
  * Model 2: SS<sub>Residual</sub> = 209.27, 
    * SS<sub>Difference</sub> = 233.97 - 209.27 = 24.696, 
    * F(1,96) = 12.1293, p value = 0.0007521 (after adding friends)
  * Model 3: SS<sub>Residual</sub> = 193.42, 
    * SS<sub>Difference</sub> = 209.27 - 193.42 = 15.846, 
    * F(1,95) = 7.7828, p value = 0.0063739 (after adding pets)
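The same incremental logic can be reproduced with base R alone; a sketch on the built-in mtcars data (not the happiness data above), showing that the SS gained at each step, divided by the Total SS, is exactly the R<sup>2</sup> change:

```r
# Nested models on built-in mtcars data (illustration only)
m0 <- lm(mpg ~ 1, data = mtcars)        # intercept only: residual SS = Total SS
m1 <- lm(mpg ~ wt, data = mtcars)       # step 1
m2 <- lm(mpg ~ wt + hp, data = mtcars)  # step 2

anova(m0, m1, m2)  # each row tests the SS gained by the added predictor

# R^2 change for step 2 = (drop in residual SS) / Total SS
r2.change <- (deviance(m1) - deviance(m2)) / deviance(m0)
r2.change
```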
 +
 +
  
{{https://data.library.virginia.edu/files/Park.png}}
r/linear_regression.txt · Last modified: 2019/06/13 10:15 by hkimscil
