r:linear_regression
Differences
This shows you the differences between two versions of the page.
| Both sides previous revision · Previous revision · Next revision | Previous revision | ||
| r:linear_regression [2018/06/15 08:02] – [Multiple Regression] hkimscil | r:linear_regression [2019/06/13 10:15] (current) – hkimscil | ||
|---|---|---|---|
| Line 51: | Line 51: | ||
| Does the model fit the data well? | Does the model fit the data well? | ||
| * **Plot the residuals** and check the regression diagnostics. | * **Plot the residuals** and check the regression diagnostics. | ||
| + | * see [[https:// | ||
| Does the data satisfy the assumptions behind linear regression? | Does the data satisfy the assumptions behind linear regression? | ||
| * Check whether the diagnostics confirm that a linear model is reasonable for your data. | * Check whether the diagnostics confirm that a linear model is reasonable for your data. | ||
| Line 146: | Line 147: | ||
| </ | </ | ||
| + | <WRAP info> | ||
| + | What about the beta coefficient? | ||
| + | |||
| + | < | ||
| + | |||
| + | < | ||
| + | EngineSize | ||
| + | -0.7100032 | ||
| + | > cor(MPG.city, | ||
| + | [1] -0.7100032 | ||
| + | > | ||
| + | </ | ||
| ====== Multiple Regression ====== | ====== Multiple Regression ====== | ||
| + | regression output table | ||
| | anova(m) | | anova(m) | ||
| | coefficients(m) = coef(m) | | coefficients(m) = coef(m) | ||
| Line 184: | Line 198: | ||
| F-statistic: | F-statistic: | ||
| </ | </ | ||
| - | Questions: | + | <WRAP box help>Questions: |
| * What is R< | * What is R< | ||
| * How many cars are involved in this test? (cf. df = 90) | * How many cars are involved in this test? (cf. df = 90) | ||
| + | * residual df (90) + number of estimated coefficients (3: intercept and two slopes) = 93 cars | ||
| + | * check ' | ||
| * If I eliminate the R< | * If I eliminate the R< | ||
| + | </ | ||
| + | <WRAP box info>The last question: | ||
| + | * If I eliminate the R< | ||
| + | * to answer the question, use the regression output table: | ||
| + | |||
| + | R< | ||
| + | = | ||
| + | |||
| + | < | ||
| + | Analysis of Variance Table | ||
| + | |||
| + | Response: Cars93$MPG.city | ||
| + | Df Sum Sq Mean Sq F value Pr(> | ||
| + | Cars93$EngineSize | ||
| + | Cars93$Price | ||
| + | Residuals | ||
| + | --- | ||
| + | Signif. codes: | ||
| + | |||
| + | > sstotal <- 1465+131+1310 | ||
| + | > ssreg <- 1465+131 | ||
| + | > ssreg/ | ||
| + | [1] 0.54921 | ||
| + | > | ||
| + | > # or | ||
| + | > 1-(deviance(lm.model)/ | ||
| + | [1] 0.54932 | ||
| + | </ | ||
| + | |||
| + | |||
| + | </ | ||
| Regression formula: | Regression formula: | ||
| Line 193: | Line 240: | ||
| * $\hat{Y} = \widehat{\text{MPG.city}}$ | * $\hat{Y} = \widehat{\text{MPG.city}}$ | ||
| - | < | + | <WRAP box info>in the meantime, |
| + | < | ||
| + | Cars93$EngineSize | ||
| + | | ||
| + | > cor(MPG.city, | ||
| + | [1] -0.7100032 | ||
| + | > cor(EngineSize, | ||
| + | [1] 0.5974254 | ||
| + | > cor(MPG.city, | ||
| + | [1] -0.5945622 | ||
| + | > | ||
| + | </ | ||
| + | Or . . . . | ||
| + | < | ||
| + | > temp | ||
| + | . . . . | ||
| + | |||
| + | > cor(temp) | ||
| + | | ||
| + | MPG.city | ||
| + | EngineSize -0.7100032 | ||
| + | Price -0.5945622 | ||
| + | > | ||
| + | </ | ||
| + | In multiple regression, beta coefficients are generally not equal to the pairwise correlations between each predictor and the outcome. | ||
| + | </ | ||
| + | |||
| + | < | ||
| + | abline(h=0, col=" | ||
| + | </ | ||
| < | < | ||
| Line 205: | Line 281: | ||
| --- | --- | ||
| Signif. codes: | Signif. codes: | ||
| - | 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1</ | + | 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 |
| + | </ | ||
| + | |||
| + | Why use anova with lm output (lm.model in this case)? | ||
| < | < | ||
| Line 315: | Line 394: | ||
| - | #predict the fall enrollment (ROLL) | + | Predict the fall enrollment (ROLL) using: |
| - | using the unemployment rate (UNEM) and | + | * the unemployment rate (UNEM) and |
| - | number of spring high school graduates (HGRAD) | + | |
| < | < | ||
| Line 357: | Line 436: | ||
| </ | </ | ||
| - | < | + | < |
| + | y = -8255.7511 | ||
| + | </ | ||
| Q: What is the expected fall enrollment (ROLL), given this year's unemployment rate (UNEM) of 9% and a spring high school graduating class (HGRAD) of 100,000? | Q: What is the expected fall enrollment (ROLL), given this year's unemployment rate (UNEM) of 9% and a spring high school graduating class (HGRAD) of 100,000? | ||
| Line 369: | Line 450: | ||
| **92258** students. | **92258** students. | ||
| - | Enrollment and Unemployment, | + | Enrollment and Unemployment, |
| - | < | + | * dv = ROLL (Enrollment) |
| - | ></code | + | * iv = UNEM, HGRAD, INC |
| + | |||
| + | < | ||
| < | < | ||
| Line 406: | Line 489: | ||
| F-statistic: | F-statistic: | ||
| - | ></ | + | </ |
| How to get **beta coefficients**((beta weights, beta values)) for predictor variables? | How to get **beta coefficients**((beta weights, beta values)) for predictor variables? | ||
| Line 420: | Line 503: | ||
| How to compare each model (with incremental IVs) | How to compare each model (with incremental IVs) | ||
| - | <codeanova(onePredictorModel, | + | <code> |
| + | anova(onePredictorModel, | ||
| Analysis of Variance Table | Analysis of Variance Table | ||
| Line 442: | Line 526: | ||
| # Import data (simulated data for this example) | # Import data (simulated data for this example) | ||
| myData <- read.csv(' | myData <- read.csv(' | ||
| + | # or | ||
| + | # myData <- read.csv(' | ||
| - | # Build models | + | # Build models |
| m0 <- lm(happiness ~ 1, data=myData) | m0 <- lm(happiness ~ 1, data=myData) | ||
| m1 <- lm(happiness ~ age + gender, data=myData) | m1 <- lm(happiness ~ age + gender, data=myData) | ||
| Line 457: | Line 543: | ||
| Residuals 99 240.84 | Residuals 99 240.84 | ||
| </ | </ | ||
| - | |||
| - | < | ||
| - | Analysis of Variance Table | ||
| - | |||
| - | Model 1: happiness ~ age + gender | ||
| - | Model 2: happiness ~ age + gender + friends | ||
| - | Model 3: happiness ~ age + gender + friends + pets | ||
| - | Res.Df | ||
| - | 1 97 233.97 | ||
| - | 2 96 209.27 | ||
| - | 3 95 193.42 | ||
| - | --- | ||
| - | Signif. codes: | ||
| - | </ | ||
| - | |||
| - | * Model 0: SS< | ||
| - | * Model 1: SS< | ||
| - | * Model 2: SS< | ||
| - | * SS< | ||
| - | * F(1,96) = 12.1293, p = 0.0007521 (after adding friends) | ||
| - | * Model 3: SS< | ||
| - | * SS< | ||
| - | * F(1,95) = 7.7828, p = 0.0063739 (after adding pets) | ||
| < | < | ||
| Line 494: | Line 557: | ||
| Multiple R-squared: | Multiple R-squared: | ||
| F-statistic: | F-statistic: | ||
| + | </ | ||
| + | < | ||
| summary(m2) | summary(m2) | ||
| Line 509: | Line 573: | ||
| Multiple R-squared: | Multiple R-squared: | ||
| F-statistic: | F-statistic: | ||
| + | </ | ||
| + | < | ||
| summary(m3) | summary(m3) | ||
| Line 527: | Line 593: | ||
| </ | </ | ||
| + | |||
| + | |||
| + | < | ||
| + | > lm.beta(m3) | ||
| + | age genderMale | ||
| + | -0.14098154 -0.04484095 | ||
| + | </ | ||
| + | |||
| + | < | ||
| + | anova(m0, | ||
| + | Analysis of Variance Table | ||
| + | |||
| + | Model 1: happiness ~ 1 | ||
| + | Model 2: happiness ~ age + gender | ||
| + | Model 3: happiness ~ age + gender + friends | ||
| + | Model 4: happiness ~ age + gender + friends + pets | ||
| + | Res.Df | ||
| + | 1 99 240.84 | ||
| + | 2 97 233.97 | ||
| + | 3 96 209.27 | ||
| + | 4 95 193.42 | ||
| + | --- | ||
| + | Signif. codes: | ||
| + | </ | ||
| + | |||
| + | * Model 0: SS< | ||
| + | * Model 1: SS< | ||
| + | * Model 2: SS< | ||
| + | * SS< | ||
| + | * F(1,95) = 12.1293, p value = 0.0007521 (after adding friends; anova() tests every step against the residual mean square of the largest model, which has 95 df) | ||
| + | * Model 3: SS< | ||
| + | * SS< | ||
| + | * F(1,95) = 7.7828, p value = 0.0063739 (after adding pets) | ||
| + | |||
| + | |||
| {{https:// | {{https:// | ||
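
The F values in the model comparison above can be checked by hand from the residual sums of squares. The sketch below is not part of the original page. It only reuses the Res.Df and RSS figures printed in the `anova(m0, m1, m2, m3)` table; the variable names (`rss`, `res_df`, `f_change`, and so on) are made up for illustration. Because the inputs are rounded, the results should match the printed F values only approximately.

```r
# Residual SS and residual df copied (rounded) from the anova(m0, m1, m2, m3) table
rss    <- c(m0 = 240.84, m1 = 233.97, m2 = 209.27, m3 = 193.42)
res_df <- c(m0 = 99,     m1 = 97,     m2 = 96,     m3 = 95)

# For a sequence of nested lm fits, anova() uses the residual mean square
# of the largest model (here m3) as the denominator of every F ratio
mse_full <- rss[["m3"]] / res_df[["m3"]]

ss_change <- -diff(rss)      # SS explained by each added block of predictors
df_change <- -diff(res_df)   # df used up by each added block
f_change  <- (ss_change / df_change) / mse_full
p_change  <- pf(f_change, df_change, res_df[["m3"]], lower.tail = FALSE)

# Cumulative R-squared of each model relative to the intercept-only model m0
r2 <- 1 - rss / rss[["m0"]]

print(round(cbind(ss_change, df_change, f_change, p_change), 4))
print(round(r2, 4))
```

The `m2` and `m3` rows of `f_change` correspond to the "after adding friends" and "after adding pets" tests. The last element of `r2` is the share of SS total explained by the full model.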
r/linear_regression.1529017351.txt.gz · Last modified: by hkimscil
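
As a supplement to the beta coefficient notes on this page: a standardized (beta) coefficient can be computed in base R as b × sd(x) / sd(y), without an add-on package. The sketch below is an illustration, not the page author's code. It assumes the `MASS` package (which ships `Cars93`) is installed, and the object names (`m_simple`, `m_multi`, `beta_simple`, and so on) are hypothetical. It shows that the beta equals Pearson's r with one predictor, and generally does not once a second, correlated predictor is added.

```r
library(MASS)  # assumption: MASS is installed; it provides the Cars93 data set

# Simple regression: the standardized slope equals the correlation
m_simple    <- lm(MPG.city ~ EngineSize, data = Cars93)
beta_simple <- coef(m_simple)[["EngineSize"]] *
  sd(Cars93$EngineSize) / sd(Cars93$MPG.city)
r_simple    <- cor(Cars93$MPG.city, Cars93$EngineSize)

# Multiple regression: standardize each slope the same way
m_multi    <- lm(MPG.city ~ EngineSize + Price, data = Cars93)
b          <- coef(m_multi)[c("EngineSize", "Price")]
beta_multi <- b * sapply(Cars93[c("EngineSize", "Price")], sd) / sd(Cars93$MPG.city)
r_multi    <- drop(cor(Cars93$MPG.city, Cars93[c("EngineSize", "Price")]))

print(c(beta = beta_simple, r = r_simple))   # the two values coincide
print(rbind(beta = beta_multi, r = r_multi)) # the rows differ
```

The difference in the second printout arises because EngineSize and Price are themselves correlated, so each beta reflects a predictor's contribution after controlling for the other.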
