multiple_regression
Differences
This shows you the differences between two versions of the page.
multiple_regression [2019/05/23 10:20] – [Why overall model is significant while IVs are not?] hkimscil | multiple_regression [2024/09/30 07:36] (current) – [e.g.] hkimscil | ||
---|---|---|---|
Line 44: | Line 44: | ||
====== e.g.====== | ====== e.g.====== | ||
Data set again. | Data set again. | ||
+ | < | ||
+ | datavar <- read.csv(" | ||
^ DATA for regression analysis | ^ DATA for regression analysis | ||
Line 67: | Line 69: | ||
</ | </ | ||
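+ | The following shows how the variance (or MS, mean square) is computed. In the table, the error column is the error made when each individual score is predicted with the mean ($\overline{Y}=8$). This error is then squared (error< | ||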
^ prediction for y values with $\overline{Y}$ | ^ prediction for y values with $\overline{Y}$ | ||
| bankaccount | | bankaccount | ||
Line 166: | Line 169: | ||
====== e.g., ====== | ====== e.g., ====== | ||
DATA: \\ | DATA: \\ | ||
- | <wrap indent> | + | <wrap indent> |
+ | {{: | ||
+ | </ | ||
The Academic Performance Index (**API**) is a measurement of //academic performance and progress of individual schools in California, United States//. It is one of the main components of the Public Schools Accountability Act passed by the California legislature in 1999. API scores range from a low of 200 to a high of 1000. [[https:// | The Academic Performance Index (**API**) is a measurement of //academic performance and progress of individual schools in California, United States//. It is one of the main components of the Public Schools Accountability Act passed by the California legislature in 1999. API scores range from a low of 200 to a high of 1000. [[https:// | ||
Line 242: | Line 247: | ||
| | number of students | | | number of students | ||
| a. Dependent Variable: api 2000 ||||||| | | a. Dependent Variable: api 2000 ||||||| | ||
+ | |||
====== e.g., ====== | ====== e.g., ====== | ||
Line 328: | Line 334: | ||
</ | </ | ||
- | ====== Why overall model is significant while IVs are not? ====== | + | ===== in R ===== |
- | see https://www.researchgate.net/post/Why_is_the_Multiple_regression_model_not_significant_while_simple_regression_for_the_same_variables_is_significant | + | < |
+ | mod <- lm(api00 ~ ell + acs_k3 + avg_ed + meals, data=dvar) | ||
+ | summary(mod) | ||
+ | anova(mod) | ||
+ | </ | ||
< | < | ||
- | RSS = 3:10 #Right shoe size | + | dvar <- read.csv(" |
- | LSS = rnorm(RSS, RSS, 0.1) #Left shoe size - similar to RSS | + | > mod <- lm(api00 ~ ell + acs_k3 + avg_ed |
- | cor(LSS, RSS) # | + | > summary(mod) |
- | + | ||
- | weights = 120 + rnorm(RSS, 10*RSS, 10) | + | |
- | + | ||
- | ##Fit a joint model | + | |
- | m = lm(weights ~ LSS + RSS) | + | |
- | + | ||
- | ##F-value is very small, but neither LSS or RSS are significant | + | |
- | summary(m) | + | |
- | </code> | + | |
- | + | ||
- | + | ||
- | <code>> | + | |
- | > LSS = rnorm(RSS, RSS, 0.1) #Left shoe size - similar to RSS | + | |
- | > cor(LSS, RSS) # | + | |
- | [1] 0.9994836 | + | |
- | > | + | |
- | > weights = 120 + rnorm(RSS, 10*RSS, 10) | + | |
- | > | + | |
- | > ##Fit a joint model | + | |
- | > m = lm(weights ~ LSS + RSS) | + | |
- | > | + | |
- | > ##F-value is very small, but neither LSS or RSS are significant | + | |
- | > summary(m) | + | |
Call: | Call: | ||
- | lm(formula = weights | + | lm(formula = api00 ~ ell + acs_k3 + avg_ed + meals, data = dvar) |
Residuals: | Residuals: | ||
- | | + | |
- | 4.8544 4.5254 | + | -187.020 -40.358 -0.313 36.155 173.697 |
Coefficients: | Coefficients: | ||
Estimate Std. Error t value Pr(> | Estimate Std. Error t value Pr(> | ||
- | (Intercept) | + | (Intercept) |
- | LSS -14.162 | + | ell -0.8434 |
- | RSS 26.305 | + | acs_k3 |
+ | avg_ed | ||
+ | meals -2.9374 | ||
--- | --- | ||
Signif. codes: | Signif. codes: | ||
- | Residual standard error: | + | Residual standard error: |
- | Multiple R-squared: | + | (21 observations deleted due to missingness) |
- | F-statistic: | + | Multiple R-squared: |
+ | F-statistic: | ||
+ | > anova(mod) | ||
+ | Analysis of Variance Table | ||
+ | |||
+ | Response: api00 | ||
+ | | ||
+ | ell 1 4502711 4502711 1309.762 < 2.2e-16 *** | ||
+ | acs_k3 | ||
+ | avg_ed | ||
+ | meals | ||
+ | Residuals 374 1285740 | ||
+ | --- | ||
+ | Signif. codes: | ||
> | > | ||
- | > ##Fitting RSS or LSS separately gives a significant result. | + | </code> |
- | > summary(lm(weights ~ LSS)) | + | |
+ | < | ||
Call: | Call: | ||
- | lm(formula = weights | + | lm(formula = api00 ~ ell + acs_k3 + avg_ed + meals, data = dvar) |
- | + | ||
- | Residuals: | + | |
- | | + | |
- | -6.055 -4.930 -2.925 | + | |
Coefficients: | Coefficients: | ||
- | Estimate Std. Error t value Pr(> | + | (Intercept) |
- | (Intercept) | + | 709.6388 -0.8434 3.3884 29.0724 |
- | LSS | + | |
- | --- | + | |
- | Signif. codes: | + | |
- | Residual standard error: 7.026 on 6 degrees of freedom | + | ></ |
- | Multiple R-squared: | + | $$ \hat{Y} = 709.6388 - 0.8434 \, \text{ell} + 3.3884 \, \text{acs_k3} + 29.0724 \, \text{avg_ed} - 2.9374 \, \text{meals} $$ |
- | F-statistic: | + | |
- | > | + | Then how much explanatory power is unique to each independent variable? |
- | </ | + | |
Line 413: | Line 407: | ||
* Enter method (all at once as if they are not related) | * Enter method (all at once as if they are not related) | ||
* Selection methods | * Selection methods | ||
- | * [[: | + | * [[: |
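    * Forward selection: Among the X variables (predictors), the one most highly correlated with the dependent variable Y is entered first and the regression is computed. Because of its high correlation, the variable entered first is regarded as a theoretically important factor in explaining the dependent variable. Each subsequent variable is then entered with the already-entered variables taken into account. | * Forward selection: Among the X variables (predictors), the one most highly correlated with the dependent variable Y is entered first and the regression is computed. Because of its high correlation, the variable entered first is regarded as a theoretically important factor in explaining the dependent variable. Each subsequent variable is then entered with the already-entered variables taken into account. | ||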
* Backward elimination: | * Backward elimination: | ||
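
The forward selection idea described above can be sketched in R as follows. This is only an illustration under stated assumptions: the data are simulated (not the elemapi data), the helper `forward_select()` and the variable names `x1`, `x2`, `x3` are hypothetical, and the entry rule (smallest p-value from a partial F test, entered only if p < .05) is one common choice rather than the only one.

```r
# Forward selection sketch on simulated data (all names are illustrative).
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)                        # pure noise predictor
y  <- 2 * x1 + 0.5 * x2 + rnorm(n)
dat <- data.frame(y, x1, x2, x3)

# First step: the predictor most highly correlated with Y enters first.
r_with_y <- cor(dat)[c("x1", "x2", "x3"), "y"]
first_in <- names(which.max(abs(r_with_y)))

# Later steps: each candidate is tested given the predictors already entered.
# Assumption: entry criterion is the partial F test with alpha = .05.
forward_select <- function(data, response, candidates, alpha = 0.05) {
  model   <- lm(as.formula(paste(response, "~ 1")), data = data)
  entered <- character(0)
  scope   <- as.formula(paste("~", paste(candidates, collapse = " + ")))
  repeat {
    if (length(setdiff(candidates, entered)) == 0) break
    tab  <- add1(model, scope = scope, test = "F")
    tab  <- tab[rownames(tab) != "<none>", , drop = FALSE]
    pval <- tab[["Pr(>F)"]]
    if (min(pval) >= alpha) break      # no remaining candidate qualifies
    best    <- rownames(tab)[which.min(pval)]
    model   <- update(model, as.formula(paste(". ~ . +", best)))
    entered <- c(entered, best)
  }
  list(order = entered, model = model)
}

fit <- forward_select(dat, "y", c("x1", "x2", "x3"))
print(first_in)    # predictor with the largest |r| with y
print(fit$order)   # order in which predictors entered the model
```

In the first step the partial F test and the simple correlation rank the candidates identically, so the first entered variable is the one with the largest absolute correlation with Y.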
Line 427: | Line 421: | ||
| | Standard Multiple | | | Standard Multiple | ||
- | | r< | + | | r< |
| ::: | IV< | | ::: | IV< | ||
- | | sr< | + | | sr< |
| ::: | IV< | | ::: | IV< | ||
- | | pr< | + | | pr< |
| ::: | IV< | | ::: | IV< | ||
| IV< | | IV< | ||
Line 454: | Line 448: | ||
Multicollinearity problem = when tolerance < .01 or when VIF > 10 | Multicollinearity problem = when tolerance < .01 or when VIF > 10 | ||
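
As a rough illustration of how tolerance and VIF are obtained, the sketch below regresses each predictor on the remaining predictors and uses tolerance = 1 - R² and VIF = 1 / tolerance. The data are simulated and the variable names `x1`, `x2`, `x3` are assumptions made for this example; they are not the variables used elsewhere on this page.

```r
# Tolerance and VIF computed by hand on simulated predictors (illustrative names).
set.seed(42)
n  <- 300
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.2)   # deliberately almost collinear with x1
x3 <- rnorm(n)                  # unrelated to the other two
X  <- data.frame(x1, x2, x3)

# For each predictor: regress it on the other predictors, then 1 - R^2.
tolerance <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  f <- reformulate(others, response = v)
  1 - summary(lm(f, data = X))$r.squared
})
vif <- 1 / tolerance

print(round(cbind(tolerance, vif), 3))
```

Since VIF is the reciprocal of tolerance, VIF > 10 corresponds to tolerance < .1. In this simulation `x1` and `x2` should show large VIF values, while `x3` should stay close to 1.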
+ | ====== elem e.g. again ====== | ||
+ | < | ||
+ | dvar <- read.csv(" | ||
+ | mod <- lm(api00 ~ ell + acs_k3 + avg_ed + meals, data=dvar) | ||
+ | summary(mod) | ||
+ | anova(mod) | ||
+ | </ | ||
+ | < | ||
+ | dvar <- read.csv(" | ||
+ | > mod <- lm(api00 ~ ell + acs_k3 + avg_ed + meals, data=dvar) | ||
+ | > summary(mod) | ||
+ | Call: | ||
+ | lm(formula = api00 ~ ell + acs_k3 + avg_ed + meals, data = dvar) | ||
+ | Residuals: | ||
+ | | ||
+ | -187.020 | ||
+ | |||
+ | Coefficients: | ||
+ | Estimate Std. Error t value Pr(> | ||
+ | (Intercept) 709.6388 | ||
+ | ell -0.8434 | ||
+ | acs_k3 | ||
+ | avg_ed | ||
+ | meals -2.9374 | ||
+ | --- | ||
+ | Signif. codes: | ||
+ | |||
+ | Residual standard error: 58.63 on 374 degrees of freedom | ||
+ | (21 observations deleted due to missingness) | ||
+ | Multiple R-squared: | ||
+ | F-statistic: | ||
+ | |||
+ | > anova(mod) | ||
+ | Analysis of Variance Table | ||
+ | |||
+ | Response: api00 | ||
+ | | ||
+ | ell 1 4502711 4502711 1309.762 < 2.2e-16 *** | ||
+ | acs_k3 | ||
+ | avg_ed | ||
+ | meals | ||
+ | Residuals 374 1285740 | ||
+ | --- | ||
+ | Signif. codes: | ||
+ | > | ||
+ | </ | ||
+ | < | ||
+ | # install.packages(" | ||
+ | library(ppcor) | ||
+ | myvar <- data.frame(api00, | ||
+ | myvar <- na.omit(myvar) | ||
+ | spcor(myvar) | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | > library(ppcor) | ||
+ | > myvar <- data.frame(api00, | ||
+ | > myvar <- na.omit(myvar) | ||
+ | > spcor(myvar) | ||
+ | $estimate | ||
+ | | ||
+ | api00 | ||
+ | ell -0.13469956 | ||
+ | acs_k3 | ||
+ | avg_ed | ||
+ | meals -0.29972194 | ||
+ | |||
+ | $p.value | ||
+ | api00 ell acs_k3 | ||
+ | api00 0.000000e+00 0.07761805 0.5525340 0.085390280 2.403284e-10 | ||
+ | ell 8.918743e-03 0.00000000 0.2390272 0.232377348 1.558141e-03 | ||
+ | acs_k3 1.608778e-01 0.05998819 0.0000000 0.009891503 7.907183e-03 | ||
+ | avg_ed 1.912418e-02 0.27203887 0.1380449 0.000000000 7.424903e-05 | ||
+ | meals 3.041658e-09 0.04526574 0.2919775 0.006489783 0.000000e+00 | ||
+ | |||
+ | $statistic | ||
+ | | ||
+ | api00 | ||
+ | ell -2.628924 | ||
+ | acs_k3 | ||
+ | avg_ed | ||
+ | meals -6.075665 | ||
+ | |||
+ | $n | ||
+ | [1] 379 | ||
+ | |||
+ | $gp | ||
+ | [1] 3 | ||
+ | |||
+ | $method | ||
+ | [1] " | ||
+ | > | ||
+ | > | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | > spcor.test(myvar$api00, | ||
+ | estimate | ||
+ | 1 -0.3190889 2.403284e-10 -6.511264 379 3 pearson | ||
+ | > | ||
+ | </ | ||
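
To make the meaning of the semipartial correlation more concrete, here is a minimal sketch that computes it by hand instead of calling `spcor()`. It assumes simulated data with hypothetical variables `y`, `x1`, `x2` (not the elemapi data): the other predictor is removed from `x1` only, and the residual is correlated with the raw `y`. The squared value equals the gain in R² when `x1` is added to a model that already contains `x2`.

```r
# Semipartial correlation by residualization (simulated data, illustrative names).
set.seed(7)
n  <- 250
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)             # x2 overlaps with x1
y  <- 1.0 * x1 + 0.8 * x2 + rnorm(n)
dat <- data.frame(y, x1, x2)

# Remove x2 from x1 only (y is left untouched), then correlate with y.
x1_resid <- resid(lm(x1 ~ x2, data = dat))
sr_x1    <- cor(dat$y, x1_resid)

# The squared semipartial equals the R^2 gained by adding x1 to y ~ x2.
r2_full <- summary(lm(y ~ x1 + x2, data = dat))$r.squared
r2_drop <- summary(lm(y ~ x2, data = dat))$r.squared

print(c(sr = sr_x1, sr_squared = sr_x1^2, r2_gain = r2_full - r2_drop))
```

The last two printed values should agree, which is why the squared semipartial correlation is read as the unique explanatory power of a predictor.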
====== e.g., ====== | ====== e.g., ====== | ||
[[:multiple regression examples]] | [[:multiple regression examples]] | ||
Line 481: | Line 576: | ||
* Income Income seven years after College (in thousands) | * Income Income seven years after College (in thousands) | ||
+ | ====== exercise ====== | ||
+ | {{: | ||
+ | < | ||
+ | dvar <- read.csv(" | ||
+ | </ | ||
+ | |||
+ | [[:Multiple Regression Exercise]] | ||
====== Resources ====== | ====== Resources ====== | ||
Line 502: | Line 604: | ||
* https:// | * https:// | ||
* http:// | * http:// | ||
+ | |||
+ | https:// | ||
+ | |||
+ | |||
{{tag> " | {{tag> " | ||
+ |
multiple_regression.1558574411.txt.gz · Last modified: 2019/05/23 10:20 by hkimscil