Differences

This shows you the differences between two versions of the page.

--- multiple_regression [2020/12/01 16:23] – [elem e.g. again] hkimscil
+++ multiple_regression [2023/10/19 08:39] (current) – [Determining IVs' role] hkimscil
@@ Line 44: / Line 44: @@
 ====== e.g.======
 Data set again.
+<code>
+datavar <- read.csv("http://commres.net/wiki/_media/regression01-bankaccount.csv") </code>
 ^  DATA for regression analysis   ^^^
@@ Line 332: / Line 334: @@
 ===== in R =====
-<code>dvar <- read.csv("http://commres.net/wiki/_media/elemapi2_.csv", fileEncoding="UTF-8-BOM")
+<code>dvar <- read.csv("http://commres.net/wiki/_media/elemapi2_.csv", sep = "\t", fileEncoding="UTF-8-BOM")
 mod <- lm(api00 ~ ell + acs_k3 + avg_ed + meals, data=dvar)
 summary(mod)
@@ Line 380: / Line 382: @@
 </code>
+<code>> mod
-====== Why overall model is significant while IVs are not? ======
-see https://www.researchgate.net/post/Why_is_the_Multiple_regression_model_not_significant_while_simple_regression_for_the_same_variables_is_significant
-<code>
-RSS = 3:10 #Right shoe size
-LSS = rnorm(RSS, RSS, 0.1) #Left shoe size - similar to RSS
-cor(LSS, RSS) #correlation ~ 0.99
-weights = 120 + rnorm(RSS, 10*RSS, 10)
-##Fit a joint model
-m = lm(weights ~ LSS + RSS)
-##p-value of the overall F-test is very small, but neither LSS nor RSS is significant
-summary(m)
-</code>
-<code>> RSS = 3:10 #Right shoe size
-> LSS = rnorm(RSS, RSS, 0.1) #Left shoe size - similar to RSS
-> cor(LSS, RSS) #correlation ~ 0.99
-[1] 0.9994836
->
-> weights = 120 + rnorm(RSS, 10*RSS, 10)
->
-> ##Fit a joint model
-> m = lm(weights ~ LSS + RSS)
->
-> ##p-value of the overall F-test is very small, but neither LSS nor RSS is significant
-> summary(m)
 Call:
-lm(formula = weights ~ LSS + RSS)
+lm(formula = api00 ~ ell + acs_k3 + avg_ed + meals, data = dvar)
-Residuals:
-       2       3       4       5       6       7       8
-.8544  4.5254 -3.6333 -7.6402 -0.2467 -3.1997 -5.2665 10.6066
 Coefficients:
-            Estimate Std. Error t value Pr(>|t|)
+(Intercept)          ell       acs_k3       avg_ed        meals
-(Intercept)  104.842      8.169  12.834 5.11e-05 ***
+   709.6388      -0.8434       3.3884      29.0724      -2.9374
-LSS          -14.162     35.447  -0.400    0.706
-RSS           26.305     35.034   0.751    0.487
----
-Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
-Residual standard error: 7.296 on 5 degrees of freedom
+></code>
-Multiple R-squared:  0.9599,	Adjusted R-squared:  0.9439
+$$ \hat{Y} = 709.6388 - 0.8434 \, \text{ell} + 3.3884 \, \text{acs_k3} + 29.0724 \, \text{avg_ed} - 2.9374 \, \text{meals} $$
-F-statistic: 59.92 on 2 and 5 DF,  p-value: 0.000321
->
+Then how much explanatory power is unique to each independent variable? --> see [[:partial and semipartial correlation]]
-> ##Fitting RSS or LSS separately gives a significant result.
-> summary(lm(weights ~ LSS))
-Call:
-lm(formula = weights ~ LSS)
-Residuals:
-   Min     1Q Median     3Q    Max
--6.055 -4.930 -2.925  4.886 11.854
-Coefficients:
-            Estimate Std. Error t value Pr(>|t|)
-(Intercept)  103.099      7.543   13.67 9.53e-06 ***
-LSS           12.440      1.097   11.34 2.81e-05 ***
----
-Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
-Residual standard error: 7.026 on 6 degrees of freedom
-Multiple R-squared:  0.9554,	Adjusted R-squared:  0.948
-F-statistic: 128.6 on 1 and 6 DF,  p-value: 2.814e-05
->
-</code>
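The fitted equation for api00 can be used to compute a predicted score by hand. The sketch below copies the coefficients from the printed model and applies them to one hypothetical school; the predictor values (ell = 20, acs_k3 = 19, avg_ed = 3, meals = 50) are illustrative assumptions, not rows from the data set.

```r
# Coefficients copied from the printed model (api00 ~ ell + acs_k3 + avg_ed + meals)
b <- c("(Intercept)" = 709.6388, ell = -0.8434, acs_k3 = 3.3884,
       avg_ed = 29.0724, meals = -2.9374)

# Hypothetical school: illustrative values only, not taken from the data
school <- c(ell = 20, acs_k3 = 19, avg_ed = 3, meals = 50)

# Predicted api00 = intercept + sum(coefficient * predictor value)
y_hat <- unname(b["(Intercept)"] + sum(b[names(school)] * school))
y_hat   # 697.4976
```

With the fitted object itself available, predict(mod, newdata = ...) would give the same kind of value without copying rounded coefficients.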
@@ Line 480: / Line 420: @@
 |  | Standard Multiple   | Sequential   |  comments   |
-| r<sub>i</sub><sup>2</sup>  \\ squared correlation \\ **zero-order** correlation   | IV<sub>1</sub> : (a+b) / (a+b+c+d)   | IV<sub>1</sub> : (a+b) / (a+b+c+d)   | overlapped effects   |
+| r<sub>i</sub><sup>2</sup>  \\ squared correlation \\ squared **zero-order** \\ correlation in spss  | IV<sub>1</sub> : (a+b) / (a+b+c+d)   | IV<sub>1</sub> : (a+b) / (a+b+c+d)   | overlapped effects   |
 | ::: | IV<sub>2</sub> : (c+b) / (a+b+c+d)   | IV<sub>2</sub>: (c+b) / (a+b+c+d)   | ::: |
-| sr<sub>i</sub><sup>2</sup>  \\ squared **semipartial** correlation \\ **part in spss**   | IV<sub>1</sub> : (a) / (a+b+c+d)   | IV<sub>1</sub> : (a+b) / (a+b+c+d)   | Usual setting \\ Unique contribution to Y   |
+| sr<sub>i</sub><sup>2</sup>  \\ squared \\ **semipartial** correlation \\ **part in spss**   | IV<sub>1</sub> : (a) / (a+b+c+d)   | IV<sub>1</sub> : (a+b) / (a+b+c+d)   | Usual setting \\ Unique contribution to Y   |
 | ::: | IV<sub>2</sub> : %%(c%%) / (a+b+c+d)   | IV<sub>2</sub> : %%(c%%) / (a+b+c+d)   | ::: |
-| pr<sub>i</sub><sup>2</sup>  \\ squared **partial** correlation \\ **partial in spss**   | IV<sub>1</sub> : (a) / (a+d)   | IV<sub>1</sub> : (a+b) / (a+b+d)   | Like adjusted r<sup>2</sup>  \\ Unique contribution to Y   |
+| pr<sub>i</sub><sup>2</sup>  \\ squared \\ **partial** correlation \\ **partial in spss**   | IV<sub>1</sub> : (a) / (a+d)   | IV<sub>1</sub> : (a+b) / (a+b+d)   | Like adjusted r<sup>2</sup>  \\ Unique contribution to Y   |
 | ::: | IV<sub>2</sub> : %%(c%%) / (c+d)   | IV<sub>2</sub> : %%(c%%) / (c+d)   | ::: |
 | Assuming IV<sub>1</sub> is entered before IV<sub>2</sub>   ||||
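The area ratios in the table can be reproduced from R-squared values of nested models. The following is a minimal sketch on simulated data; the variables x1, x2, y and the helper r2() are hypothetical names introduced only for illustration, and the letters a, b, c, d refer to the areas used in the table.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)                # IVs made to correlate on purpose
y  <- 1 + 2 * x1 + 1.5 * x2 + rnorm(n)

# Hypothetical helper: R-squared of a model given as a formula
r2 <- function(f) summary(lm(f))$r.squared

r2_full <- r2(y ~ x1 + x2)               # (a+b+c) / (a+b+c+d)
r2_x1   <- r2(y ~ x1)                    # zero-order r^2 of IV1: (a+b) / (a+b+c+d)
r2_x2   <- r2(y ~ x2)                    # zero-order r^2 of IV2: (c+b) / (a+b+c+d)

# Squared semipartial: gain in R^2 when the IV is added last
sr2_x1 <- r2_full - r2_x2                # a / (a+b+c+d)
sr2_x2 <- r2_full - r2_x1                # c / (a+b+c+d)

# Squared partial: same gain, relative to what the other IV left unexplained
pr2_x1 <- sr2_x1 / (1 - r2_x2)           # a / (a+d)
pr2_x2 <- sr2_x2 / (1 - r2_x1)           # c / (c+d)

round(c(r2_x1 = r2_x1, r2_x2 = r2_x2, sr2_x1 = sr2_x1,
        sr2_x2 = sr2_x2, pr2_x1 = pr2_x1, pr2_x2 = pr2_x2), 4)
```

In the sequential column, with IV<sub>1</sub> entered first, IV<sub>1</sub> is credited with r2_x1 and IV<sub>2</sub> with sr2_x2, and the two add up to r2_full.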
@@ Line 513: / Line 453: @@
 summary(mod)
 anova(mod)
+</code>
+<code>
+> dvar <- read.csv("http://commres.net/wiki/_media/elemapi2_.csv", fileEncoding="UTF-8-BOM")
+> mod <- lm(api00 ~ ell + acs_k3 + avg_ed + meals, data=dvar)
+> summary(mod)
+Call:
+lm(formula = api00 ~ ell + acs_k3 + avg_ed + meals, data = dvar)
+Residuals:
+     Min       1Q   Median       3Q      Max
+-187.020  -40.358   -0.313   36.155  173.697
+Coefficients:
+            Estimate Std. Error t value Pr(>|t|)
+(Intercept) 709.6388    56.2401  12.618  < 2e-16 ***
+ell          -0.8434     0.1958  -4.307 2.12e-05 ***
+acs_k3        3.3884     2.3333   1.452    0.147
+avg_ed       29.0724     6.9243   4.199 3.36e-05 ***
+meals        -2.9374     0.1948 -15.081  < 2e-16 ***
+---
+Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
+Residual standard error: 58.63 on 374 degrees of freedom
+  (21 observations deleted due to missingness)
+Multiple R-squared:  0.8326,	Adjusted R-squared:  0.8308
+F-statistic:   465 on 4 and 374 DF,  p-value: < 2.2e-16
+> anova(mod)
+Analysis of Variance Table
+Response: api00
+           Df  Sum Sq Mean Sq  F value    Pr(>F)
+ell         1 4502711 4502711 1309.762 < 2.2e-16 ***
+acs_k3      1  110211  110211   32.059 2.985e-08 ***
+avg_ed      1  998892  998892  290.561 < 2.2e-16 ***
+meals       1  781905  781905  227.443 < 2.2e-16 ***
+Residuals 374 1285740    3438
+---
+Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
+>
+</code>
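As a check on the two outputs, the Multiple R-squared reported by summary(mod) can be recovered from the anova(mod) table. anova() on a single lm fit reports sequential sums of squares, so each row is the additional variation explained when that variable enters in the listed order, and the rows for the four IVs sum to the regression sum of squares:

$$ R^2 = \frac{SS_{\text{reg}}}{SS_{\text{total}}} = \frac{4502711 + 110211 + 998892 + 781905}{4502711 + 110211 + 998892 + 781905 + 1285740} = \frac{6393719}{7679459} \approx 0.8326 $$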
+<code>
 # install.packages("ppcor")
 library(ppcor)
@@ Line 559: / Line 541: @@
 [1] "pearson"
 >
+>
+</code>
+<code>
+> spcor.test(myvar$api00, myvar$meals, myvar[,c(2,3,4)])
+    estimate      p.value statistic   n gp  Method
+-0.3190889 2.403284e-10 -6.511264 379  3 pearson
 >
 </code>
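As a cross-check, and assuming columns 2 to 4 of myvar hold ell, acs_k3 and avg_ed (the construction of myvar is not shown in this excerpt), squaring the semipartial estimate for meals gives its unique share of the variance in api00. Because meals is entered last in the anova(mod) table, its sequential sum of squares divided by the total sum of squares (4502711 + 110211 + 998892 + 781905 + 1285740 = 7679459) should give the same share:

$$ sr_{\text{meals}}^2 = (-0.3190889)^2 \approx 0.1018 $$

$$ \frac{SS_{\text{meals}}}{SS_{\text{total}}} = \frac{781905}{7679459} \approx 0.1018 $$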
@@ Line 592: / Line 581: @@
 </code>
+[[:Multiple Regression Exercise]]
 ====== Resources ======