Differences

This shows you the differences between two versions of the page.

--- multiple_regression [2017/11/13 08:47] – [무엇부터? 라는 문제] hkimscil
+++ multiple_regression [2019/05/23 10:20] – [Why overall model is significant while IVs are not?] hkimscil
@@ Line 8: / Line 8: @@
 $$Y = a + bX$$
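-This is because, given the equation above, if the F value of the regression is statistically significant, the b value of X, the single variable contributing to it, accounts for all of it. However, once two or more [[:Independent Variable|independent variables]] are used in the regression, the story changes.
+This is because, given the equation above, if the F value of the regression is statistically significant, the b value of X, the single variable contributing to it, accounts for all of it. However, once two or more [[:types_of_variables#independent|independent variables]] are used in the regression, the story changes.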
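 Multiple regression is widely used across research disciplines. For example, Baldry ((Baldry, A. C. (2003). Bullying in schools and exposure to domestic violence. Child Abuse & Neglect, 27(7), 713-732. doi: 10.1016/S0145-2134(03)00114-5. {{Bullying in schools and exposure to domestic violence.pdf}} )) used multiple regression to examine the factors (variables) that influence children's violent tendencies (bully behavior). Using [[:hierarchical regression]], also called [[:sequential regression]], Baldry first entered age and gender (male, female) as variables explaining children's violent behavior; in the second step, the father's verbal and physical abuse of the mother (the mother's abuse was excluded at this step); and in the last step, the mother's verbal and physical abuse of the father, running the regression in stages. According to the results, the father's abusiveness was not associated with the child's violent behavior, whereas the mother's abusiveness, together with gender and age, was more strongly associated with it. The four variables above explained only 14% of the variance in the children's violent behavior (see the reference above).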
@@ Line 150: / Line 150: @@
 |     |     |  B   |  Std. Error   |  Beta   |     |    |
 |  1.000    |  (Constant)   |  6.399    |  1.517    |     |  4.220    |  0.004   |
-|     |  bankIncome  income   |  0.012    |  0.004    |  0.616    |  3.325    |  0.013   |
+|     |  income   |  0.012    |  0.004    |  0.616    |  3.325    |  0.013   |
 |     |  bankfam   |  -0.545    |  0.226    |  -0.446    |  -2.406    |  0.047   |
 | a Dependent Variable: bankbook  number of bank   |||||||
-The significance test for b (the coefficients) is done with a t-test. In the table above . . . .
+====== Slope test ======
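+The significance test for b (the coefficients) is done with a t-test. A t-test basically divides the treatment effect (the independent-variable effect, or difference) by the random error, that is, the standard error, so in the table above the t value for income is 0.012/0.004 and the t value for bankfam is -0.545 / 0.226 (up to rounding of the displayed values).
+With a single independent variable, the t value obtained this way equals, in absolute value, the square root of the regression model's F statistic. With two or more independent variables, the overall F tests all slopes jointly and the independent variables are in most cases correlated with one another, so the square of a t value does not necessarily equal the F value.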
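+As a worked sketch of the two statements above (the symbol SE_b for the standard error of b is notation assumed here, not taken from the original page):
+$$t = \frac{b}{SE_b}, \qquad t_{income} \approx \frac{0.012}{0.004} = 3.0, \qquad t_{bankfam} \approx \frac{-0.545}{0.226} \approx -2.41$$
+The table reports 3.325 and -2.406, presumably because the software divides the unrounded estimates. For the single-predictor case, the weights ~ LSS output further down this page reports t = 11.34 and F = 128.6, which agrees with
+$$t^2 = 11.34^2 \approx 128.6 = F$$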
+====== Beta coefficients ======
+See [[:beta coefficients]] or Standardized coefficients
 ====== e.g., ======
@@ Line 321: / Line 328: @@
 </code>
-====== 무엇부터? 라는 문제 ======
+====== Why overall model is significant while IVs are not? ======
+see https://www.researchgate.net/post/Why_is_the_Multiple_regression_model_not_significant_while_simple_regression_for_the_same_variables_is_significant
+<code>
+RSS = 3:10 #Right shoe size
+LSS = rnorm(RSS, RSS, 0.1) #Left shoe size - similar to RSS
+cor(LSS, RSS) #correlation ~ 0.99
+weights = 120 + rnorm(RSS, 10*RSS, 10)
+##Fit a joint model
+m = lm(weights ~ LSS + RSS)
+##The overall F-test p-value is very small, but neither LSS nor RSS is significant
+summary(m)
+</code>
+<code>> RSS = 3:10 #Right shoe size
+> LSS = rnorm(RSS, RSS, 0.1) #Left shoe size - similar to RSS
+> cor(LSS, RSS) #correlation ~ 0.99
+[1] 0.9994836
+>
+> weights = 120 + rnorm(RSS, 10*RSS, 10)
+>
+> ##Fit a joint model
+> m = lm(weights ~ LSS + RSS)
+>
+> ##The overall F-test p-value is very small, but neither LSS nor RSS is significant
+> summary(m)
+Call:
+lm(formula = weights ~ LSS + RSS)
+Residuals:
+      1       2       3       4       5       6       7       8
+ 4.8544  4.5254 -3.6333 -7.6402 -0.2467 -3.1997 -5.2665 10.6066
+Coefficients:
+            Estimate Std. Error t value Pr(>|t|)
+(Intercept)  104.842      8.169  12.834 5.11e-05 ***
+LSS          -14.162     35.447  -0.400    0.706
+RSS           26.305     35.034   0.751    0.487
+---
+Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
+Residual standard error: 7.296 on 5 degrees of freedom
+Multiple R-squared:  0.9599,	Adjusted R-squared:  0.9439
+F-statistic: 59.92 on 2 and 5 DF,  p-value: 0.000321
+>
+> ##Fitting RSS or LSS separately gives a significant result.
+> summary(lm(weights ~ LSS))
+Call:
+lm(formula = weights ~ LSS)
+Residuals:
+   Min     1Q Median     3Q    Max
+-6.055 -4.930 -2.925  4.886 11.854
+Coefficients:
+            Estimate Std. Error t value Pr(>|t|)
+(Intercept)  103.099      7.543   13.67 9.53e-06 ***
+LSS           12.440      1.097   11.34 2.81e-05 ***
+---
+Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
+Residual standard error: 7.026 on 6 degrees of freedom
+Multiple R-squared:  0.9554,	Adjusted R-squared:  0.948
+F-statistic: 128.6 on 1 and 6 DF,  p-value: 2.814e-05
+>
+</code>
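+A hedged sketch of why this happens, using the standard OLS standard-error formula (the symbols s, SS_j and R_j^2 are notation assumed here, not from the original page): with s the residual standard error, SS_j the sum of squared deviations of predictor j from its mean, and R_j^2 the R-squared from regressing predictor j on the other predictors,
+$$SE(b_j) = \frac{s}{\sqrt{SS_j \, (1 - R_j^2)}}$$
+With only two predictors, R_j^2 is the squared correlation between them, so for the run above
+$$\frac{1}{\sqrt{1 - R_j^2}} = \frac{1}{\sqrt{1 - 0.9994836^2}} \approx 31$$
+This roughly accounts for the standard error of LSS growing from 1.097 in the separate fit to 35.447 in the joint fit. The overall F test only asks whether LSS and RSS together explain weights, so it is not weakened by their overlap.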
+====== The problem of "which one is entered first?" ======
 __Figure to be included around here__
@@ Line 339: / Line 422: @@
     * . . . the stepwise procedure defines an a posteriori order based solely on a statistical consideration (the statistical significance of semi-partial correlations) . . . .
 ====== Determining IVs' role ======
+For a complete explanation and examples, read [[:partial  and semipartial correlation]]
 https://www.youtube.com/watch?v=-QsMvrQDxyU
 [{{ :partial.correlations.jpg?300 |r-squared semi-partial partial correlations }}]
@@ Line 396: / Line 480: @@
   * LifeSat Score on Life Satisfaction Inventory seven years after College
   * Income Income seven years after College (in thousands)
 ====== Resources ======