Differences

This shows you the differences between two versions of the page.

--- partial_and_semipartial_correlation [2019/10/13 21:04] – [regression gpa against sat] hkimscil
+++ partial_and_semipartial_correlation [2019/11/27 15:40] – [Partial and semi-partial correlation] hkimscil
@@ Line 2: / Line 2: @@
 references
 {{https://web.stanford.edu/~hastie/Papers/ESLII.pdf|The Elements of Statistical Learning}} or local copy
+[{{  :pasted:20191127-150222.png?250}}]
 A simple explanation of the procedures below goes like this:
   * Separately regress Y and X1 against X2, that is,
@@ Line 10: / Line 9: @@
   * Regress the Y residuals against the X1 residuals.
 In the below example,
-  * regress gpa against sat
+  * regress gpa against sat (and get residuals of gpa = a + b)
-  * regress clep against sat
+  * regress clep against sat (and get residuals of clep = b + c)
-  * regress the gpa residuals against clep residuals.
+  * regress the gpa residuals against clep residuals. (''%%lm(a+b~b+c)%%'')
-Take a close look at the graphs, especially, the grey areas.
+  * In this case, $r^{2} = \displaystyle \frac{b}{(a+b)}$ and $b$ is very small.
+Take a close look at the right graph, especially the ''%%b%%'' area, which stays small even though clep significantly explains gpa before controlling for sat.
 For more, see https://stats.stackexchange.com/questions/28474/how-can-adding-a-2nd-iv-make-the-1st-iv-significant
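The residual-on-residual procedure listed above can be sketched in R as follows. This is only an illustration under stated assumptions: the data are simulated (the values and coefficients are hypothetical, not the scores used on this page), and names such as ''res.gpa'' and ''pr.formula'' are introduced here for the example.

```r
# Hypothetical data: simulated scores, NOT the values used on this page.
set.seed(1)
n    <- 200
sat  <- rnorm(n, mean = 500, sd = 80)
clep <- 0.05 * sat + rnorm(n, sd = 3)
gpa  <- 1 + 0.002 * sat + 0.03 * clep + rnorm(n, sd = 0.3)

# Step 1: remove sat from gpa and from clep.
res.gpa  <- resid(lm(gpa ~ sat))
res.clep <- resid(lm(clep ~ sat))

# Step 2: correlate the two sets of residuals (the partial correlation).
pr.resid <- cor(res.gpa, res.clep)

# The same quantity from the usual formula on zero-order correlations.
r.gc <- cor(gpa, clep)
r.gs <- cor(gpa, sat)
r.cs <- cor(clep, sat)
pr.formula <- (r.gc - r.gs * r.cs) / sqrt((1 - r.gs^2) * (1 - r.cs^2))

# Regressing residuals on residuals gives an R-squared equal to pr^2.
r2.resid <- summary(lm(res.gpa ~ res.clep))$r.squared

all.equal(pr.resid, pr.formula)   # residual route vs. formula route
all.equal(r2.resid, pr.resid^2)   # R-squared of the residual regression
```

Both ''all.equal()'' calls should return ''TRUE'', since the two routes are algebraically the same quantity. In the notation of the list above, ''r2.resid'' plays the role of $b/(a+b)$.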
@@ Line 85: / Line 88: @@
 linear model
 ''y hat = 0.0024 X + 1.7848''
+''gpa hat = 0.0024 sat + 1.7848''
@@ Line 115: / Line 119: @@
 550 2.9 3.13544 -0.23544
 >
-> round(cor(cor.gpa.sat),3)
+round(cor(cor.gpa.sat),4)
         sat   gpa  pred resid
 sat   1.000 0.718 1.000 0.000
@@ Line 122: / Line 126: @@
 resid 0.000 0.696 0.000 1.000
 >
-> </code>
+</code>
 Note that
-  * r (sat and gpa) = .718 (sqrt(r<sup>2</sup>=0.5156)
+  * r (sat and gpa) = .718 (r<sup>2</sup> = 0.5156; sqrt(0.5156) = .718)
   * r (sat and pred) = 1. In other words, the predicted values (y hats) are a linear function of the x (sat) values (''y hat = 0.0024 X + 1.7848'').
   * r (sat and resid) = 0. The residuals are orthogonal to the independent variable (sat).
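The properties in the list above hold for any simple regression, which a short R check can illustrate. This is a sketch on simulated data (the variables ''x'' and ''y'' and their parameters are hypothetical, not the sat and gpa values from this page). It also checks a relation the correlation table implies: the correlation between the outcome and the residuals is sqrt(1 - r<sup>2</sup>), which matches the table since sqrt(1 - 0.5156) is about 0.696.

```r
# Hypothetical data: simulated predictor and outcome, NOT this page's data.
set.seed(2)
x <- rnorm(50, mean = 500, sd = 80)
y <- 1.8 + 0.0024 * x + rnorm(50, sd = 0.3)

fit <- lm(y ~ x)
d   <- data.frame(x = x, y = y, pred = fitted(fit), resid = resid(fit))

# Same layout as the correlation tables on this page.
round(cor(d), 4)

r <- cor(x, y)

# Fitted values are a linear function of x, so |cor(x, pred)| is 1
# (the sign follows the sign of the slope).
all.equal(abs(cor(x, fitted(fit))), 1)

# cor(y, pred) equals |r|, and cor(y, resid) equals sqrt(1 - r^2).
all.equal(cor(y, fitted(fit)), abs(r))
all.equal(cor(y, resid(fit)), sqrt(1 - r^2))
```

The residuals are uncorrelated with ''x'' up to floating-point error, so that entry prints as 0 after rounding.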
@@ Line 155: / Line 159: @@
 Residual standard error: 0.1637 on 8 degrees of freedom
 Multiple R-squared:  0.7679,	Adjusted R-squared:  0.7388
-F-statistic: 26.46 on 1 and 8 DF,  p-value: 0.0008808</code>
+F-statistic: 26.46 on 1 and 8 DF,  p-value: 0.0008808
+</code>
+''y hat = 0.06054 * clep + 1.17438''
 <code>
@@ Line 161: / Line 170: @@
 res.lm.gpa.clep <- lm.gpa.clep$residuals
 </code>
 {{lm.gpa.clep.png?500}}
 <code>
 # get cor between clep, gpa, pred, and resid from lm.gpa.clep
-cor.gpa.clep <- as.data.frame(cbind(gpa, clep, lm.gpa.clep$fitted.values, lm.gpa.clep$residuals))
+cor.gpa.clep <- as.data.frame(cbind(clep, gpa, lm.gpa.clep$fitted.values, lm.gpa.clep$residuals))
-colnames(cor.gpa.clep) <- c("gpa", "clep", "pred", "resid")
+colnames(cor.gpa.clep) <- c("clep", "gpa", "pred", "resid")
 cor(cor.gpa.clep)
 </code>
-<code>         gpa   clep   pred  resid
+<code>
-gpa   1.0000 0.8763 0.8763 0.4818
+> round(cor(cor.gpa.clep),4)
-clep  0.8763 1.0000 1.0000 0.0000
+        clep    gpa   pred  resid
-pred  0.8763 1.0000 1.0000 0.0000
+clep  1.0000 0.8763 1.0000 0.0000
-resid 0.4818 0.0000 0.0000 1.0000
+gpa   0.8763 1.0000 0.8763 0.4818
-> </code>
+pred  1.0000 0.8763 1.0000 0.0000
+resid 0.0000 0.4818 0.0000 1.0000
+>
+         sat    gpa   pred  resid
+sat   1.0000 0.7180 1.0000 0.0000
+gpa   0.7180 1.0000 0.7180 0.6960
+pred  1.0000 0.7180 1.0000 0.0000
+resid 0.0000 0.6960 0.0000 1.0000
+>
+</code>
@@ Line 205: / Line 226: @@
 >
 </code>
+''Multiple R-squared:  0.7778''
+''F (2, 7) = 12.25, p = 0.005157 ''
+''intercept 1.1607560 p = 0.0249 ''
+''clep 0.0729294  p = 0.0239''
+''sat 0.0007015  p = 0.5940 ''
+One other thing we could do to help settle a pragmatic argument is to regress GPA on both SAT and CLEP at the same time and see what happens. If we do that, we find that R-squared for the model is .78, F = 12.25, p < .01. The intercept and the b weight for CLEP are both significant, but the b weight for SAT is not. The values are:
+  * ''Intercept = 1.16, t=2.844, p < .05''
+  * ''CLEP = 0.07, t=2.874, p < .05''
+  * ''SATQ = -.0007, t=-0.558, n.s.''
+In this case, we would conclude that the significant unique predictor is CLEP. Although SAT is highly correlated with GPA, it adds nothing to the prediction equation once the CLEP score is entered. (These data are fictional and the sample size is much too small to run this analysis. It's there for illustration only.)
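One way to quantify a predictor's "unique" contribution is the squared semi-partial correlation, which equals the gain in R-squared when that predictor is added to a model that already contains the other one. The R sketch below illustrates this on simulated data. The values are hypothetical, not the scores analysed on this page, and the object names (''r2.full'', ''sr.clep'', and so on) are introduced only for this example.

```r
# Hypothetical data: simulated scores, NOT the values used on this page.
set.seed(3)
n    <- 200
sat  <- rnorm(n, mean = 500, sd = 80)
clep <- 0.05 * sat + rnorm(n, sd = 3)
gpa  <- 1 + 0.0005 * sat + 0.06 * clep + rnorm(n, sd = 0.3)

# Full model with both predictors entered at the same time.
full    <- lm(gpa ~ sat + clep)
r2.full <- summary(full)$r.squared

# Reduced model without clep.
r2.sat  <- summary(lm(gpa ~ sat))$r.squared

# Semi-partial correlation of clep: correlate the raw gpa with the
# part of clep that sat does not explain.
sr.clep <- cor(gpa, resid(lm(clep ~ sat)))

# The R-squared gained by adding clep equals the squared semi-partial.
all.equal(r2.full - r2.sat, sr.clep^2)

# Coefficient table (estimates, t values, p values) for the full model.
coef(summary(full))
```

Unlike the partial correlation, only the predictor is residualized here, so the squared value is a proportion of the total variance in gpa rather than of the variance left after removing sat.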
+Now suppose we wanted to argue something a little different. Suppose we had a theory that all measures of math achievement share a common explanation, which is math ability. In other words, the various math achievement tests are correlated because they share the math ability factor; math ability explains the correlation between achievement tests. In path diagram form, we might represent this something like this:
 ===== checking partial cor 1 =====
 <code>