partial_and_semipartial_correlation
references
{{https://
or [[https://
====== Partial and semipartial ======
<code>
options(digits = 4)
HSGPA <- c(3.0, 3.2, 2.8, 2.5, 3.2, 3.8, 3.9, 3.8, 3.5, 3.1)
FGPA <- c(2.8, 3.0, 2.8, 2.2, 3.3, 3.3, 3.5, 3.7, 3.4, 2.9)
SATV <- c(500, 550, 450, 400, 600, 650, 700, 550, 650, 550)
scholar <- data.frame(FGPA, HSGPA, SATV) # collect into a data frame

library(psych) # for describe()
describe(scholar) # provides descriptive information about each variable

corrs <- cor(scholar) # find the correlations
corrs # print corrs

pairs(scholar) # pairwise scatterplots
</code>
<code>
attach(scholar)
# freshman's GPA (FGPA) as a function of HSGPA and SATV
mod.all <- lm(FGPA ~ HSGPA + SATV, data = scholar)
summary(mod.all)
</code>
<code>
> summary(mod.all)

Call:
lm(formula = FGPA ~ HSGPA + SATV, data = scholar)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.2431 -0.1125 -0.0286  0.1269  0.2716 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept) 0.233102   0.456384    0.51   0.6252  
HSGPA       0.845192   0.283817    2.98   0.0206 *
SATV        0.000151   0.001405    0.11   0.9174  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.192 on 7 degrees of freedom
Multiple R-squared:  0.851,  Adjusted R-squared:  0.809 
F-statistic: 20.1 on 2 and 7 DF,  p-value: 0.00126

> 
</code>
[{{:
According to the t-test, SATV plays no significant role. Is this really true?
First, regress FGPA separately on SATV and on HSGPA.
The results below show that each IV on its own explains the dependent variable, FGPA, significantly.
That is, their explanatory power disappears only when the two IVs try to explain FGPA together.
<WRAP clear/>
<code>
attach(scholar)
ma1 <- lm(FGPA ~ SATV)
ma2 <- lm(FGPA ~ HSGPA)
summary(ma1)
summary(ma2)
</code>
<code>
> ma1 <- lm(FGPA ~ SATV)
> ma2 <- lm(FGPA ~ HSGPA)
> summary(ma1)

Call:
lm(formula = FGPA ~ SATV)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.2804 -0.1305 -0.0566  0.0350  0.6481 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept) 0.956329   0.544188    1.76   0.1190   
SATV        0.003810   0.000960    3.97   0.0041 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.27 on 8 degrees of freedom
Multiple R-squared:  0.663,  Adjusted R-squared:  0.621 
F-statistic: 15.8 on 1 and 8 DF,  p-value: 0.00412

> summary(ma2)

Call:
lm(formula = FGPA ~ HSGPA)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.2434 -0.1094 -0.0266  0.1259  0.2798 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)    0.230      0.427    0.54  0.60414    
HSGPA          0.872      0.129    6.77  0.00014 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.179 on 8 degrees of freedom
Multiple R-squared:  0.851,  Adjusted R-squared:  0.833 
F-statistic: 45.8 on 1 and 8 DF,  p-value: 0.000137

> 
</code>
Below, we remove HSGPA's influence from both the IV (SATV) and the DV (FGPA), and then look at the influence of the IV on the DV.
<code>
m1 <- lm(FGPA ~ HSGPA)   # DV regressed on the control variable
m2 <- lm(SATV ~ HSGPA)   # IV regressed on the control variable
res.m1 <- resid(m1)      # the part of FGPA not explained by HSGPA
# res.m1 <- m1$residuals # the same thing
res.m2 <- resid(m2)      # the part of SATV not explained by HSGPA
m.12 <- lm(res.m1 ~ res.m2)
summary(m.12)
</code>
As the output below shows, the result is not significant.
<code>
> m1 <- lm(FGPA ~ HSGPA)
> m2 <- lm(SATV ~ HSGPA)
> res.m1 <- resid(m1)
> res.m2 <- resid(m2)
> m.12 <- lm(res.m1 ~ res.m2)
> summary(m.12)

Call:
lm(formula = res.m1 ~ res.m2)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.2431 -0.1125 -0.0286  0.1269  0.2716 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -6.50e-18   5.67e-02    0.00     1.00
res.m2       1.51e-04   1.31e-03    0.12     0.91

Residual standard error: 0.179 on 8 degrees of freedom
Multiple R-squared:  0.00165,  Adjusted R-squared:  -0.123 
F-statistic: 0.0132 on 1 and 8 DF,  p-value: 0.911

</code>
In particular, the R-squared value above is 0.00165, and its square root is $\sqrt{0.00165} = 0.04064$. This value is the correlation between SATV and FGPA after HSGPA's influence has been removed from both the IV (SATV) and the DV (FGPA). In fact, this number is the same as ''cor(res.m1, res.m2)'', as the following shows.
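The same value also falls out of the zero-order correlations alone. For these data the correlations (the values held in ''corrs'' above, rounded to four decimals; F = FGPA, S = SATV, H = HSGPA) are $r_{FS}=0.8144$, $r_{FH}=0.9226$, and $r_{SH}=0.8745$, so the standard partial correlation formula gives

$$ r_{FS.H} = \frac{r_{FS} - r_{FH} \; r_{SH}}{\sqrt{(1-r_{FH}^{2})(1-r_{SH}^{2})}} = \frac{0.8144 - (0.9226)(0.8745)}{\sqrt{(1-0.9226^{2})(1-0.8745^{2})}} \approx 0.0406 $$

which matches the 0.04064 above, up to rounding of the inputs.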
<code>
cor(res.m1, res.m2)
## or
cor.test(res.m1, res.m2)
</code>
<code>
> cor(res.m1, res.m2)
[1] 0.04064
> 
> cor.test(res.m1, res.m2)

	Pearson's product-moment correlation

data:  res.m1 and res.m2
t = 0.12, df = 8, p-value = 0.9
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.6045  0.6536
sample estimates:
    cor 
0.04064 

> 
</code>
We can get the same thing as a [[:partial correlation]], using the ''pcor.test()'' function from the ''ppcor'' package:
<code>
# install.packages("ppcor")
library(ppcor)
pcor.test(FGPA, SATV, HSGPA)
</code>
<code>
> pcor.test(FGPA, SATV, HSGPA)
  estimate p.value statistic  n gp  Method
1  0.04064  0.9174    0.1076 10  1 pearson
> </code>
The estimate above, 0.04064, equals the square root of the R-squared value from the residual regression. Expressed as a diagram, it looks like this:

[{{:

Now run the reverse case: control for SATV's influence and look at HSGPA's influence alone.
<code>
n1 <- lm(FGPA ~ SATV)    # DV regressed on the control variable (SATV)
n2 <- lm(HSGPA ~ SATV)   # IV regressed on the control variable
res.n1 <- resid(n1)
res.n2 <- resid(n2)
n.12 <- lm(res.n1 ~ res.n2)
summary(n.12)
</code>

<code>
> n1 <- lm(FGPA ~ SATV)
> n2 <- lm(HSGPA ~ SATV)
> res.n1 <- resid(n1)
> res.n2 <- resid(n2)
> n.12 <- lm(res.n1 ~ res.n2)
> summary(n.12)

Call:
lm(formula = res.n1 ~ res.n2)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.2431 -0.1125 -0.0286  0.1269  0.2716 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept) -6.93e-18   5.67e-02    0.00    1.000  
res.n2       8.45e-01   2.65e-01    3.18    0.013 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.179 on 8 degrees of freedom
Multiple R-squared:  0.559,  Adjusted R-squared:  0.504 
F-statistic: 10.1 on 1 and 8 DF,  p-value: 0.013

> 
</code>
Now we know that this R-squared value is 0.559. Its square root is $\sqrt{0.559} = 0.7477$: the influence of the IV (HSGPA alone) after SATV's influence has been removed from both the IV and the DV.

[{{:

<code>
pcor.test(FGPA, HSGPA, SATV)
</code>
<code>
> pcor.test(FGPA, HSGPA, SATV)
  estimate p.value statistic  n gp  Method
1   0.7476 0.02057     2.978 10  1 pearson
> </code>
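The same hand check works here (same zero-order correlations as before):

$$ r_{FH.S} = \frac{r_{FH} - r_{FS} \; r_{HS}}{\sqrt{(1-r_{FS}^{2})(1-r_{HS}^{2})}} = \frac{0.9226 - (0.8144)(0.8745)}{\sqrt{(1-0.8144^{2})(1-0.8745^{2})}} \approx 0.748 $$

which is the 0.7476 estimate above, up to rounding of the inputs.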
The explanation above also ties in with the [[:multiple regression]] document.

In the results below, note that the p values the independent variables get from ''anova()'' differ from the p values in ''summary()''. The two are computed differently, as the following shows.
<code>
# what anova() reports for acs_k3:
acs_k3  ...
# what summary(lm()) reports for acs_k3:
acs_k3  ...
</code>
The following is taken from the [[:multiple regression]] document.
<code>
dvar <- read.csv("...")
mod <- lm(api00 ~ ell + acs_k3 + avg_ed + meals, data=dvar)
summary(mod)
anova(mod)
</code>

<code>
> dvar <- read.csv("...")
> mod <- lm(api00 ~ ell + acs_k3 + avg_ed + meals, data=dvar)
> summary(mod)

Call:
lm(formula = api00 ~ ell + acs_k3 + avg_ed + meals, data = dvar)

Residuals:
     Min       1Q   Median       3Q      Max 
-187.020  -40.358  ...

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)      ...
ell              ...
acs_k3           ...
avg_ed           ...
meals        -2.9374        ...
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: ...
  (21 observations deleted due to missingness)
Multiple R-squared:  ...
F-statistic: ...

> anova(mod)
Analysis of Variance Table

Response: api00
           Df  Sum Sq Mean Sq  F value    Pr(>F)    
ell         1 4502711 4502711 1309.762 < 2.2e-16 ***
acs_k3      1     ...
avg_ed      1     ...
meals       1     ...
Residuals 374 1285740    3438                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 
</code>
Looking at mod2 below:
  * api00 is the dependent variable, and
  * ell, avg_ed, meals, and acs_k3 are the independent variables;
  * their order differs from the earlier model,
  * which used ell, acs_k3, avg_ed, meals (acs_k3 has moved to the end).
  * That is:
    * lm(api00 ~ ell + acs_k3 + avg_ed + meals)
    * lm(api00 ~ ell + avg_ed + meals + acs_k3)

Because ''anova()'' assesses each independent variable sequentially, in the order entered (each term is adjusted only for the terms before it), its F values change when that order changes, while the ''summary()'' results stay the same.

<code>
> summary(mod2)

Call:
lm(formula = api00 ~ ell + avg_ed + meals + acs_k3, data = dvar)

Residuals:
    Min      1Q  Median      3Q     Max 
-186.90  ...

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)     ...
ell             ...
avg_ed          ...
meals         -2.948      0.196  -15.04   <2e-16 ***
acs_k3          ...
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: ...
Multiple R-squared:  ...
F-statistic: ...

> anova(mod2)
Analysis of Variance Table

Response: api00
           Df  Sum Sq Mean Sq  F value Pr(>F)    
ell         1 4502711 4502711 1309.762 <2e-16 ***
avg_ed      1     ...
meals       1     ...
acs_k3      1     ...
Residuals 374 1285740    3438                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
</code>
The same thing happens however the independent variables are reordered. mod3 takes mod2 and moves the meals variable to the front. That is:

  * mod <- lm(api00 ~ ell + acs_k3 + avg_ed + meals)
  * mod2 <- lm(api00 ~ ell + avg_ed + meals + acs_k3)
  * mod3 <- lm(api00 ~ meals + ell + avg_ed + acs_k3)

The outputs of summary(mod), summary(mod2), and summary(mod3) do not differ from one another, but the anova() F values and their significance depend on which independent variable comes first.
<code>
> mod3 <- lm(api00 ~ meals + ell + avg_ed + acs_k3, data=dvar)
> summary(mod3)

Call:
lm(formula = api00 ~ meals + ell + avg_ed + acs_k3, data = dvar)

Residuals:
    Min      1Q  Median      3Q     Max 
-186.90  ...

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)     ...
meals         -2.948      0.196  -15.04   <2e-16 ***
ell             ...
avg_ed          ...
acs_k3          ...
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 58.8 on 371 degrees of freedom
Multiple R-squared:  ...
F-statistic: ...

> anova(mod2)
Analysis of Variance Table

Response: api00
           Df  Sum Sq Mean Sq F value Pr(>F)    
ell         1 4480281 4480281 1297.34 <2e-16 ***
avg_ed      1     ...
meals       1     ...
acs_k3      1     ...
Residuals 371 1281224    3453                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 
> anova(mod3)
Analysis of Variance Table

Response: api00
           Df  Sum Sq Mean Sq F value  Pr(>F)    
meals       1 6219897 6219897 1801.08 < 2e-16 ***
ell         1     ...
avg_ed      1     ...
acs_k3      1     ...
Residuals 371 1281224    3453                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> </code>
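If an order-independent table is wanted, base R's ''drop1()'' is one option (a side note, not part of the original analysis): it tests each IV as if it were entered last, so for these single-df terms its F values equal the squared t values from ''summary()'' and do not change when the formula is reordered.

<code>
# every term is dropped from the full model in turn,
# so the order of the IVs in the formula does not matter
drop1(mod, test = "F")
</code>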
====== e.g. Using pcor.test with 4 variables ======
- | Call: | + | < |
- | lm(formula | + | options(digits |
- | Residuals: | + | HSGPA <- c(3.0, 3.2, 2.8, 2.5, 3.2, 3.8, 3.9, 3.8, 3.5, 3.1) |
- | | + | FGPA <- c(2.8, 3.0, 2.8, 2.2, 3.3, 3.3, 3.5, 3.7, 3.4, 2.9) |
- | -101.860 -19.292 | + | SATV <- c(500, 550, 450, 400, 600, 650, 700, 550, 650, 550) |
+ | GREV <- c(600, 670, 540, 800, 750, 820, 830, 670, 690, 600) | ||
+ | ##GREV <- c(510, 670, 440, 800, 750, 420, 830, 470, 690, 600) | ||
- | Coefficients: | ||
- | Estimate Std. Error t value Pr(> | ||
- | (Intercept) | ||
- | clep 17.665 | ||
- | --- | ||
- | Signif. codes: | ||
- | Residual standard error: 48.2 on 8 degrees of freedom | + | scholar <- data.frame(HSGPA, FGPA, SATV, GREV) # collect into a data frame |
- | Multiple R-squared: | + | # install.packages(" |
- | F-statistic: | + | library(psych) |
+ | describe(scholar) # provides descrptive information about each variable | ||
- | > res.lm.sat.clep | + | corrs <- cor(scholar) # find the correlations and set them into an object called ' |
- | > | + | corrs # print corrs |
- | > install.packages(" | + | |
- | > library(ppcor) | + | |
- | Loading required package: MASS | + | |
- | > # regression test for semipartial correlation | + | pairs(scholar) # pairwise scatterplots |
- | > spcor.gpa.sat.clep <- lm(gpa ~ res.lm.sat.clep) | + | |
- | > summary(spcor.gpa.sat.clep) | + | |
- | Call: | + | # install.packages(" |
- | lm(formula = gpa ~ res.lm.sat.clep) | + | library(ppcor) |
+ | pcor.test(scholar$GREV, | ||
- | Residuals: | + | reg3 <- lm(GREV ~ SATV + HSGPA) |
- | | + | resid3 <- resid(reg3) |
- | -0.3756 -0.2694 | + | |
- | Coefficients: | + | reg4 <- lm(FGPA ~ SATV + HSGPA) # second regression |
- | Estimate Std. Error t value Pr(>|t|) | + | resid4 <- resid(reg4) # second set of residuals |
- | (Intercept) | + | |
- | res.lm.sat.clep -0.0007015 | + | |
- | --- | + | |
- | Signif. codes: | + | |
- | Residual standard error: 0.3382 on 8 degrees | + | cor(resid3, resid4) |
- | Multiple R-squared: | + | |
- | F-statistic: | + | |
</ | </ | ||
<code>
> pcor.test(scholar$GREV, scholar$FGPA, scholar[, c("SATV", "HSGPA")])
  estimate p.value statistic  n gp  Method
1   -0.535  0.1719    -1.551 10  2 pearson
> 
> reg3 <- lm(GREV ~ SATV + HSGPA)
> resid3 <- resid(reg3)
> 
> reg4 <- lm(FGPA ~ SATV + HSGPA)
> resid4 <- resid(reg4)
> 
> cor(resid3, resid4)
[1] -0.535

</code>
----
Scholar A believes that GRE scores (GREV) have more to do with SATV, which A takes to reflect intelligence, than with the grade-oriented effort behind GPAs. A therefore wants to look at the influence of SATV alone, controlling for the other variables.
<code>
pcor.test(scholar$GREV, scholar$SATV, scholar[, c("HSGPA", "FGPA")])

reg7 <- lm(GREV ~ HSGPA + FGPA) # run linear regression
resid7 <- resid(reg7)

reg8 <- lm(SATV ~ HSGPA + FGPA) # second regression
resid8 <- resid(reg8)

cor(resid7, resid8)
</code>

<code>
> pcor.test(scholar$GREV, scholar$SATV, scholar[, c("HSGPA", "FGPA")])
  estimate p.value statistic  n gp  Method
1   0.3179  0.4429    0.8213 10  2 pearson
> 
> reg7 <- lm(GREV ~ HSGPA + FGPA) # run linear regression
> resid7 <- resid(reg7)
> 
> reg8 <- lm(SATV ~ HSGPA + FGPA) # second regression
> resid8 <- resid(reg8)
> 
> cor(resid7, resid8)
[1] 0.3179
> 
</code>
====== e.g., The explanatory power of each IV when the IVs are independent of each other ======
In this example, the two IVs are orthogonal to each other (not correlated with each other). Hence, regressing the Y residuals (res.y.x2) against x1 does not cause any problem.
<code>
...
F-statistic: ...
> </code>
<code>
lm.y.x2 <- lm(y ~ x2)
...
</code>
<code>
d <- data.frame(X1=x1, ...)
plot(d)
</code>
{{:
<code>
[1] 0.0073
</code>
Of the variance in Y, take the part that X2 leaves unexplained (1-R<sup>2</sup>) and ask how much of it X1 can account for.

After controlling for x2's influence, x1's explanatory power comes to 64.54%.
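A self-contained sketch of this whole section (simulated data; the coefficients 0.8 and 0.5 are arbitrary, and x2 is constructed to be exactly orthogonal to x1):

<code>
set.seed(101)
n  <- 100
x1 <- rnorm(n)
x2 <- resid(lm(rnorm(n) ~ x1))  # residualizing makes x2 exactly uncorrelated with x1
y  <- 0.8 * x1 + 0.5 * x2 + rnorm(n)

r2.x1   <- summary(lm(y ~ x1))$r.squared
r2.x2   <- summary(lm(y ~ x2))$r.squared
r2.both <- summary(lm(y ~ x1 + x2))$r.squared

c(r2.x1 + r2.x2, r2.both)  # identical: with orthogonal IVs the R-squareds add up

# y's variance left over by x2, and x1's share of it:
res.y.x2 <- resid(lm(y ~ x2))
cor(res.y.x2, x1)^2        # equals r2.x1 / (1 - r2.x2)
</code>

The 64.54% above is this last kind of quantity: x1's squared correlation with y, rescaled by the share of variance that x2 left unexplained.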
====== Errors in regression results when X1 and X2 are highly correlated ======
see https://

<code>
RSS = 3:10 # right shoe sizes
LSS = rnorm(RSS, RSS, 0.1) # left shoe sizes - nearly identical to the right
cor(LSS, RSS) # correlation ~ 0.99

weights = 120 + rnorm(RSS, 10*RSS, 10)

# Fit a joint model
m <- lm(weights ~ LSS + RSS)

## The overall F test is highly significant, yet neither LSS nor RSS is
## individually significant
summary(m)

## Fitting RSS or LSS separately gives a significant result.
summary(lm(weights ~ LSS))
## or
summary(lm(weights ~ RSS))
</code>
<code>
> RSS = 3:10 # right shoe sizes
> LSS = rnorm(RSS, RSS, 0.1) # left shoe sizes - similar to RSS
> cor(LSS, RSS) # correlation ~ 0.99
[1] 0.9994836
> 
> weights = 120 + rnorm(RSS, 10*RSS, 10)
> 
> ## Fit a joint model
> m = lm(weights ~ LSS + RSS)
> 
> ## The overall F is significant, but neither LSS nor RSS is
> summary(m)

Call:
lm(formula = weights ~ LSS + RSS)

Residuals:
      1       2       3       4       5       6       7       8 
 4.8544  4.5254 -3.6333 -7.6402 -0.2467 -3.1997 -5.2665 10.6066 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)     ...
LSS          -14.162        ...
RSS           26.305     35.034   0.751    0.487
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.296 on 5 degrees of freedom
Multiple R-squared:  0.9599,  Adjusted R-squared:  0.9439 
F-statistic: 59.92 on 2 and 5 DF,  p-value: 0.000321

> 
> ## Fitting RSS or LSS separately gives a significant result.
> summary(lm(weights ~ LSS))

Call:
lm(formula = weights ~ LSS)

Residuals:
   Min     1Q Median     3Q    Max 
-6.055 -4.930 -2.925  4.886 11.854 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)     ...
LSS           12.440      1.097   11.34 2.81e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.026 on 6 degrees of freedom
Multiple R-squared:  0.9554,  Adjusted R-squared:  0.948 
F-statistic: 128.6 on 1 and 6 DF,  p-value: 2.81e-05

> 
> ## or
> summary(lm(weights ~ RSS))

Call:
lm(formula = weights ~ RSS)

Residuals:
   Min     1Q Median     3Q    Max 
-13.46  -4.44   1.61    ...

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)     ...
RSS             9.33       1.27    7.34  0.00033 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.24 on 6 degrees of freedom
Multiple R-squared:  0.8998,  Adjusted R-squared:  0.883 
F-statistic: 53.9 on 1 and 6 DF,  p-value: 0.000327

> 
</code>
====== e.g. API00 data using ppcor package ======
This part is an extension of the [[:multiple regression]] document.

We are going to use spcor to identify the effect of each IV. In order to do that, we need to reconstruct the data with only the variables involved.
<code>
dvar <- read.csv("...")
mod <- lm(api00 ~ ell + acs_k3 + avg_ed + meals, data=dvar)
summary(mod)
anova(mod)

attach(dvar)
da1 <- data.frame(api00, ell, acs_k3, avg_ed, meals)
da2 <- data.frame(api00, ell, avg_ed, meals)

da1 <- na.omit(da1)
da2 <- na.omit(da2)

library(ppcor)
spcor(da1)
spcor(da2)
</code>

<code>
> spcor(da1)
$estimate
             api00  ell acs_k3 avg_ed meals
api00   1.00000000  ...    ...    ...   ...
ell    -0.13469956  ...    ...    ...   ...
acs_k3         ...  ...    ...    ...   ...
avg_ed         ...  ...    ...    ...   ...
meals  -0.29972194  ...    ...    ...   ...

$p.value
              api00        ell    acs_k3      avg_ed        meals
api00  0.000000e+00 0.07761805 0.5525340 0.085390280 2.403284e-10
ell    8.918743e-03 0.00000000 0.2390272 0.232377348 1.558141e-03
acs_k3 1.608778e-01 0.05998819 0.0000000 0.009891503 7.907183e-03
avg_ed 1.912418e-02 0.27203887 0.1380449 0.000000000 7.424903e-05
meals  3.041658e-09 0.04526574 0.2919775 0.006489783 0.000000e+00

$statistic
           api00  ell acs_k3 avg_ed meals
api00        ...  ...    ...    ...   ...
ell    -2.628924  ...    ...    ...   ...
acs_k3       ...  ...    ...    ...   ...
avg_ed       ...  ...    ...    ...   ...
meals  -6.075665  ...    ...    ...   ...

$n
[1] 379

$gp
[1] 3

$method
[1] "pearson"

> spcor(da2)
$estimate
            api00  ell avg_ed meals
api00   1.0000000  ...    ...   ...
ell    -0.1331295  ...    ...   ...
avg_ed        ...  ...    ...   ...
meals  -0.3170300  ...    ...   ...

$p.value
              api00        ell      avg_ed        meals
api00  0.000000e+00 0.08053412 0.099035570 2.160270e-11
ell    9.465897e-03 0.00000000 0.172705169 2.366624e-03
avg_ed 2.340012e-02 0.20663211 0.000000000 1.158869e-04
meals  2.698723e-10 0.05296417 0.008170237 0.000000e+00

$statistic
           api00  ell avg_ed meals
api00        ...  ...    ...   ...
ell    -2.608123  ...    ...   ...
avg_ed       ...  ...    ...   ...
meals  -6.490414  ...    ...   ...

$n
[1] 381

$gp
[1] 2

$method
[1] "pearson"

> 
</code>
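The numbers above can be rebuilt by hand, which shows what a semipartial (part) correlation is: residualize one IV on the other IVs and correlate that residual with the untouched DV. A sketch for meals in da2, using the same objects as above; since the api00/meals entries reported are about -0.32, their square, roughly 0.10, is meals' unique share of api00's variance. (Note that ''spcor()'' reports both directions of each pair, so this reproduces the entry where meals is the residualized variable.)

<code>
# squared semipartial correlation of meals =
# drop in R-squared when meals is removed from the model
full    <- lm(api00 ~ ell + avg_ed + meals, data = da2)
reduced <- lm(api00 ~ ell + avg_ed, data = da2)
summary(full)$r.squared - summary(reduced)$r.squared

# the same quantity from residualizing meals only (api00 stays as it is)
res.meals <- resid(lm(meals ~ ell + avg_ed, data = da2))
cor(da2$api00, res.meals)^2
</code>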
====== e.g. mpg model in mtcars (in R) ======
{{: