Mediation Analysis

Planned behavior 이론에 따라서 연구자는 아래와 같은 모델을 만들고 데이터를 얻은 후 테스트하려고 한다. 특히 이 단계에서 연구자는 Attitudes가 Behavior 미치는 영향력을 Attitudes 고유의 것과 Intention을 거쳐가는 것으로 구분하여 확인해보려고 한다. 이와 같은 통계검증 방식을 mediation analysis라고 하는데, 이 방식은 아래와 같은 상황을 전제로 한다.

우선 attitudes와 (x 변인) behavior (y 변인) 간의 significant한 관계가 있다. 즉, lm(behavior ~ attitudes) 의 f statistics 값이 significant하다. 이 coefficient값을 c 라고 하자.

Attitudes  ---------c----------   Behavior

이 attitudes는 intention에도 통계적으로 유의미한 설명력을 갖는다. 이 관계는 a 라는 coefficient값을 갖는다.

Attitudes  ---------a----------   Intention

그리고 beahvior에 대해서 attitudes와 intention 두 변인을 이용한 regression의 R square값은 significant한 설명력을 갖는다. 그러나, 각 변인 중 애초에 독립적으로 사용되었을 때 significant했던 attitudes는 설명력을 잃거나 혹은 significant한 설명력을 갖지만 그 크기가 c보다 작다. 즉, 아래에서
b = significant
c' = not significant 혹은 c' < c 즉, attitudes의 설명력이 significant한 상태를 유지하지만 영향력은 감소

               Intention     
                          \  
                           \ b
                            \ 
Attitudes  --------c'-----==  Behavior

b      
c' < c

이 때 연구자는 attitudes의 설명력이 Intentin을 통해서 behavior에 가게 되는게 아닌가 가정을 하고 이를 확인하게 되는데, 이것이 attitudes의 intention을 통한 moderating 혹은 mediated effect이다.

attitudes가 intention을 통해서 어떤 크기의 영향력을 갖는가? 즉, attitudes의 매개효과의 크기는 어느정도인가가 궁금할 수 있는데 크게는 두 가지가 있다. 첫번째는 c - c' 의 크기를 indirect effect로 보는 경우이다. c는 attitudes가 온전히 (다른 변인 고려 없이) behavior에 미치는 영향력인데, 다른 변인이 존재하므로써 이 크기가 줄게 되고 (c'으로), 원래 크기에서 © 줄어든 크기를 (c') 뺀 값이 indirect effect이다.

다른 하나는 세 변인을 동시에 고려하여 attitudes의 단위변화가 최종적으로 behavior에 어떻게 영향을 미치는가를 본다. 만약에 a의 beta 크기가 2라고 하고, b의 beta 크기가 3이라고 하면, 우리는 attitudes의 단위가 한단위 변하면 intention의 단위가 2단위 변한다는 것을 알고, 이를 두번째 단계에 (b의 크기) 적용하면 2단위가 변하였으므로 behavior는 3×2단위 (6단위) 변하게 될 것이라고 추측할 수 있다. 즉, attitudes의 한단위는 intention의 매개효과를 통하여 behavior가 6단위 변하도록 한다 (a beta coefficient * b beta coefficient).

size of mediated effect
ab:=a*b 의 크기는 이것을 sdx/sdy 로 곱하여 beta coefficient값으로 변하여 구해본다. . . .

이런 것들은 일반 regression을 이용하여 알아낼 수도 있지만, 보통 lavaan package를 활용하거나 (path analysis를 위해서 개발) 혹은 mediation이라는 package를 이용하여 독립변인의 매개효과를 알아낸다.

                Intention 
             /            \
        a   /              \  b
           /                \    
Attitudes  --------c'---------   Behavior

위의 그림에서 attitudes 가 behavior 미치는 영향력을 볼 때

c' = 0 인 경우가 있다. 이 경우에 attitudes는 behavior에 대한 직접적인 영향력은 없고 오로지 a 와 b의 path만을 거쳐서 영향력을 (설명력을) 갖게 된다. 이 경우를 완전매개라고 (complete mediation) 한다
반면에 c' ~= 0 인 경우가 있다. 이 때는 c에도 부분적으로 설명력이 걸리고, a b를 통해서도 부분적으로 설명력이 걸리는 경우이므로 부분매개라고 (partial mediation) 부른다.

또한 mediation analysis에서 독립변인들의 효과를 (설명력을) 직접효과와 간접효과로 나눌 수 있는데 직접효과는 a, b, 그리고 c'를 직접효과라고 (direct effects) 하고 a와 b를 거쳐서 가는 효과를 간접효과라고 (indirect effects) 한다. Indirect effects 의 크기를 어떻게 측정하는가에는 여러가지 방법이 있을 수 있지만, 가장 많이 쓰이는 방법으로는

a path와 b path의 coefficient값을 곱한 값을 취하는 방법이다
다른 방법으로는 c - c' 값을 취하는 방법이 있지만 흔하지는 않다

위에서 a b 를 곱해서 간접효과를 측정할 때에 그 값이 (효과가) significant한지 알아보기 위한 테스트에는 두 가지 방법이 있을 수 있는데

Delta method와 Resampling method 이다
전자는 indirect coefficient의 (곱한 값의) sampling distribution이 normal distribution을 이룬다는 전제하에 구하는 방법인데, 이런 경우가 드문 편이라 현실적이지 않은 면이 있다. 또한 샘플사이즈가 작을 때에는 부정확한 면이 있다
후자인 resampling method는 샘플링을 반복해서 indirect coefficient 값을 얻은 후에 이를 가지고 효과를 측정하는 경우이다.
- Percentile bootstrapping
- bias-corrected bootstrapping
- permutation method
- monte carlo method 등을 사용한다.

######################################################
## data file: PlannedBehavior.csv
######################################################
######################################################
install.packages("readr")
library(readr)
df <- read.csv("http://commres.net/wiki/_media/r/plannedbehavior.csv")
head(df)
str(df)
# attitude
# norms
# control
# intention
# behavior
######################################################
######################################################
######################################################
# Mediation Analysis using lavaan 
# in R (path analysis framework, SEM)
######################################################
######################################################
######################################################

# specifying path analysis model 
# by using lavann package
install.packages("lavann")
library(lavaan)

specmod <- "
    # path c
    # identifying path c (prime) by putting c*
    behavior ~ c*attitude

    # path a
    intention ~ a*attitude

    # path b 
    behavior ~ b*intention
    
    # indirect effect (a*b): Sobel test (Delta Method)
    # 간접효과 a path x b path 를 구해서 얻음
    # sobel test 라 부름
    ab := a*b
"
# Fit/estimate the model
fitmod <- sem(specmod, data=df)
# summarize the model
summary(fitmod, fit.measures=TRUE, rsquare=T)

##########################################
# boot strapping instead of sobel test 
##########################################
set.seed(101)
fitmod2 <- sem(specmod, data=df, 
               se="bootstrap", 
               bootstrap=100)
# bootstrap = 5000 is common

summary(fitmod2, fit.measures=TRUE, rsquare=TRUE)
parameterEstimates(fitmod2, 
                   ci=TRUE, level=.95, 
                   boot.ci.type="perc")

output

 
> ######################################################
> ## data file: PlannedBehavior.csv
> ######################################################
> ######################################################
> install.packages("readr")
Error in install.packages : Updating loaded packages
> library(readr)
> df <- read.csv("http://commres.net/wiki/_media/r/plannedbehavior.csv")
> head(df)
  attitude norms control intention behavior
1     2.31  2.31    2.03      2.50     2.62
2     4.66  4.01    3.63      3.99     3.64
3     3.85  3.56    4.20      4.35     3.83
4     4.24  2.25    2.84      1.51     2.25
5     2.91  3.31    2.40      1.45     2.00
6     2.99  2.51    2.95      2.59     2.20
> str(df)
'data.frame':	199 obs. of  5 variables:
 $ attitude : num  2.31 4.66 3.85 4.24 2.91 2.99 3.96 3.01 4.77 3.67 ...
 $ norms    : num  2.31 4.01 3.56 2.25 3.31 2.51 4.65 2.98 3.09 3.63 ...
 $ control  : num  2.03 3.63 4.2 2.84 2.4 2.95 3.77 1.9 3.83 5 ...
 $ intention: num  2.5 3.99 4.35 1.51 1.45 2.59 4.08 2.58 4.87 3.09 ...
 $ behavior : num  2.62 3.64 3.83 2.25 2 2.2 4.41 4.15 4.35 3.95 ...
> # attitude
> # norms
> # control
> # intention
> # behavior
> ######################################################
> ######################################################
> ######################################################
> # Mediation Analysis using lavaan 
> # in R (path analysis framework, SEM)
> ######################################################
> ######################################################
> ######################################################
> 
> # specifying path analysis model 
> # by using lavann package
> install.packages("lavann")
‘C:/Users/Hyo/Documents/R/win-library/4.1’의 위치에 패키지(들)을 설치합니다.
(왜냐하면 ‘lib’가 지정되지 않았기 때문입니다)
Warning in install.packages :
  package ‘lavann’ is not available for this version of R

A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages
> library(lavaan)
> 
> specmod <- "
+     # path c
+     # identifying path c (prime) by putting c*
+     behavior ~ c*attitude
+ 
+     # path a
+     intention ~ a*attitude
+ 
+     # path b 
+     behavior ~ b*intention
+     
+     # indirect effect (a*b): Sobel test (Delta Method)
+     # 간접효과 a path x b path 를 구해서 얻음
+     # sobel test 라 부름
+     ab := a*b
+ "
> # Fit/estimate the model
> fitmod <- sem(specmod, data=df)
> # summarize the model
> summary(fitmod, fit.measures=TRUE, rsquare=T)
lavaan 0.6-9 ended normally after 11 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         5
                                                      
  Number of observations                           199
                                                      
Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Model Test Baseline Model:

  Test statistic                               103.700
  Degrees of freedom                                 3
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    1.000
  Tucker-Lewis Index (TLI)                       1.000

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)               -481.884
  Loglikelihood unrestricted model (H1)       -481.884
                                                      
  Akaike (AIC)                                 973.767
  Bayesian (BIC)                               990.234
  Sample-size adjusted Bayesian (BIC)          974.393

Root Mean Square Error of Approximation:

  RMSEA                                          0.000
  90 Percent confidence interval - lower         0.000
  90 Percent confidence interval - upper         0.000
  P-value RMSEA <= 0.05                             NA

Standardized Root Mean Square Residual:

  SRMR                                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  behavior ~                                          
    attitude   (c)    0.029    0.071    0.412    0.680
  intention ~                                         
    attitude   (a)    0.484    0.058    8.333    0.000
  behavior ~                                          
    intention  (b)    0.438    0.075    5.832    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .behavior          0.698    0.070    9.975    0.000
   .intention         0.623    0.062    9.975    0.000

R-Square:
                   Estimate
    behavior          0.199
    intention         0.259

Defined Parameters:
                   Estimate  Std.Err  z-value  P(>|z|)
    ab                0.212    0.044    4.778    0.000

> 
> ##########################################
> # boot strapping instead of sobel test 
> ##########################################
> set.seed(101)
> fitmod2 <- sem(specmod, data=df, 
+                se="bootstrap", 
+                bootstrap=100)
> # bootstrap = 5000 is common
> 
> summary(fitmod2, fit.measures=TRUE, rsquare=TRUE)
lavaan 0.6-9 ended normally after 11 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         5
                                                      
  Number of observations                           199
                                                      
Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Model Test Baseline Model:

  Test statistic                               103.700
  Degrees of freedom                                 3
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    1.000
  Tucker-Lewis Index (TLI)                       1.000

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)               -481.884
  Loglikelihood unrestricted model (H1)       -481.884
                                                      
  Akaike (AIC)                                 973.767
  Bayesian (BIC)                               990.234
  Sample-size adjusted Bayesian (BIC)          974.393

Root Mean Square Error of Approximation:

  RMSEA                                          0.000
  90 Percent confidence interval - lower         0.000
  90 Percent confidence interval - upper         0.000
  P-value RMSEA <= 0.05                             NA

Standardized Root Mean Square Residual:

  SRMR                                           0.000

Parameter Estimates:

  Standard errors                            Bootstrap
  Number of requested bootstrap draws              100
  Number of successful bootstrap draws             100

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  behavior ~                                          
    attitude   (c)    0.029    0.066    0.446    0.655
  intention ~                                         
    attitude   (a)    0.484    0.051    9.453    0.000
  behavior ~                                          
    intention  (b)    0.438    0.075    5.843    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .behavior          0.698    0.061   11.464    0.000
   .intention         0.623    0.061   10.170    0.000

R-Square:
                   Estimate
    behavior          0.199
    intention         0.259

Defined Parameters:
                   Estimate  Std.Err  z-value  P(>|z|)
    ab                0.212    0.045    4.690    0.000

> parameterEstimates(fitmod2, 
+                    ci=TRUE, level=.95, 
+                    boot.ci.type="perc")
        lhs op       rhs label   est    se      z pvalue ci.lower ci.upper
1  behavior  ~  attitude     c 0.029 0.066  0.446  0.655   -0.085    0.182
2 intention  ~  attitude     a 0.484 0.051  9.453  0.000    0.376    0.588
3  behavior  ~ intention     b 0.438 0.075  5.843  0.000    0.294    0.612
4  behavior ~~  behavior       0.698 0.061 11.464  0.000    0.561    0.822
5 intention ~~ intention       0.623 0.061 10.170  0.000    0.492    0.724
6  attitude ~~  attitude       0.928 0.000     NA     NA    0.928    0.928
7        ab :=       a*b    ab 0.212 0.045  4.690  0.000    0.131    0.319
>

질문. 위에서 rsquare 값은 무엇을 의미하나요?

우선 지정모델을 보면 (specmod) 아래처럼 세개의 리그레션을 지정한 것을 알 수 있습니다.

# 편의상 코멘트 지움
specmod <- "
     behavior ~ c*attitude
     behavior ~ b*intention
     intention ~ a*attitude
     ab := a*b
"

이 중에서 종속변인을 골라내자면
- behavior ~ attitude + intention 에서 나타나는 r square 값
- intention ~ attitude 에서 나타나는 intention에서의 r square 값이 있을 수 있습니다.
- 위의 둘은 poking the above 라는 아래 문서에서

lm.ba.01 <- lm(behavior~attitude+intention, data=df) # all
lm.ba.02 <- lm(behavior~intention, data=df)  # b path
lm.ba.03 <- lm(intention~attitude, data=df)  # a path
lm.ba.04 <- lm(attitude~intention, data=df)  # reverse a path 
lm.ba.05 <- lm(behavior~attitude, data=df)   # c prime path

위에서 lm.ba.01 과 lm.ba.03에 해당하는 regression이라고 하겠습니다.
- 아래는 이를 출력해보는 명령어입니다

> summary(lm.ba.01)$r.square
[1] 0.1989125
> summary(lm.ba.03)$r.square
[1] 0.2586768

바로 위의 값과 같은 값이라고 보면 되겠습니다.

질문. mediation effect는 왜 두 path를 곱하나요?

            -- Intention --    
        a --               -- b
        --                    --
Attitudes  --------c'---------   Behavior

위에서 a, b는 coefficients 값이라고 가정하고, 이 값들이 각각 a = 2, b = 1.5 라고 가정합니다. 이 때,

a는 (2) attitudes의 measurement 한 unit이 증가할 때 intention이 2 증가한다는 것을 의미합니다.
b는 (1.5)는 attitudes의 점수가 하나 증가할 때 마다 behavior가 2*1.5 증가함을 (3) 의미합니다. 즉, attitudes가 한 단위 증가할 때마다 beahvior는 3 증가합니다. 독립변인 attitudes의 intention을 매개로 한 영향력을 말할 때 이 3을 사용합니다. 따라서 ab (mediation effects) = a * b 로 생각할 수 있습니다.

Poking the above

# poking 둘러보기
# 모델 = 
#          a  Intention  b
# Attitude       c         Behavior
#
lm.ba.01 <- lm(behavior~attitude+intention, data=df) # all
lm.ba.02 <- lm(behavior~intention, data=df)  # b path
lm.ba.03 <- lm(intention~attitude, data=df)  # a path
lm.ba.04 <- lm(attitude~intention, data=df)  # reverse a path 
lm.ba.05 <- lm(behavior~attitude, data=df)   # c prime path

위에서

summary(lm.ba.05) 
summary(lm.ba.01)

> summary(lm.ba.05)

Call:
lm(formula = behavior ~ attitude, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.9679 -0.6291  0.0882  0.6625  1.8128 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.3355     0.2221   10.51  < 2e-16 ***
attitude      0.2413     0.0669    3.61  0.00039 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.909 on 197 degrees of freedom
Multiple R-squared:  0.062,	Adjusted R-squared:  0.0572 
F-statistic:   13 on 1 and 197 DF,  p-value: 0.000392

> summary(lm.ba.01)

Call:
lm(formula = behavior ~ attitude + intention, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.0192 -0.5728  0.0633  0.6373  1.7381 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   1.6962     0.2336    7.26  8.8e-12 ***
attitude      0.0294     0.0720    0.41     0.68    
intention     0.4377     0.0756    5.79  2.8e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.842 on 196 degrees of freedom
Multiple R-squared:  0.199,	Adjusted R-squared:  0.191 
F-statistic: 24.3 on 2 and 196 DF,  p-value: 3.64e-10

위 둘에서 attitude의 t test 결과를 비교해 보면

lm.ba.05 에서는 attitude 가 behavior 를 설명하는 것으로 나타남
그러나, lm.ba.01 에서 attitude 와 intention 을 동시에 넣었을 때에는
attitude 의 설명력은 의미가 없어지고 intention 만이 설명력을 보임

참고. 아래는 lm 으로 얻은 리그레션 결과가 가지는 데이터들로 residuals 은 리그레션 라인의 예측치와 실제관측치의 차이인 residual 값들을 저장한 데이터를 말한다.

> names(lm.ba.03)
 [1] "coefficients"  "residuals"     "effects"       "rank"         
 [5] "fitted.values" "assign"        "qr"            "df.residual"  
 [9] "xlevels"       "call"          "terms"         "model"        
>

또한 fitted.values 는 리그레션 라인의 예측치를 말하므로 이 예측치에서 Y 변인의 (intention) 평균값을 뺀 가치는 explained 혹은 regression 혹은 극복된 error에 해당하는 값이 된다. 이를 reg.int에 저장한 것이다.
또한 res.int는 residual 값을 저장한 것이다.
이 둘의 각각 제곱의 합은 SS regression과 SS residual이 된다.
그러므로 이 둘을 합한 값은 intention 변인의 각 개인 값에서 평균값을 제외한 값이 된다. 만약에 intention값을 그대로 (똑같이) 복원하고자 한다면
reg.reg + res.int + mean(df$intention) 을 이용하면 될 것이다.

reg.int <- lm.ba.03$fitted.values - mean(df$intention)
res.int <- summary(lm.ba.03)$residuals

또한 SS reg + SS res = SS y = SS total = SS intention 임을 알고 있다.

# just checking 
sum(reg.int^2) + sum(res.int^2)
var(df$intention)*(length(df$intention)-1)

아래는 behavior를 종속변인으로 하고 reg.int를 독립변인으로 regression을 한 결과이다. 이 결과는 intention 의 SS 중에서 attitude의 설명력 부분이 (regression 부분) behavior에 어떻게 설명이 되는가를 보기 위한 것이다. 그런데 이것은 매개하는 변인

# the intention part contributed by attitudes 
# is it explaing behavior too? 
lm.ba.021 <- lm(behavior~reg.int, data=df) 
summary(lm.ba.021)

반대로 아래는 res.int는 attitude의 설명력을 제외한 나머지이고 이것이 behavior에 어떻게 영향을 주는지 보는 것이다. 이는 attitude의 설명력이 “mediated”되지 않고 (attitude의 설명력을 제외한) 순수 intention이 behavior에 영향을 주는 것을 보는 부분이다.

# the pure intention part excluding 
# what attitude contributes
lm.ba.022 <- lm(behavior~res.int, data=df)
summary(lm.ba.022)

또한 당연한 이야기지만 int의 residual과 regression 파트를 더해서 behavior에 regression을 해보면 intention으로 regression을 한 것과 (b path) 같은 결과를 보여줄 것이다.

int.all <- res.int + reg.int 
lm.temp <- lm(behavior~int.all, data=df)
summary(lm.temp)
summary(lm.ba.02)

그러니,

lm.ba.021는 (reg.int로 behavior에 regression을 한 결과) attitude의 영향력이 mediated 된 intention 부분의 설명력을 보여주고
lm.ba.022는 (res.int로 behavior에 regression을 한 결과) attitude의 영향력을 제외한 intention 부분의 설명력을 보여주는 것이라고 볼 수 있다.

아래는 이를 다시 한번 출력해 본 것이다

summary(lm.ba.021)
summary(lm.ba.022)

혹은 아래와 같이 살펴볼 수도 있다.

# 아래는 attitude - intention - behavior 의
# 영향력의 경로를 도식화 준다. 

# intention - behavior part
summary(lm.ba.02)$r.squared
# K - attitudes 가 intention을 설명해 주는 부분 (regression error)
summary(lm.ba.021)$r.squared
# J - attitudes 가 설명하지 못하는 부분 (residual error) 
summary(lm.ba.022)$r.squared
# 위에서 intention은 K와 J로 이루어져 있다. 이를 확인하는 것
summary(lm.ba.021)$r.squared + summary(lm.ba.022)$r.squared

그렇다면 attitude가 intention을 통해서 mediated된 부분의 설명력을 제외한 나머지는 얼마가 될까라고 생각을 하면 이는 lm.ba.04 에서의 (reversed a path) residual 이 behavior에 얼마나 영향을 주는가를 보는 것과 같을 것이다. 이를 계산해 보면

# lm.ba.04 <- lm(attitude~intention, data=df)  # reverse a path 
res.temp <- lm.ba.04$residuals

# res.temp는 attitude 중에서 intention의 설명력을 제외한 (관계있는 부분을 제외한) 것을 의미
# 이것으로 behavior에 regression을 한다는 것은 attitude가 intention을 타고 넘어가는 (매개로
# 하는) 영향력을 제외한 부분만을 가지고 behavior에 regression을 한다는 것으로 이해할 수 있다
 
lm.temp <- lm(behavior~res.temp, data=df)
summary(lm.temp)
summary(lm.ba.01)

위에서
attitude의 regression coeffcient 값이 같음을 알 수 있다 (b = 0.02941).

> res.temp <- lm.ba.04$residuals
> lm.temp <- lm(behavior~res.temp, data=df)
> summary(lm.temp)

Call:
lm(formula = behavior ~ res.temp, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.1138 -0.6154  0.0725  0.6955  1.9098 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.1025     0.0665   46.66   <2e-16 ***
res.temp      0.0294     0.0802    0.37     0.71    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.938 on 197 degrees of freedom
Multiple R-squared:  0.000683,	Adjusted R-squared:  -0.00439 
F-statistic: 0.135 on 1 and 197 DF,  p-value: 0.714

> summary(lm.ba.01)

Call:
lm(formula = behavior ~ attitude + intention, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.0192 -0.5728  0.0633  0.6373  1.7381 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   1.6962     0.2336    7.26  8.8e-12 ***
attitude      0.0294     0.0720    0.41     0.68    
intention     0.4377     0.0756    5.79  2.8e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.842 on 196 degrees of freedom
Multiple R-squared:  0.199,	Adjusted R-squared:  0.191 
F-statistic: 24.3 on 2 and 196 DF,  p-value: 3.64e-10

그리고 위의 lm.ba.01의 R-squared 값이 0.199임은 Attitude와 Intention이 함께 설명하는 부분이 약 19.9%임을 말하고 있고, 이 19.9% 중에서 아주 극히 일부분만 (lm.temp의 R-squared 값인 0.000683) Attitude 고유의 설명력 부분이 된다는 것을 알수 있다. 좀더 살펴보자면

abc <- summary(lm.ba.01)$r.square
ab <- summary(lm.ba.02)$r.square
bc <- summary(lm.ba.05)$r.square
abc
ab
bc
abbc <- ab + bc 
abbc
# b는 아래처럼 구할 수도 있고
b <- abbc - abc
b
# 위에서 구한 summary(lm.ba.021)$r.squared 값이기도 하다 
summary(lm.ba.021)$r.squared
# a 또한 마찬가지
a <- abc - bc
a
# 혹은 
summary(lm.ba.022)$r.squared

# 아래 c 도 마찬가지이다
c <- abc - ab
c
summary(lm.temp)$r.squared

> abc <- summary(lm.ba.01)$r.square
> ab <- summary(lm.ba.02)$r.square
> bc <- summary(lm.ba.05)$r.square
> abc
[1] 0.1989125
> ab
[1] 0.1982297
> bc
[1] 0.06197255
> abbc <- ab + bc 
> abbc
[1] 0.2602023
> # b는 아래처럼 구할 수도 있고
> b <- abbc - abc
> b
[1] 0.0612898
> # 위에서 구한 summary(lm.ba.021)$r.squared 값이기도 하다 
> summary(lm.ba.021)$r.squared
[1] 0.06197255
> # a 또한 마찬가지
> a <- abc - bc
> a
[1] 0.1369399
> # 혹은 
> summary(lm.ba.022)$r.squared
[1] 0.1369399
> 
> # 아래 c 도 마찬가지이다
> c <- abc - ab
> c
[1] 0.0006827583
> summary(lm.temp)$r.squared
[1] 0.0006827583
>

Another modeling

성재학생이 다른 아이디어를 가지고 있다. 이를 분석에서 구현보고자 한다.

library(lavaan)
specmod <- "
    # regression test
    behavior ~ c*attitude
             + m*control
             + n*norms
    intention ~ a*attitude 
              + k*norms 
              + j*control
    behavior ~ b*intention 
    
    # indirect effect (a*b): Sobel test (Delta Method)
    ab := a*b
    kb := k*b
    jb := j*b
"
# Fit/estimate the model
fitmod <- sem(specmod, data=df)
# summarize the model
summary(fitmod, fit.measures=TRUE, rsquare=T)

Another moding output

> library(lavaan)
> 
> specmod <- "
+     # path c
+     # identifying path c (prime) by putting c*
+     behavior ~ c*attitude
+              + m*control
+              + n*norms
+ 
+     # path a
+     intention ~ a*attitude 
+               + k*norms 
+               + j*control
+ 
+     # path b 
+     behavior ~ b*intention 
+     
+     # indirect effect (a*b): Sobel test (Delta Method)
+     # 간접효과 a path x b path 를 구해서 얻음
+     # sobel test 라 부름
+     ab := a*b
+     kb := k*b
+     jb := j*b
+     
+ #    attitude ~~ norms + control
+ #    norms ~~ control
+ "
> # Fit/estimate the model
> fitmod <- sem(specmod, data=df)
> # summarize the model
> summary(fitmod, fit.measures=TRUE, rsquare=T)
lavaan 0.6-12 ended normally after 1 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         9

  Number of observations                           199

Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Model Test Baseline Model:

  Test statistic                               137.622
  Degrees of freedom                                 7
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    1.000
  Tucker-Lewis Index (TLI)                       1.000

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)               -464.922
  Loglikelihood unrestricted model (H1)       -464.922
                                                      
  Akaike (AIC)                                 947.845
  Bayesian (BIC)                               977.484
  Sample-size adjusted Bayesian (BIC)          948.972

Root Mean Square Error of Approximation:

  RMSEA                                          0.000
  90 Percent confidence interval - lower         0.000
  90 Percent confidence interval - upper         0.000
  P-value RMSEA <= 0.05                             NA

Standardized Root Mean Square Residual:

  SRMR                                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  behavior ~                                          
    attitude   (c)    0.041    0.072    0.563    0.573
    control    (m)   -0.090    0.070   -1.285    0.199
    norms      (n)    0.042    0.069    0.605    0.545
  intention ~                                         
    attitude   (a)    0.352    0.058    6.068    0.000
    norms      (k)    0.153    0.059    2.577    0.010
    control    (j)    0.275    0.058    4.740    0.000
  behavior ~                                          
    intention  (b)    0.463    0.081    5.715    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .behavior          0.692    0.069    9.975    0.000
   .intention         0.530    0.053    9.975    0.000

R-Square:
                   Estimate
    behavior          0.206
    intention         0.369

Defined Parameters:
                   Estimate  Std.Err  z-value  P(>|z|)
    ab                0.163    0.039    4.160    0.000
    kb                0.071    0.030    2.349    0.019
    jb                0.127    0.035    3.648    0.000

>

What about this output

library(lavaan)

specmod <- "
    # path c
    # identifying path c (prime) by putting c*

    # path a
    intention ~ a*attitude 
              + k*norms 
              + j*control

    # path b 
    behavior ~ b*intention 
    
    # indirect effect (a*b): Sobel test (Delta Method)
    # 간접효과 a path x b path 를 구해서 얻음
    # sobel test 라 부름
    ab := a*b
    kb := k*b
    jb := j*b
    
#    attitude ~~ norms + control
#    norms ~~ control
"
# Fit/estimate the model
fitmod <- cfa(specmod, data=df)
# summarize the model
summary(fitmod, fit.measures=TRUE, rsquare=T)

아래 아웃풋의 . . . path analysis 참조
chi-square test p-value는 어떤가?
CFI, TLI는?
그 외의 지수는?
어떤 모델이 지금의 현상을 가장 잘 설명하는다고 판단하는가?

Model fit
- Chi-square Test: p-value less than p-critical value (.05 for example) indicates that model does not fit well enough. p-value more than critical value means the model fits the data relatively well. The test is sensitive to the sample size and normality of the data.
- CFI (Comparative Fit Index): greater than .90 indicates good fit to the data. It is less sensitive to the sample size and normality of the data than chi-square test.
- TLI (Tucker-Lewis Index): greater than .95 (sometimes .90) indicates good fit. It is less sensitive to the sample size.
- RMSEA (Root Mean Square Error of Approximation): equal to or less than .08 (sometimes .10 is used) indicates good fit to the data.
- SRMR (Standard Root Mean square Residual): less than or equal to .08 indicates good fit to the data.

$\chi^2$	$\text{CFI}$	$\text{TLI}$	$\text{RMSEA}$	$\text{SRMR}$
$p \ge .05$	$p \ge .90$	$p \ge .95$	$p \le .08$	$p \le .08$

Then what is SEM (Structural Equation Modeling)

Relationships within and among variables and constructs

Output

library(lavaan)
> 
> specmod <- "
+     # path c
+     # identifying path c (prime) by putting c*
+ 
+     # path a
+     intention ~ a*attitude 
+               + k*norms 
+               + j*control
+ 
+     # path b 
+     behavior ~ b*intention 
+     
+     # indirect effect (a*b): Sobel test (Delta Method)
+     # 간접효과 a path x b path 를 구해서 얻음
+     # sobel test 라 부름
+     ab := a*b
+     kb := k*b
+     jb := j*b
+     
+ #    attitude ~~ norms + control
+ #    norms ~~ control
+ "
> # Fit/estimate the model
> fitmod <- cfa(specmod, data=df)
> # summarize the model
> summary(fitmod, fit.measures=TRUE, rsquare=T)
lavaan 0.6-12 ended normally after 1 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         6

  Number of observations                           199

Model Test User Model:
                                                      
  Test statistic                                 2.023
  Degrees of freedom                                 3
  P-value (Chi-square)                           0.568

Model Test Baseline Model:

  Test statistic                               137.622
  Degrees of freedom                                 7
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    1.000
  Tucker-Lewis Index (TLI)                       1.017

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)               -465.934
  Loglikelihood unrestricted model (H1)       -464.922
                                                      
  Akaike (AIC)                                 943.868
  Bayesian (BIC)                               963.628
  Sample-size adjusted Bayesian (BIC)          944.619

Root Mean Square Error of Approximation:

  RMSEA                                          0.000
  90 Percent confidence interval - lower         0.000
  90 Percent confidence interval - upper         0.103
  P-value RMSEA <= 0.05                          0.735

Standardized Root Mean Square Residual:

  SRMR                                           0.019

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  intention ~                                         
    attitude   (a)    0.352    0.058    6.068    0.000
    norms      (k)    0.153    0.059    2.577    0.010
    control    (j)    0.275    0.058    4.740    0.000
  behavior ~                                          
    intention  (b)    0.453    0.065    7.014    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .intention         0.530    0.053    9.975    0.000
   .behavior          0.699    0.070    9.975    0.000

R-Square:
                   Estimate
    intention         0.369
    behavior          0.198

Defined Parameters:
                   Estimate  Std.Err  z-value  P(>|z|)
    ab                0.160    0.035    4.589    0.000
    kb                0.069    0.029    2.419    0.016
    jb                0.125    0.032    3.927    0.000

e.g. with a categorical variable

# regression with job placement data 

df <- read.csv("http://commres.net/wiki/_media/r/plannedbehavior.csv")
head(df)

df<- within(df, {   
  norms.cat <- NA # need to initialize variable
  norms.cat[norms < tmp[2]] <- "Low"
  norms.cat[norms >= tmp[2] & norms < tmp[3]] <- "Middle"
  norms.cat[norms >= tmp[3]] <- "High"
} )
head(df)

med.m.01 <- '
    # mediator
    intention ~ a*attitude
    behavior ~ b*intention
    
    # direct effect c
    behavior ~ c*attitude
    
    # indirect effect 
    ab := a*b
    
    # total effect 
    tot := c + ab
    '
fit <- sem(med.m.01, data = df,
           meanstructure = T, 
           se = "boot", bootstrap = 500)
summary(fit, fit.measures = T,
        standardized = T, 
        ci = T)

###
mod.m.02 <- '
    # mediator
    intention ~ c(ag1,ag2,ag3)*attitude
    behavior ~ c(bg1,bg2,bg3)*intention
    
    # direct effect
    behavior ~ c(cg1,cg2,cg3)*attitude
    
    # indirect effect
    abg1 := ag1*bg1 # for group 1
    abg2 := ag2*bg2 
    abg3 := ag3*bg3 

    # tot effect
    totalg1 := cg1 + (ag1*bg1)
    totalg2 := cg2 + (ag2*bg2)
    totalg3 := cg3 + (ag3*bg3)
'

fit.by.norms.cat <- sem(mod.m.02, data = df,
                        group = "norms.cat",
                        se = "boot", bootstrap = 500,
                        meanstructure = T)
summary(fit.by.norms.cat, 
        fit.measures = T,
        standardized = T,
        ci = T)

all.constraints <- '
    ag1 == ag2 == ag3
    bg1 == bg2 == bg3
    cg1 == cg2 == cg3
'    

lavTestWald(fit.by.norms.cat,
            constraints = all.constraints)

lavTestWald(fit.by.norms.cat, 
            constraints = "ag1==ag2==ag3")
lavTestWald(fit.by.norms.cat, 
            constraints = "bg1==bg2==bg3")
lavTestWald(fit.by.norms.cat, 
            constraints = "cg1==cg2==cg3")

# or 
full.mod.mediation <- '
    # mediator
    intention ~ a*attitude
    behavior ~ b*intention + w*norms.cat
    
    # define moderator 
    Z := w*b
    
    # direct effect
    behavior ~ c*attitude
    
    # indirect effect
    ab := a*b

    # tot effect
    total := c + (a*b)
'

full.mod <- sem (full.mod.mediation, data = df,
                 se = "boot", bootstrap = 500, 
                 meanstructure = T)
summary(full.mod, fit.measures = T,
        stand = T, ci = T)

e.gs

# install.packages("ISLR")
library(ISLR)
attach(mtcars)

# x <- disp (배기량) 
# y <- mpg (아일리지)
# m <- wt (무게)

specmod <- "
    # path c
    # identifying path c (prime) by putting c*
    mpg ~ c*disp

    # path a
    wt ~ a*disp

    # path b 
    mpg ~ b*wt
    
    # indirect effect (a*b): Sobel test (Delta Method)
    # 간접효과 a path x b path 를 구해서 얻음
    # sobel test 라 부름
    ab := a*b
"
# Fit/estimate the model
fitmod <- sem(specmod, data=mtcars)
summary(fitmod)

set.seed(101)
fitmod2 <- sem(specmod, data=mtcars, 
               se="bootstrap", 
               bootstrap=200)
# bootstrap = 5000 is common

summary(fitmod2, fit.measures=TRUE, rsquare=TRUE)
parameterEstimates(fitmod2, 
                   ci=TRUE, level=.95, 
                   boot.ci.type="perc")

install.packages("ISLR")
library(ISLR)

attach(Carseats)

mod.total <- lm(Sales~Price, data=Carseats)
summary(mod.total)

mod.m <- lm(CompPrice~Price, data=Carseats)
summary(mod.m)

mod.y <- lm(Sales~Price+CompPrice, data=Carseats)
summary(mod.y)


# install.packages("multilevel")
library(multilevel)
sob.m <- sobel(pred = Price, med = CompPrice, out = Sales)
sob.m
pnorm(abs(sob.m$z.value), lower.tail = FALSE) * 2

install.packages("bda")
library(bda)
mediation.test(mv=CompPrice, iv=Price, dv=Sales)

install.packages("nloptr")
library(nloptr)
install.packages("mediation")
library(mediation)

set.seed(101)
mod.mediation <- mediate(model.m=mod.m, 
                         model.y = mod.y, 
                         treat="Price",
                         mediator="CompPrice",
                         boot=TRUE, sims=500)

summary(mod.mediation)

# Total Effect = 
# ADE Average Direct Effect = 직접효과
# ACME Average Causal Mediation Effect = 간접효과

plot(mod.mediation)

e.g. 2

https://advstats.psychstat.org/book/mediation/index.php

####################################
nlsy <- read.csv("http://commres.net/wiki/_media/r/nlsy.csv")
attach(nlsy)
# install.packages("bmem")
library(bmem)
library(sem)
# install.packages("cfa")
library(cfa)

##########################
nlsy.model<-specifyEquations(exog.variances=T) 
math =b*HE + cp*ME
HE = a*ME
<ENTER>

effects<-c('a*b', 'cp+a*b')
nlsy.res<-bmem.sobel(nlsy, nlsy.model,effects)
##########################

m.me.he <- lm(ME~HE)
m.math.me <- lm(math~ME)
m.math.he <- lm(math~HE)
m.math.mehe <- lm(math~ME+HE)
m.he.me <- lm(HE~ME)
summary(m.he.me)
res.m.he.me <- resid(m.he.me)
m.temp <- lm(math~res.m.he.me)
summary(m.temp)
res.m.me.he <- resid(m.me.he)
m.temp2 <- lm(math~res.m.me.he)
summary(m.temp2)

library(lavaan)
specmod <- "
# path c' (direct effect)
math ~ c*ME 

# path a
HE ~ a*ME

# path b
math ~ b*HE

# indirect effect (a*b)
# sobel test (Delta method)
ab := a*b
"

# fit/estimate model
fitmod <- sem(specmod, data=df)

# summarize/result the output
summary(fitmod, fit.measures=TRUE, rsquare=TRUE)

# for a 
summary(m.he.me)
# for b
summary(m.temp)
# for cprime 
summary(m.temp2)

a <- summary(m.he.me)$coefficient[2] # a
b <- summary(m.temp)$coefficient[2]  # b
c <- summary(m.temp2)$coefficient[2] # c
a
b
c
a*b

c2 <- summary(fitmod)$pe$est[1]
a2 <- summary(fitmod)$pe$est[2]
b2 <- summary(fitmod)$pe$est[3]
ab2 <- summary(fitmod)$pe$est[7]
a2
b2
c2
ab2

> ####################################
> nlsy <- read.csv("http://commres.net/wiki/_media/r/nlsy.csv")
> attach(nlsy)
The following objects are masked from nlsy (pos = 3):

    HE, math, ME

The following objects are masked from nlsy (pos = 5):

    HE, math, ME

The following objects are masked from df:

    HE, math, ME

> # install.packages("bmem")
> library(bmem)
> library(sem)
> # install.packages("cfa")
> library(cfa)
>

> nlsy.model<-specifyEquations(exog.variances=T) 
1: math =b*HE + cp*ME
2: HE = a*ME
3: 
Read 2 items
NOTE: adding 3 variances to the model
> effects<-c('a*b', 'cp+a*b')
> nlsy.res<-bmem.sobel(nlsy, nlsy.model,effects)
           Estimate       S.E.   z-score      p.value
b        0.46450283 0.14304860  3.247168 1.165596e-03
cp       0.46281480 0.11977862  3.863918 1.115825e-04
a        0.13925694 0.04292438  3.244239 1.177650e-03
V[HE]    2.73092635 0.20078170 13.601471 0.000000e+00
V[math] 20.67659134 1.52017323 13.601471 0.000000e+00
V[ME]    4.00590078 0.29451968 13.601471 0.000000e+00
a*b      0.06468524 0.02818457  2.295059 2.172977e-02
cp+a*b   0.52750005 0.11978162  4.403848 1.063474e-05
>

> m.me.he <- lm(ME~HE)
> m.math.me <- lm(math~ME)
> m.math.he <- lm(math~HE)
> m.math.mehe <- lm(math~ME+HE)
> m.he.me <- lm(HE~ME)
> summary(m.he.me)

Call:
lm(formula = HE ~ ME)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.5020 -0.7805  0.2195  1.2195  3.3587 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.10944    0.49127   8.365 1.25e-15 ***
ME           0.13926    0.04298   3.240   0.0013 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.655 on 369 degrees of freedom
Multiple R-squared:  0.02766,	Adjusted R-squared:  0.02502 
F-statistic:  10.5 on 1 and 369 DF,  p-value: 0.001305

> res.m.he.me <- resid(m.he.me)
> m.temp <- lm(math~res.m.he.me)
> summary(m.temp)

Call:
lm(formula = math ~ res.m.he.me)

Residuals:
     Min       1Q   Median       3Q      Max 
-12.4263  -3.1496  -0.3499   2.2826  29.7795 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  12.1186     0.2427  49.936  < 2e-16 ***
res.m.he.me   0.4645     0.1471   3.159  0.00172 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.674 on 369 degrees of freedom
Multiple R-squared:  0.02633,	Adjusted R-squared:  0.02369 
F-statistic: 9.978 on 1 and 369 DF,  p-value: 0.001715

> res.m.me.he <- resid(m.me.he)
> m.temp2 <- lm(math~res.m.me.he)
> summary(m.temp2)

Call:
lm(formula = math ~ res.m.me.he)

Residuals:
     Min       1Q   Median       3Q      Max 
-14.1938  -2.7965  -0.3425   2.4081  29.5656 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  12.1186     0.2413   50.22  < 2e-16 ***
res.m.me.he   0.4628     0.1224    3.78 0.000183 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.648 on 369 degrees of freedom
Multiple R-squared:  0.03728,	Adjusted R-squared:  0.03467 
F-statistic: 14.29 on 1 and 369 DF,  p-value: 0.0001828

> 
> library(lavaan)
> specmod <- "
+ # path c' (direct effect)
+ math ~ c*ME 
+ 
+ # path a
+ HE ~ a*ME
+ 
+ # path b
+ math ~ b*HE
+ 
+ # indirect effect (a*b)
+ # sobel test (Delta method)
+ ab := a*b
+ "
> 
> # fit/estimate model
> fitmod <- sem(specmod, data=df)
> 
> # summarize/result the output
> summary(fitmod, fit.measures=TRUE, rsquare=TRUE)
lavaan 0.6-12 ended normally after 1 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         5

  Number of observations                           371

Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Model Test Baseline Model:

  Test statistic                                39.785
  Degrees of freedom                                 3
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    1.000
  Tucker-Lewis Index (TLI)                       1.000

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -1800.092
  Loglikelihood unrestricted model (H1)      -1800.092
                                                      
  Akaike (AIC)                                3610.184
  Bayesian (BIC)                              3629.765
  Sample-size adjusted Bayesian (BIC)         3613.901

Root Mean Square Error of Approximation:

  RMSEA                                          0.000
  90 Percent confidence interval - lower         0.000
  90 Percent confidence interval - upper         0.000
  P-value RMSEA <= 0.05                             NA

Standardized Root Mean Square Residual:

  SRMR                                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  math ~                                              
    ME         (c)    0.463    0.120    3.869    0.000
  HE ~                                                
    ME         (a)    0.139    0.043    3.249    0.001
  math ~                                              
    HE         (b)    0.465    0.143    3.252    0.001

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .math             20.621    1.514   13.620    0.000
   .HE                2.724    0.200   13.620    0.000

R-Square:
                   Estimate
    math              0.076
    HE                0.028

Defined Parameters:
                   Estimate  Std.Err  z-value  P(>|z|)
    ab                0.065    0.028    2.298    0.022

> 
> # for a 
> summary(m.he.me)

Call:
lm(formula = HE ~ ME)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.5020 -0.7805  0.2195  1.2195  3.3587 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.10944    0.49127   8.365 1.25e-15 ***
ME           0.13926    0.04298   3.240   0.0013 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.655 on 369 degrees of freedom
Multiple R-squared:  0.02766,	Adjusted R-squared:  0.02502 
F-statistic:  10.5 on 1 and 369 DF,  p-value: 0.001305

> # for b
> summary(m.temp)

Call:
lm(formula = math ~ res.m.he.me)

Residuals:
     Min       1Q   Median       3Q      Max 
-12.4263  -3.1496  -0.3499   2.2826  29.7795 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  12.1186     0.2427  49.936  < 2e-16 ***
res.m.he.me   0.4645     0.1471   3.159  0.00172 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.674 on 369 degrees of freedom
Multiple R-squared:  0.02633,	Adjusted R-squared:  0.02369 
F-statistic: 9.978 on 1 and 369 DF,  p-value: 0.001715

> # for cprime 
> summary(m.temp2)

Call:
lm(formula = math ~ res.m.me.he)

Residuals:
     Min       1Q   Median       3Q      Max 
-14.1938  -2.7965  -0.3425   2.4081  29.5656 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  12.1186     0.2413   50.22  < 2e-16 ***
res.m.me.he   0.4628     0.1224    3.78 0.000183 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.648 on 369 degrees of freedom
Multiple R-squared:  0.03728,	Adjusted R-squared:  0.03467 
F-statistic: 14.29 on 1 and 369 DF,  p-value: 0.0001828

> 
> a <- summary(m.he.me)$coefficient[2] # a
> b <- summary(m.temp)$coefficient[2]  # b
> c <- summary(m.temp2)$coefficient[2] # c
> a
[1] 0.1392569
> b
[1] 0.4645028
> c
[1] 0.4628148
> a*b
[1] 0.06468524
> 
> c2 <- summary(fitmod)$pe$est[1]
> a2 <- summary(fitmod)$pe$est[2]
> b2 <- summary(fitmod)$pe$est[3]
> ab2 <- summary(fitmod)$pe$est[7]
> a2
[1] 0.1392569
> b2
[1] 0.4645028
> c2
[1] 0.4628148
> ab2
[1] 0.06468524
>

temp

tests <- read.csv("http://commres.net/wiki/_media/r/tests_cor.csv")
colnames(tests) <- c("ser", "sat", "clep", "gpa")
tests <- subset(tests, select=c("sat", "clep", "gpa"))
attach(tests)

lm.gpa.clepsat <- lm(gpa ~ clep + sat, data = tests) 
summary(lm.gpa.clepsat)

lm.gpa.clep <- lm(gpa ~ clep)
lm.gpa.sat <- lm(gpa ~ sat)
summary(lm.gpa.clep)
summary(lm.gpa.sat)

>  lm.gpa.clepsat <- lm(gpa ~ clep + sat, data = tests) 
> summary(lm.gpa.clepsat)

Call:
lm(formula = gpa ~ clep + sat, data = tests)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.197888 -0.128974 -0.000528  0.131170  0.226404 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept)  1.1607560  0.4081117   2.844   0.0249 *
clep         0.0729294  0.0253799   2.874   0.0239 *
sat         -0.0007015  0.0012564  -0.558   0.5940  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1713 on 7 degrees of freedom
Multiple R-squared:  0.7778,	Adjusted R-squared:  0.7143 
F-statistic: 12.25 on 2 and 7 DF,  p-value: 0.005175
>

0.7778 이 두 변인 clep 과 sat 가 gpa를 설명하는 부분
summary(lm.gpa.clepsat)$r.squared = 0.778

그리고, 위에서 sat의 설명력은 significant하지 않음
그럼 sat만으로 gpa를 보면?

> lm.gpa.clep <- lm(gpa ~ clep)
> lm.gpa.sat <- lm(gpa ~ sat)
> summary(lm.gpa.clep)

Call:
lm(formula = gpa ~ clep)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.190496 -0.141167 -0.002376  0.110847  0.225207 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.17438    0.38946   3.015 0.016676 *  
clep         0.06054    0.01177   5.144 0.000881 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1637 on 8 degrees of freedom
Multiple R-squared:  0.7679,	Adjusted R-squared:  0.7388 
F-statistic: 26.46 on 1 and 8 DF,  p-value: 0.0008808

> summary(lm.gpa.sat)

Call:
lm(formula = gpa ~ sat)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.23544 -0.12184  0.00316  0.02943  0.56456 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
(Intercept) 1.7848101  0.4771715   3.740   0.0057 **
sat         0.0024557  0.0008416   2.918   0.0193 * 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2365 on 8 degrees of freedom
Multiple R-squared:  0.5156,	Adjusted R-squared:  0.455 
F-statistic: 8.515 on 1 and 8 DF,  p-value: 0.01935

>

위에서처럼 significant함.
summary(lm.gpa.clep)$r.squared = 0.7679
summary(lm.gpa.sat)$r.squared = 0.5156

그렇다면 sat의 영향력은 clep을 매개로 해서 나타나는가를 보기 위해서

> lm.clep.sat <- lm(clep ~ sat) 
> summary(lm.clep.sat)

Call:
lm(formula = clep ~ sat)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.5316 -1.2437 -0.2848  0.0949  5.6329 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 8.556962   4.813367   1.778  0.11334    
sat         0.043291   0.008489   5.100  0.00093 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.386 on 8 degrees of freedom
Multiple R-squared:  0.7648,	Adjusted R-squared:  0.7353 
F-statistic: 26.01 on 1 and 8 DF,  p-value: 0.0009303

>

res.lm.clep.sat <- resid(lm.clep.sat)
reg.lm.clep.sat <- predict(lm.clep.sat)-mean(clep)
lm.gpa.sat.mediated.via.clep <- lm(gpa~reg.lm.clep.sat)
lm.gpa.clep.alone <- lm(gpa~res.lm.clep.sat)

summary(lm.gpa.sat.mediated.via.clep)
summary(lm.gpa.clep.alone)

> res.lm.clep.sat <- resid(lm.clep.sat)
> reg.lm.clep.sat <- predict(lm.clep.sat)-mean(clep)
> lm.gpa.sat.mediated.via.clep <- lm(gpa~reg.lm.clep.sat)
> lm.gpa.clep.alone <- lm(gpa~res.lm.clep.sat)
> 
> summary(lm.gpa.sat.mediated.via.clep)

Call:
lm(formula = gpa ~ reg.lm.clep.sat)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.23544 -0.12184  0.00316  0.02943  0.56456 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)      3.16000    0.07480  42.246 1.09e-10 ***
reg.lm.clep.sat  0.05673    0.01944   2.918   0.0193 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2365 on 8 degrees of freedom
Multiple R-squared:  0.5156,	Adjusted R-squared:  0.455 
F-statistic: 8.515 on 1 and 8 DF,  p-value: 0.01935

> summary(lm.gpa.clep.alone)

Call:
lm(formula = gpa ~ res.lm.clep.sat)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.34523 -0.23300 -0.04416  0.27577  0.36370 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)      3.16000    0.09231  34.231  5.8e-10 ***
res.lm.clep.sat  0.07293    0.04326   1.686     0.13    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2919 on 8 degrees of freedom
Multiple R-squared:  0.2622,	Adjusted R-squared:  0.1699 
F-statistic: 2.842 on 1 and 8 DF,  p-value: 0.1303

>

sat의 영향력 mediated via clep = Multiple R-squared: 0.5156
sat의 영향력을 제외한 celp만의 설명력 = Multiple R-squared: 0.2622

temp2

> #############################
> #############################
> # https://data.library.virginia.edu/introduction-to-mediation-analysis/
> # Download data online. This is a simulated dataset for this post.
> myData <- read.csv('http://static.lib.virginia.edu/statlab/materials/data/mediationData.csv')
> 
> model.0 <- lm(Y ~ X, myData)
> summary(model.0)

Call:
lm(formula = Y ~ X, data = myData)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.0262 -1.2340 -0.3282  1.5583  5.1622 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.8572     0.6932   4.122 7.88e-05 ***
X             0.3961     0.1112   3.564 0.000567 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.929 on 98 degrees of freedom
Multiple R-squared:  0.1147,	Adjusted R-squared:  0.1057 
F-statistic:  12.7 on 1 and 98 DF,  p-value: 0.0005671

> # c line의 coef 값 = 0.396 at p-level = 0.0006
> 
> model.M <- lm(M ~ X, myData)
> summary(model.M)

Call:
lm(formula = M ~ X, data = myData)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.3046 -0.8656  0.1344  1.1344  4.6954 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.49952    0.58920   2.545   0.0125 *  
X            0.56102    0.09448   5.938 4.39e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.639 on 98 degrees of freedom
Multiple R-squared:  0.2646,	Adjusted R-squared:  0.2571 
F-statistic: 35.26 on 1 and 98 DF,  p-value: 4.391e-08

> # a line = 0.561 p = 0.0
> 
> model.Y <- lm(Y ~ X + M, myData)
> summary(model.Y)

Call:
lm(formula = Y ~ X + M, data = myData)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.7631 -1.2393  0.0308  1.0832  4.0055 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   1.9043     0.6055   3.145   0.0022 ** 
X             0.0396     0.1096   0.361   0.7187    
M             0.6355     0.1005   6.321 7.92e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.631 on 97 degrees of freedom
Multiple R-squared:  0.373,	Adjusted R-squared:  0.3601 
F-statistic: 28.85 on 2 and 97 DF,  p-value: 1.471e-10

> #################################################
> ## b line + c' line (c에서 M이 추가되었으므로)
> ## c' = 0.0396 ---
> ## b  = 0.6355 ***
> #################################################
> 
> library(mediation)
> results <- mediate(model.M, model.Y, 
+                    treat='X', 
+                    mediator='M',
+                    boot=TRUE, sims=500)
Running nonparametric bootstrap

> summary(results)

Causal Mediation Analysis 

Nonparametric Bootstrap Confidence Intervals with the Percentile Method

               Estimate 95% CI Lower 95% CI Upper p-value    
ACME             0.3565       0.2174         0.54  <2e-16 ***
ADE              0.0396      -0.1712         0.31    0.74    
Total Effect     0.3961       0.1890         0.64  <2e-16 ***
Prop. Mediated   0.9000       0.4707         1.69  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Sample Size Used: 100 


Simulations: 500 

> 
> # OR with lavaan
> library(lavaan)
> specmod <- "
+ # path c' (direct effect)
+ Y ~ c*X 
+ 
+ # path a
+ M ~ a*X
+ 
+ # path b
+ Y ~ b*M
+ 
+ # indirect effect (a*b)
+ # sobel test (Delta method)
+ ab := a*b
+ "
> 
> # fit/estimate model
> fitmod <- sem(specmod, data=myData)
> 
> # summarize/result the output
> summary(fitmod, fit.measures=TRUE, rsquare=TRUE)
lavaan 0.6-12 ended normally after 1 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         5

  Number of observations                           100

Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Model Test Baseline Model:

  Test statistic                                77.413
  Degrees of freedom                                 3
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    1.000
  Tucker-Lewis Index (TLI)                       1.000

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)               -379.612
  Loglikelihood unrestricted model (H1)       -379.612
                                                      
  Akaike (AIC)                                 769.225
  Bayesian (BIC)                               782.250
  Sample-size adjusted Bayesian (BIC)          766.459

Root Mean Square Error of Approximation:

  RMSEA                                          0.000
  90 Percent confidence interval - lower         0.000
  90 Percent confidence interval - upper         0.000
  P-value RMSEA <= 0.05                             NA

Standardized Root Mean Square Residual:

  SRMR                                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  Y ~                                                 
    X          (c)    0.040    0.108    0.367    0.714
  M ~                                                 
    X          (a)    0.561    0.094    5.998    0.000
  Y ~                                                 
    M          (b)    0.635    0.099    6.418    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .Y                 2.581    0.365    7.071    0.000
   .M                 2.633    0.372    7.071    0.000

R-Square:
                   Estimate
    Y                 0.373
    M                 0.265

Defined Parameters:
                   Estimate  Std.Err  z-value  P(>|z|)
    ab                0.357    0.081    4.382    0.000

> 
> # Resampling method
> set.seed(2019)
> 
> fitmod <- sem(specmod, data=myData, se="bootstrap", bootstrap=1000)
> summary(fitmod, fit.measures=TRUE, rsquare=TRUE)
lavaan 0.6-12 ended normally after 1 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         5

  Number of observations                           100

Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Model Test Baseline Model:

  Test statistic                                77.413
  Degrees of freedom                                 3
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    1.000
  Tucker-Lewis Index (TLI)                       1.000

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)               -379.612
  Loglikelihood unrestricted model (H1)       -379.612
                                                      
  Akaike (AIC)                                 769.225
  Bayesian (BIC)                               782.250
  Sample-size adjusted Bayesian (BIC)          766.459

Root Mean Square Error of Approximation:

  RMSEA                                          0.000
  90 Percent confidence interval - lower         0.000
  90 Percent confidence interval - upper         0.000
  P-value RMSEA <= 0.05                             NA

Standardized Root Mean Square Residual:

  SRMR                                           0.000

Parameter Estimates:

  Standard errors                            Bootstrap
  Number of requested bootstrap draws             1000
  Number of successful bootstrap draws            1000

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  Y ~                                                 
    X          (c)    0.040    0.124    0.319    0.750
  M ~                                                 
    X          (a)    0.561    0.095    5.888    0.000
  Y ~                                                 
    M          (b)    0.635    0.102    6.224    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .Y                 2.581    0.334    7.731    0.000
   .M                 2.633    0.362    7.271    0.000

R-Square:
                   Estimate
    Y                 0.373
    M                 0.265

Defined Parameters:
                   Estimate  Std.Err  z-value  P(>|z|)
    ab                0.357    0.080    4.442    0.000

> parameterEstimates(fitmod, ci=TRUE, level=.95, boot.ci.type="perc")
  lhs op rhs label   est    se     z pvalue ci.lower ci.upper
1   Y  ~   X     c 0.040 0.124 0.319   0.75   -0.200    0.288
2   M  ~   X     a 0.561 0.095 5.888   0.00    0.391    0.765
3   Y  ~   M     b 0.635 0.102 6.224   0.00    0.439    0.835
4   Y ~~   Y       2.581 0.334 7.731   0.00    1.868    3.193
5   M ~~   M       2.633 0.362 7.271   0.00    1.890    3.311
6   X ~~   X       3.010 0.000    NA     NA    3.010    3.010
7  ab := a*b    ab 0.357 0.080 4.442   0.00    0.225    0.538
> 
> summary(lm(Y~X+M,data=myData))$r.square
[1] 0.3729992
> summary(lm(M~X,data=myData))$r.square
[1] 0.2645879
> var(myData$Y)
[1] 4.158687
> var(myData$M)
[1] 3.616566
> var(myData$X)
[1] 3.040303
>

COMMunication
RESearch.NET

Table of Contents