Mediation Analysis
Following the theory of planned behavior, a researcher builds the model below, collects data, and tests it. At this stage, the researcher wants in particular to separate the effect of Attitudes on Behavior into the part that belongs to Attitudes alone and the part that passes through Intention. This kind of statistical test is called mediation analysis.
The lavaan package (developed for path analysis) is commonly used to estimate the mediator's effect and the joint effect of the two independent variables.
               Intention
             a /       \ b
              /         \
    Attitudes -----c'----- Behavior
In the diagram above, when looking at the influence of attitudes on behavior:
- If c' = 0, attitudes have no direct effect on behavior; their influence (explanatory power) runs entirely through the a and b paths. This is called complete mediation.
- If, on the other hand, c' ≠ 0, part of the explanatory power falls on c' and part runs through a and b, so this is called partial mediation.
In a mediation analysis, the predictors' effects (explanatory power) can also be divided into direct and indirect effects: a, b, and c' are called direct effects, while the effect that travels through a and b is called the indirect effect. There are several ways to measure the size of the indirect effect; the most common are:
- taking the product of the path a and path b coefficients
- another, less common, method takes the difference between the total and the direct effect (c - c')
When the indirect effect is measured by multiplying a and b, there are two broad ways to test whether that value (effect) is significant:
- the Delta method and resampling methods
- The former assumes that the sampling distribution of the indirect coefficient (the product) is normal. Since this is rarely the case, it is somewhat unrealistic, and it is also inaccurate when the sample size is small.
- The latter, the resampling method, repeatedly resamples to obtain indirect coefficient estimates and uses their distribution to assess the effect. Common variants are:
- Percentile bootstrapping
- bias-corrected bootstrapping
- permutation method
- monte carlo method
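As a minimal sketch of the Delta method (the Sobel test), the z statistic for a*b can be computed by hand. The coefficient and standard-error values below are illustrative stand-ins, not results taken from a fitted model:

```r
# Sobel (Delta-method) test of an indirect effect a*b
# a, b and their SEs are hypothetical values for illustration
a <- 0.484; se.a <- 0.058   # path a coefficient and SE
b <- 0.438; se.b <- 0.075   # path b coefficient and SE

ab    <- a * b                                   # indirect effect
se.ab <- sqrt(b^2 * se.a^2 + a^2 * se.b^2)       # first-order Delta-method SE
z     <- ab / se.ab
p     <- 2 * pnorm(abs(z), lower.tail = FALSE)   # two-tailed p-value
round(c(ab = ab, se = se.ab, z = z, p = p), 4)
```

The resampling alternatives listed above replace the normality assumption behind se.ab with an empirical distribution of a*b.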
######################################################
## data file: PlannedBehavior.csv
######################################################
######################################################
install.packages("readr")
library(readr)
df <- read.csv("http://commres.net/wiki/_media/r/plannedbehavior.csv")
head(df)
str(df)
# attitude
# norms
# control
# intention
# behavior
######################################################
######################################################
######################################################
# Mediation Analysis using lavaan
# in R (path analysis framework, SEM)
######################################################
######################################################
######################################################
# specifying path analysis model
# by using the lavaan package
install.packages("lavaan")
library(lavaan)
specmod <- "
# path c
# identifying path c (prime) by putting c*
behavior ~ c*attitude
# path a
intention ~ a*attitude
# path b
behavior ~ b*intention
# indirect effect (a*b): Sobel test (Delta Method)
# obtained by multiplying path a by path b
# known as the Sobel test
ab := a*b
"
# Fit/estimate the model
fitmod <- sem(specmod, data=df)
# summarize the model
summary(fitmod, fit.measures=TRUE, rsquare=T)
##########################################
# bootstrapping instead of the Sobel test
##########################################
set.seed(101)
fitmod2 <- sem(specmod, data=df,
se="bootstrap",
bootstrap=100)
# bootstrap = 5000 is common
summary(fitmod2, fit.measures=TRUE, rsquare=TRUE)
parameterEstimates(fitmod2,
ci=TRUE, level=.95,
boot.ci.type="perc")
Output
> ######################################################
> ## data file: PlannedBehavior.csv
> ######################################################
> ######################################################
> install.packages("readr")
Error in install.packages : Updating loaded packages
> library(readr)
> df <- read.csv("http://commres.net/wiki/_media/r/plannedbehavior.csv")
> head(df)
attitude norms control intention behavior
1 2.31 2.31 2.03 2.50 2.62
2 4.66 4.01 3.63 3.99 3.64
3 3.85 3.56 4.20 4.35 3.83
4 4.24 2.25 2.84 1.51 2.25
5 2.91 3.31 2.40 1.45 2.00
6 2.99 2.51 2.95 2.59 2.20
> str(df)
'data.frame': 199 obs. of 5 variables:
$ attitude : num 2.31 4.66 3.85 4.24 2.91 2.99 3.96 3.01 4.77 3.67 ...
$ norms : num 2.31 4.01 3.56 2.25 3.31 2.51 4.65 2.98 3.09 3.63 ...
$ control : num 2.03 3.63 4.2 2.84 2.4 2.95 3.77 1.9 3.83 5 ...
$ intention: num 2.5 3.99 4.35 1.51 1.45 2.59 4.08 2.58 4.87 3.09 ...
$ behavior : num 2.62 3.64 3.83 2.25 2 2.2 4.41 4.15 4.35 3.95 ...
> # attitude
> # norms
> # control
> # intention
> # behavior
> ######################################################
> ######################################################
> ######################################################
> # Mediation Analysis using lavaan
> # in R (path analysis framework, SEM)
> ######################################################
> ######################################################
> ######################################################
>
> # specifying path analysis model
> # by using lavann package
> install.packages("lavann")
Installing package(s) into ‘C:/Users/Hyo/Documents/R/win-library/4.1’
(because ‘lib’ is unspecified)
Warning in install.packages :
package ‘lavann’ is not available for this version of R
A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages
> library(lavaan)
>
> specmod <- "
+ # path c
+ # identifying path c (prime) by putting c*
+ behavior ~ c*attitude
+
+ # path a
+ intention ~ a*attitude
+
+ # path b
+ behavior ~ b*intention
+
+ # indirect effect (a*b): Sobel test (Delta Method)
+ # obtained by multiplying path a by path b
+ # known as the Sobel test
+ ab := a*b
+ "
> # Fit/estimate the model
> fitmod <- sem(specmod, data=df)
> # summarize the model
> summary(fitmod, fit.measures=TRUE, rsquare=T)
lavaan 0.6-9 ended normally after 11 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 5
Number of observations 199
Model Test User Model:
Test statistic 0.000
Degrees of freedom 0
Model Test Baseline Model:
Test statistic 103.700
Degrees of freedom 3
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 1.000
Tucker-Lewis Index (TLI) 1.000
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -481.884
Loglikelihood unrestricted model (H1) -481.884
Akaike (AIC) 973.767
Bayesian (BIC) 990.234
Sample-size adjusted Bayesian (BIC) 974.393
Root Mean Square Error of Approximation:
RMSEA 0.000
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.000
P-value RMSEA <= 0.05 NA
Standardized Root Mean Square Residual:
SRMR 0.000
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Regressions:
Estimate Std.Err z-value P(>|z|)
behavior ~
attitude (c) 0.029 0.071 0.412 0.680
intention ~
attitude (a) 0.484 0.058 8.333 0.000
behavior ~
intention (b) 0.438 0.075 5.832 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.behavior 0.698 0.070 9.975 0.000
.intention 0.623 0.062 9.975 0.000
R-Square:
Estimate
behavior 0.199
intention 0.259
Defined Parameters:
Estimate Std.Err z-value P(>|z|)
ab 0.212 0.044 4.778 0.000
>
> ##########################################
> # boot strapping instead of sobel test
> ##########################################
> set.seed(101)
> fitmod2 <- sem(specmod, data=df,
+ se="bootstrap",
+ bootstrap=100)
> # bootstrap = 5000 is common
>
> summary(fitmod2, fit.measures=TRUE, rsquare=TRUE)
lavaan 0.6-9 ended normally after 11 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 5
Number of observations 199
Model Test User Model:
Test statistic 0.000
Degrees of freedom 0
Model Test Baseline Model:
Test statistic 103.700
Degrees of freedom 3
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 1.000
Tucker-Lewis Index (TLI) 1.000
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -481.884
Loglikelihood unrestricted model (H1) -481.884
Akaike (AIC) 973.767
Bayesian (BIC) 990.234
Sample-size adjusted Bayesian (BIC) 974.393
Root Mean Square Error of Approximation:
RMSEA 0.000
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.000
P-value RMSEA <= 0.05 NA
Standardized Root Mean Square Residual:
SRMR 0.000
Parameter Estimates:
Standard errors Bootstrap
Number of requested bootstrap draws 100
Number of successful bootstrap draws 100
Regressions:
Estimate Std.Err z-value P(>|z|)
behavior ~
attitude (c) 0.029 0.066 0.446 0.655
intention ~
attitude (a) 0.484 0.051 9.453 0.000
behavior ~
intention (b) 0.438 0.075 5.843 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.behavior 0.698 0.061 11.464 0.000
.intention 0.623 0.061 10.170 0.000
R-Square:
Estimate
behavior 0.199
intention 0.259
Defined Parameters:
Estimate Std.Err z-value P(>|z|)
ab 0.212 0.045 4.690 0.000
> parameterEstimates(fitmod2,
+ ci=TRUE, level=.95,
+ boot.ci.type="perc")
lhs op rhs label est se z pvalue ci.lower ci.upper
1 behavior ~ attitude c 0.029 0.066 0.446 0.655 -0.085 0.182
2 intention ~ attitude a 0.484 0.051 9.453 0.000 0.376 0.588
3 behavior ~ intention b 0.438 0.075 5.843 0.000 0.294 0.612
4 behavior ~~ behavior 0.698 0.061 11.464 0.000 0.561 0.822
5 intention ~~ intention 0.623 0.061 10.170 0.000 0.492 0.724
6 attitude ~~ attitude 0.928 0.000 NA NA 0.928 0.928
7 ab := a*b ab 0.212 0.045 4.690 0.000 0.131 0.319
>
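The percentile CI that parameterEstimates() reports can be reproduced by hand with a small percentile bootstrap. The sketch below uses simulated data (hypothetical variable names, with true paths of roughly the size seen above) so that it runs without the csv file:

```r
# Manual percentile bootstrap of the indirect effect a*b
# Simulated data stand in for the real data set
set.seed(101)
n <- 199
attitude  <- rnorm(n)
intention <- 0.5 * attitude + rnorm(n)            # true a = 0.5
behavior  <- 0.4 * intention + rnorm(n)           # true b = 0.4, c' = 0
d <- data.frame(attitude, intention, behavior)

ab.boot <- replicate(1000, {
  i <- sample(n, replace = TRUE)                  # resample rows
  a <- coef(lm(intention ~ attitude, data = d[i, ]))["attitude"]
  b <- coef(lm(behavior ~ intention + attitude, data = d[i, ]))["intention"]
  a * b                                           # indirect effect in this resample
})
quantile(ab.boot, c(.025, .975))                  # 95% percentile CI
```

If the interval excludes 0, the indirect effect is judged significant; this is what se="bootstrap" with boot.ci.type="perc" automates.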
Question: What do the rsquare values above mean?
- First, looking at the specified model (specmod), you can see that three regressions were specified, as below.
# comments removed for convenience
specmod <- "
behavior ~ c*attitude
behavior ~ b*intention
intention ~ a*attitude
ab := a*b
"
- Picking out the dependent variables, there are:
- the R-square for behavior in behavior ~ attitude + intention
- the R-square for intention in intention ~ attitude
- These two correspond, in the "Poking the above" section below,
lm.ba.01 <- lm(behavior~attitude+intention, data=df) # all
lm.ba.02 <- lm(behavior~intention, data=df)          # b path
lm.ba.03 <- lm(intention~attitude, data=df)          # a path
lm.ba.04 <- lm(attitude~intention, data=df)          # reverse a path
lm.ba.05 <- lm(behavior~attitude, data=df)           # c prime path
- to the regressions lm.ba.01 and lm.ba.03.
- The commands below print their values:
> summary(lm.ba.01)$r.square
[1] 0.1989125
> summary(lm.ba.03)$r.square
[1] 0.2586768
- These are the same values reported just above.
Question: Why is the mediation effect the product of the two paths?
               Intention
             a /       \ b
              /         \
    Attitudes -----c'----- Behavior
Assume a and b above are beta coefficients, with a = 2 and b = 1.5. Then:
- a (2) means that when the attitudes measurement increases by one unit, intention increases by 2.
- b (1.5) means that when intention increases by one unit, behavior increases by 1.5. So each one-unit increase in attitudes raises intention by 2, which in turn raises behavior by 2*1.5 = 3. This 3 is what we use when speaking of the influence of the independent variable attitudes on behavior mediated by intention. Hence the mediation effect ab = a * b.
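The chain logic above, together with the standard OLS identity "total effect = c' + a*b", can be checked on simulated data (hypothetical variable names; true paths a = 2, b = 1.5, c' = 0):

```r
# Verify that the total effect decomposes into c' + a*b in OLS
set.seed(1)
n <- 10000
attitude  <- rnorm(n)
intention <- 2.0 * attitude + rnorm(n)       # true a = 2
behavior  <- 1.5 * intention + rnorm(n)      # true b = 1.5, true c' = 0

a      <- coef(lm(intention ~ attitude))["attitude"]
fit.y  <- lm(behavior ~ attitude + intention)
cprime <- coef(fit.y)["attitude"]
b      <- coef(fit.y)["intention"]
total  <- coef(lm(behavior ~ attitude))["attitude"]   # total effect c

round(unname(c(a, b, a * b, total)), 2)      # roughly 2, 1.5, 3, 3
# exact algebraic identity for OLS estimates with a single mediator:
all.equal(unname(total), unname(cprime + a * b))
```

The last line holds exactly, not just approximately, which is why the difference method (c - c') and the product method (a*b) agree in ordinary regression.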
Poking the above
# poking around
# model:
#             a   Intention   b
# Attitude         c        Behavior
lm.ba.01 <- lm(behavior~attitude+intention, data=df) # all
lm.ba.02 <- lm(behavior~intention, data=df)          # b path
lm.ba.03 <- lm(intention~attitude, data=df)          # a path
lm.ba.04 <- lm(attitude~intention, data=df)          # reverse a path
lm.ba.05 <- lm(behavior~attitude, data=df)           # c prime path
From the above, compare:
summary(lm.ba.05)
summary(lm.ba.01)
> summary(lm.ba.05)
Call:
lm(formula = behavior ~ attitude, data = df)
Residuals:
Min 1Q Median 3Q Max
-1.9679 -0.6291 0.0882 0.6625 1.8128
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.3355 0.2221 10.51 < 2e-16 ***
attitude 0.2413 0.0669 3.61 0.00039 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.909 on 197 degrees of freedom
Multiple R-squared: 0.062, Adjusted R-squared: 0.0572
F-statistic: 13 on 1 and 197 DF, p-value: 0.000392
> summary(lm.ba.01)
Call:
lm(formula = behavior ~ attitude + intention, data = df)
Residuals:
Min 1Q Median 3Q Max
-2.0192 -0.5728 0.0633 0.6373 1.7381
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.6962 0.2336 7.26 8.8e-12 ***
attitude 0.0294 0.0720 0.41 0.68
intention 0.4377 0.0756 5.79 2.8e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.842 on 196 degrees of freedom
Multiple R-squared: 0.199, Adjusted R-squared: 0.191
F-statistic: 24.3 on 2 and 196 DF, p-value: 3.64e-10
Comparing the t-test results for attitude in the two outputs:
- in lm.ba.05, attitude appears to explain behavior
- but in lm.ba.01, when attitude and intention are entered together,
- attitude's explanatory power becomes non-significant and only intention remains significant
Note: the names below are the components stored in an lm regression result; residuals holds the residual values, i.e., the differences between the regression line's predictions and the observed values.
> names(lm.ba.03)
 [1] "coefficients"  "residuals"     "effects"       "rank"
 [5] "fitted.values" "assign"        "qr"            "df.residual"
 [9] "xlevels"       "call"          "terms"         "model"
>
- fitted.values holds the regression line's predictions, so subtracting the mean of the Y variable (intention) from these predictions gives the explained (regression) part, i.e., the error that has been overcome. This is what is stored in reg.int.
- res.int stores the residual values.
- The sums of squares of these two are SS regression and SS residual, respectively.
- Their sum is therefore each individual's intention value with the mean removed. If you wanted to reconstruct the intention values exactly as they were,
- you could use reg.int + res.int + mean(df$intention).
reg.int <- lm.ba.03$fitted.values - mean(df$intention)
res.int <- summary(lm.ba.03)$residuals
- We also know that SS reg + SS res = SS y = SS total = SS intention.
# just checking
sum(reg.int^2) + sum(res.int^2)
var(df$intention)*(length(df$intention)-1)
Below, behavior is regressed on reg.int as the independent variable. This shows how the part of intention's SS explained by attitude (the regression part) in turn explains behavior.
# the intention part contributed by attitude:
# is it explaining behavior too?
lm.ba.021 <- lm(behavior~reg.int, data=df)
summary(lm.ba.021)
Conversely, res.int is what remains of intention after removing attitude's explanatory power; the regression below shows how this part affects behavior. It looks at how "pure" intention, with attitude's contribution excluded (not "mediated"), influences behavior.
# the pure intention part excluding
# what attitude contributes
lm.ba.022 <- lm(behavior~res.int, data=df)
summary(lm.ba.022)
- Also, as expected, adding the residual and regression parts of intention back together and regressing behavior on the sum gives the same result as regressing behavior on intention itself (the b path).
int.all <- res.int + reg.int
lm.temp <- lm(behavior~int.all, data=df)
summary(lm.temp)
summary(lm.ba.02)
So,
- lm.ba.021 (behavior regressed on reg.int) shows the explanatory power of the part of intention through which attitude's influence is mediated, and
- lm.ba.022 (behavior regressed on res.int) shows the explanatory power of the part of intention with attitude's influence excluded.
Below, these are printed once more:
summary(lm.ba.021)
summary(lm.ba.022)
Or it can be examined as follows:
# the following sketches the path of influence
# attitude - intention - behavior
# intention - behavior part
summary(lm.ba.02)$r.squared
# K - the part of intention explained by attitude (regression)
summary(lm.ba.021)$r.squared
# J - the part attitude does not explain (residual error)
summary(lm.ba.022)$r.squared
# intention above consists of K and J; checking this:
summary(lm.ba.021)$r.squared + summary(lm.ba.022)$r.squared
Then what remains of attitude's explanatory power once the part mediated through intention is removed? This is the same as asking how much the residual from lm.ba.04 (the reversed a path) explains behavior. Computing this:
# lm.ba.04 <- lm(attitude~intention, data=df) # reverse a path
res.temp <- lm.ba.04$residuals
# res.temp is the part of attitude left after removing what intention
# explains (the related part). Regressing behavior on it means using only
# the part of attitude that does not travel through (is not mediated by)
# intention.
lm.temp <- lm(behavior~res.temp, data=df)
summary(lm.temp)
summary(lm.ba.01)
In the two outputs, the regression coefficient for attitude is the same (b = 0.02941).
> res.temp <- lm.ba.04$residuals
> lm.temp <- lm(behavior~res.temp, data=df)
> summary(lm.temp)
Call:
lm(formula = behavior ~ res.temp, data = df)
Residuals:
Min 1Q Median 3Q Max
-2.1138 -0.6154 0.0725 0.6955 1.9098
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.1025 0.0665 46.66 <2e-16 ***
res.temp 0.0294 0.0802 0.37 0.71
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.938 on 197 degrees of freedom
Multiple R-squared: 0.000683, Adjusted R-squared: -0.00439
F-statistic: 0.135 on 1 and 197 DF, p-value: 0.714
> summary(lm.ba.01)
Call:
lm(formula = behavior ~ attitude + intention, data = df)
Residuals:
Min 1Q Median 3Q Max
-2.0192 -0.5728 0.0633 0.6373 1.7381
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.6962 0.2336 7.26 8.8e-12 ***
attitude 0.0294 0.0720 0.41 0.68
intention 0.4377 0.0756 5.79 2.8e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.842 on 196 degrees of freedom
Multiple R-squared: 0.199, Adjusted R-squared: 0.191
F-statistic: 24.3 on 2 and 196 DF, p-value: 3.64e-10
Another modeling
A student, Seongjae, has a different idea. Let's implement it in the analysis.
library(lavaan)
specmod <- "
# regression test
behavior ~ c*attitude
+ m*control
+ n*norms
intention ~ a*attitude
+ k*norms
+ j*control
behavior ~ b*intention
# indirect effect (a*b): Sobel test (Delta Method)
ab := a*b
kb := k*b
jb := j*b
"
# Fit/estimate the model
fitmod <- sem(specmod, data=df)
# summarize the model
summary(fitmod, fit.measures=TRUE, rsquare=T)
Another modeling output
> library(lavaan)
>
> specmod <- "
+ # path c
+ # identifying path c (prime) by putting c*
+ behavior ~ c*attitude
+ + m*control
+ + n*norms
+
+ # path a
+ intention ~ a*attitude
+ + k*norms
+ + j*control
+
+ # path b
+ behavior ~ b*intention
+
+ # indirect effect (a*b): Sobel test (Delta Method)
+ # obtained by multiplying path a by path b
+ # known as the Sobel test
+ ab := a*b
+ kb := k*b
+ jb := j*b
+
+ # attitude ~~ norms + control
+ # norms ~~ control
+ "
> # Fit/estimate the model
> fitmod <- sem(specmod, data=df)
> # summarize the model
> summary(fitmod, fit.measures=TRUE, rsquare=T)
lavaan 0.6-12 ended normally after 1 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 9
Number of observations 199
Model Test User Model:
Test statistic 0.000
Degrees of freedom 0
Model Test Baseline Model:
Test statistic 137.622
Degrees of freedom 7
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 1.000
Tucker-Lewis Index (TLI) 1.000
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -464.922
Loglikelihood unrestricted model (H1) -464.922
Akaike (AIC) 947.845
Bayesian (BIC) 977.484
Sample-size adjusted Bayesian (BIC) 948.972
Root Mean Square Error of Approximation:
RMSEA 0.000
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.000
P-value RMSEA <= 0.05 NA
Standardized Root Mean Square Residual:
SRMR 0.000
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Regressions:
Estimate Std.Err z-value P(>|z|)
behavior ~
attitude (c) 0.041 0.072 0.563 0.573
control (m) -0.090 0.070 -1.285 0.199
norms (n) 0.042 0.069 0.605 0.545
intention ~
attitude (a) 0.352 0.058 6.068 0.000
norms (k) 0.153 0.059 2.577 0.010
control (j) 0.275 0.058 4.740 0.000
behavior ~
intention (b) 0.463 0.081 5.715 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.behavior 0.692 0.069 9.975 0.000
.intention 0.530 0.053 9.975 0.000
R-Square:
Estimate
behavior 0.206
intention 0.369
Defined Parameters:
Estimate Std.Err z-value P(>|z|)
ab 0.163 0.039 4.160 0.000
kb 0.071 0.030 2.349 0.019
jb 0.127 0.035 3.648 0.000
>
What about this output
library(lavaan)
specmod <- "
# path c
# identifying path c (prime) by putting c*
# path a
intention ~ a*attitude
+ k*norms
+ j*control
# path b
behavior ~ b*intention
# indirect effect (a*b): Sobel test (Delta Method)
# obtained by multiplying path a by path b
# known as the Sobel test
ab := a*b
kb := k*b
jb := j*b
# attitude ~~ norms + control
# norms ~~ control
"
# Fit/estimate the model
fitmod <- cfa(specmod, data=df)
# summarize the model
summary(fitmod, fit.measures=TRUE, rsquare=T)
For the output below, . . . see path analysis.
What about the chi-square test p-value?
What about CFI and TLI?
What about the other indices?
Which model do you judge to explain the present data best?
Output
library(lavaan)
>
> specmod <- "
+ # path c
+ # identifying path c (prime) by putting c*
+
+ # path a
+ intention ~ a*attitude
+ + k*norms
+ + j*control
+
+ # path b
+ behavior ~ b*intention
+
+ # indirect effect (a*b): Sobel test (Delta Method)
+ # obtained by multiplying path a by path b
+ # known as the Sobel test
+ ab := a*b
+ kb := k*b
+ jb := j*b
+
+ # attitude ~~ norms + control
+ # norms ~~ control
+ "
> # Fit/estimate the model
> fitmod <- cfa(specmod, data=df)
> # summarize the model
> summary(fitmod, fit.measures=TRUE, rsquare=T)
lavaan 0.6-12 ended normally after 1 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 6
Number of observations 199
Model Test User Model:
Test statistic 2.023
Degrees of freedom 3
P-value (Chi-square) 0.568
Model Test Baseline Model:
Test statistic 137.622
Degrees of freedom 7
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 1.000
Tucker-Lewis Index (TLI) 1.017
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -465.934
Loglikelihood unrestricted model (H1) -464.922
Akaike (AIC) 943.868
Bayesian (BIC) 963.628
Sample-size adjusted Bayesian (BIC) 944.619
Root Mean Square Error of Approximation:
RMSEA 0.000
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.103
P-value RMSEA <= 0.05 0.735
Standardized Root Mean Square Residual:
SRMR 0.019
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Regressions:
Estimate Std.Err z-value P(>|z|)
intention ~
attitude (a) 0.352 0.058 6.068 0.000
norms (k) 0.153 0.059 2.577 0.010
control (j) 0.275 0.058 4.740 0.000
behavior ~
intention (b) 0.453 0.065 7.014 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.intention 0.530 0.053 9.975 0.000
.behavior 0.699 0.070 9.975 0.000
R-Square:
Estimate
intention 0.369
behavior 0.198
Defined Parameters:
Estimate Std.Err z-value P(>|z|)
ab 0.160 0.035 4.589 0.000
kb 0.069 0.029 2.419 0.016
jb 0.125 0.032 3.927 0.000
Examples
# install.packages("ISLR")
library(ISLR)
attach(mtcars)
# x <- disp (displacement)
# y <- mpg (mileage)
# m <- wt (weight)
specmod <- "
# path c
# identifying path c (prime) by putting c*
mpg ~ c*disp
# path a
wt ~ a*disp
# path b
mpg ~ b*wt
# indirect effect (a*b): Sobel test (Delta Method)
# obtained by multiplying path a by path b
# known as the Sobel test
ab := a*b
"
# Fit/estimate the model
fitmod <- sem(specmod, data=mtcars)
summary(fitmod)
set.seed(101)
fitmod2 <- sem(specmod, data=mtcars,
se="bootstrap",
bootstrap=200)
# bootstrap = 5000 is common
summary(fitmod2, fit.measures=TRUE, rsquare=TRUE)
parameterEstimates(fitmod2,
ci=TRUE, level=.95,
boot.ci.type="perc")
install.packages("ISLR")
library(ISLR)
attach(Carseats)
mod.total <- lm(Sales~Price, data=Carseats)
summary(mod.total)
mod.m <- lm(CompPrice~Price, data=Carseats)
summary(mod.m)
mod.y <- lm(Sales~Price+CompPrice, data=Carseats)
summary(mod.y)
# install.packages("multilevel")
library(multilevel)
sob.m <- sobel(pred = Price, med = CompPrice, out = Sales)
sob.m
pnorm(abs(sob.m$z.value), lower.tail = FALSE) * 2
install.packages("bda")
library(bda)
mediation.test(mv=CompPrice, iv=Price, dv=Sales)
install.packages("nloptr")
library(nloptr)
install.packages("mediation")
library(mediation)
set.seed(101)
mod.mediation <- mediate(model.m=mod.m,
model.y = mod.y,
treat="Price",
mediator="CompPrice",
boot=TRUE, sims=500)
summary(mod.mediation)
# Total Effect = ADE + ACME
# ADE: Average Direct Effect (the direct effect)
# ACME: Average Causal Mediation Effect (the indirect effect)
plot(mod.mediation)
e.g. 2
https://advstats.psychstat.org/book/mediation/index.php
####################################
nlsy <- read.csv("http://commres.net/wiki/_media/r/nlsy.csv")
attach(nlsy)
# install.packages("bmem")
library(bmem)
library(sem)
# install.packages("cfa")
library(cfa)
##########################
nlsy.model<-specifyEquations(exog.variances=T)
math =b*HE + cp*ME
HE = a*ME
<ENTER>
effects<-c('a*b', 'cp+a*b')
nlsy.res<-bmem.sobel(nlsy, nlsy.model,effects)
##########################
m.me.he <- lm(ME~HE)
m.math.me <- lm(math~ME)
m.math.he <- lm(math~HE)
m.math.mehe <- lm(math~ME+HE)
m.he.me <- lm(HE~ME)
summary(m.he.me)

res.m.he.me <- resid(m.he.me)
m.temp <- lm(math~res.m.he.me)
summary(m.temp)

res.m.me.he <- resid(m.me.he)
m.temp2 <- lm(math~res.m.me.he)
summary(m.temp2)

library(lavaan)
specmod <- "
# path c' (direct effect)
math ~ c*ME

# path a
HE ~ a*ME

# path b
math ~ b*HE

# indirect effect (a*b)
# sobel test (Delta method)
ab := a*b
"

# fit/estimate model
fitmod <- sem(specmod, data=nlsy)

# summarize/result the output
summary(fitmod, fit.measures=TRUE, rsquare=TRUE)

# for a
summary(m.he.me)
# for b
summary(m.temp)
# for cprime
summary(m.temp2)

a <- summary(m.he.me)$coefficient[2]  # a
b <- summary(m.temp)$coefficient[2]   # b
c <- summary(m.temp2)$coefficient[2]  # c
a
b
c
a*b

c2 <- summary(fitmod)$pe$est[1]
a2 <- summary(fitmod)$pe$est[2]
b2 <- summary(fitmod)$pe$est[3]
ab2 <- summary(fitmod)$pe$est[7]
a2
b2
c2
ab2
> ####################################
> nlsy <- read.csv("http://commres.net/wiki/_media/r/nlsy.csv")
> attach(nlsy)
The following objects are masked from nlsy (pos = 3):
HE, math, ME
The following objects are masked from nlsy (pos = 5):
HE, math, ME
The following objects are masked from df:
HE, math, ME
> # install.packages("bmem")
> library(bmem)
> library(sem)
> # install.packages("cfa")
> library(cfa)
>
> nlsy.model<-specifyEquations(exog.variances=T)
1: math =b*HE + cp*ME
2: HE = a*ME
3:
Read 2 items
NOTE: adding 3 variances to the model
> effects<-c('a*b', 'cp+a*b')
> nlsy.res<-bmem.sobel(nlsy, nlsy.model,effects)
Estimate S.E. z-score p.value
b 0.46450283 0.14304860 3.247168 1.165596e-03
cp 0.46281480 0.11977862 3.863918 1.115825e-04
a 0.13925694 0.04292438 3.244239 1.177650e-03
V[HE] 2.73092635 0.20078170 13.601471 0.000000e+00
V[math] 20.67659134 1.52017323 13.601471 0.000000e+00
V[ME] 4.00590078 0.29451968 13.601471 0.000000e+00
a*b 0.06468524 0.02818457 2.295059 2.172977e-02
cp+a*b 0.52750005 0.11978162 4.403848 1.063474e-05
>
> m.me.he <- lm(ME~HE)
> m.math.me <- lm(math~ME)
> m.math.he <- lm(math~HE)
> m.math.mehe <- lm(math~ME+HE)
> m.he.me <- lm(HE~ME)
> summary(m.he.me)
Call:
lm(formula = HE ~ ME)
Residuals:
Min 1Q Median 3Q Max
-5.5020 -0.7805 0.2195 1.2195 3.3587
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.10944 0.49127 8.365 1.25e-15 ***
ME 0.13926 0.04298 3.240 0.0013 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.655 on 369 degrees of freedom
Multiple R-squared: 0.02766, Adjusted R-squared: 0.02502
F-statistic: 10.5 on 1 and 369 DF, p-value: 0.001305
> res.m.he.me <- resid(m.he.me)
> m.temp <- lm(math~res.m.he.me)
> summary(m.temp)
Call:
lm(formula = math ~ res.m.he.me)
Residuals:
Min 1Q Median 3Q Max
-12.4263 -3.1496 -0.3499 2.2826 29.7795
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.1186 0.2427 49.936 < 2e-16 ***
res.m.he.me 0.4645 0.1471 3.159 0.00172 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.674 on 369 degrees of freedom
Multiple R-squared: 0.02633, Adjusted R-squared: 0.02369
F-statistic: 9.978 on 1 and 369 DF, p-value: 0.001715
> res.m.me.he <- resid(m.me.he)
> m.temp2 <- lm(math~res.m.me.he)
> summary(m.temp2)
Call:
lm(formula = math ~ res.m.me.he)
Residuals:
Min 1Q Median 3Q Max
-14.1938 -2.7965 -0.3425 2.4081 29.5656
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.1186 0.2413 50.22 < 2e-16 ***
res.m.me.he 0.4628 0.1224 3.78 0.000183 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.648 on 369 degrees of freedom
Multiple R-squared: 0.03728, Adjusted R-squared: 0.03467
F-statistic: 14.29 on 1 and 369 DF, p-value: 0.0001828
>
> library(lavaan)
> specmod <- "
+ # path c' (direct effect)
+ math ~ c*ME
+
+ # path a
+ HE ~ a*ME
+
+ # path b
+ math ~ b*HE
+
+ # indirect effect (a*b)
+ # sobel test (Delta method)
+ ab := a*b
+ "
>
> # fit/estimate model
> fitmod <- sem(specmod, data=df)
>
> # summarize/result the output
> summary(fitmod, fit.measures=TRUE, rsquare=TRUE)
lavaan 0.6-12 ended normally after 1 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 5
Number of observations 371
Model Test User Model:
Test statistic 0.000
Degrees of freedom 0
Model Test Baseline Model:
Test statistic 39.785
Degrees of freedom 3
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 1.000
Tucker-Lewis Index (TLI) 1.000
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -1800.092
Loglikelihood unrestricted model (H1) -1800.092
Akaike (AIC) 3610.184
Bayesian (BIC) 3629.765
Sample-size adjusted Bayesian (BIC) 3613.901
Root Mean Square Error of Approximation:
RMSEA 0.000
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.000
P-value RMSEA <= 0.05 NA
Standardized Root Mean Square Residual:
SRMR 0.000
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Regressions:
Estimate Std.Err z-value P(>|z|)
math ~
ME (c) 0.463 0.120 3.869 0.000
HE ~
ME (a) 0.139 0.043 3.249 0.001
math ~
HE (b) 0.465 0.143 3.252 0.001
Variances:
Estimate Std.Err z-value P(>|z|)
.math 20.621 1.514 13.620 0.000
.HE 2.724 0.200 13.620 0.000
R-Square:
Estimate
math 0.076
HE 0.028
Defined Parameters:
Estimate Std.Err z-value P(>|z|)
ab 0.065 0.028 2.298 0.022
>
> # for a
> summary(m.he.me)
Call:
lm(formula = HE ~ ME)
Residuals:
Min 1Q Median 3Q Max
-5.5020 -0.7805 0.2195 1.2195 3.3587
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.10944 0.49127 8.365 1.25e-15 ***
ME 0.13926 0.04298 3.240 0.0013 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.655 on 369 degrees of freedom
Multiple R-squared: 0.02766, Adjusted R-squared: 0.02502
F-statistic: 10.5 on 1 and 369 DF, p-value: 0.001305
> # for b
> summary(m.temp)
Call:
lm(formula = math ~ res.m.he.me)
Residuals:
Min 1Q Median 3Q Max
-12.4263 -3.1496 -0.3499 2.2826 29.7795
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.1186 0.2427 49.936 < 2e-16 ***
res.m.he.me 0.4645 0.1471 3.159 0.00172 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.674 on 369 degrees of freedom
Multiple R-squared: 0.02633, Adjusted R-squared: 0.02369
F-statistic: 9.978 on 1 and 369 DF, p-value: 0.001715
> # for cprime
> summary(m.temp2)
Call:
lm(formula = math ~ res.m.me.he)
Residuals:
Min 1Q Median 3Q Max
-14.1938 -2.7965 -0.3425 2.4081 29.5656
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.1186 0.2413 50.22 < 2e-16 ***
res.m.me.he 0.4628 0.1224 3.78 0.000183 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.648 on 369 degrees of freedom
Multiple R-squared: 0.03728, Adjusted R-squared: 0.03467
F-statistic: 14.29 on 1 and 369 DF, p-value: 0.0001828
>
> a <- summary(m.he.me)$coefficient[2] # a
> b <- summary(m.temp)$coefficient[2] # b
> c <- summary(m.temp2)$coefficient[2] # c
> a
[1] 0.1392569
> b
[1] 0.4645028
> c
[1] 0.4628148
> a*b
[1] 0.06468524
>
> c2 <- summary(fitmod)$pe$est[1]
> a2 <- summary(fitmod)$pe$est[2]
> b2 <- summary(fitmod)$pe$est[3]
> ab2 <- summary(fitmod)$pe$est[7]
> a2
[1] 0.1392569
> b2
[1] 0.4645028
> c2
[1] 0.4628148
> ab2
[1] 0.06468524
>
temp
tests <- read.csv("http://commres.net/wiki/_media/r/tests_cor.csv")
colnames(tests) <- c("ser", "sat", "clep", "gpa")
tests <- subset(tests, select=c("sat", "clep", "gpa"))
attach(tests)
lm.gpa.clepsat <- lm(gpa ~ clep + sat, data = tests)
summary(lm.gpa.clepsat)
lm.gpa.clep <- lm(gpa ~ clep)
lm.gpa.sat <- lm(gpa ~ sat)
summary(lm.gpa.clep)
summary(lm.gpa.sat)
> lm.gpa.clepsat <- lm(gpa ~ clep + sat, data = tests)
> summary(lm.gpa.clepsat)
Call:
lm(formula = gpa ~ clep + sat, data = tests)
Residuals:
Min 1Q Median 3Q Max
-0.197888 -0.128974 -0.000528 0.131170 0.226404
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.1607560 0.4081117 2.844 0.0249 *
clep 0.0729294 0.0253799 2.874 0.0239 *
sat -0.0007015 0.0012564 -0.558 0.5940
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1713 on 7 degrees of freedom
Multiple R-squared: 0.7778, Adjusted R-squared: 0.7143
F-statistic: 12.25 on 2 and 7 DF, p-value: 0.005175
>
0.7778 is the part of gpa explained jointly by the two variables clep and sat:
summary(lm.gpa.clepsat)$r.squared = 0.7778
Also, in the output above, sat's explanatory power is not significant.
Then what does gpa look like with sat alone?
> lm.gpa.clep <- lm(gpa ~ clep)
> lm.gpa.sat <- lm(gpa ~ sat)
> summary(lm.gpa.clep)
Call:
lm(formula = gpa ~ clep)
Residuals:
Min 1Q Median 3Q Max
-0.190496 -0.141167 -0.002376 0.110847 0.225207
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.17438 0.38946 3.015 0.016676 *
clep 0.06054 0.01177 5.144 0.000881 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1637 on 8 degrees of freedom
Multiple R-squared: 0.7679, Adjusted R-squared: 0.7388
F-statistic: 26.46 on 1 and 8 DF, p-value: 0.0008808
> summary(lm.gpa.sat)
Call:
lm(formula = gpa ~ sat)
Residuals:
Min 1Q Median 3Q Max
-0.23544 -0.12184 0.00316 0.02943 0.56456
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.7848101 0.4771715 3.740 0.0057 **
sat 0.0024557 0.0008416 2.918 0.0193 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2365 on 8 degrees of freedom
Multiple R-squared: 0.5156, Adjusted R-squared: 0.455
F-statistic: 8.515 on 1 and 8 DF, p-value: 0.01935
>
As the output above shows, sat by itself is a significant predictor.
summary(lm.gpa.clep)$r.squared = 0.7679
summary(lm.gpa.sat)$r.squared = 0.5156
To see whether sat's influence on gpa is mediated through clep:
> lm.clep.sat <- lm(clep ~ sat)
> summary(lm.clep.sat)
Call:
lm(formula = clep ~ sat)
Residuals:
Min 1Q Median 3Q Max
-2.5316 -1.2437 -0.2848 0.0949 5.6329
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.556962 4.813367 1.778 0.11334
sat 0.043291 0.008489 5.100 0.00093 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.386 on 8 degrees of freedom
Multiple R-squared: 0.7648, Adjusted R-squared: 0.7353
F-statistic: 26.01 on 1 and 8 DF, p-value: 0.0009303
>
res.lm.clep.sat <- resid(lm.clep.sat)
reg.lm.clep.sat <- predict(lm.clep.sat)-mean(clep)
lm.gpa.sat.mediated.via.clep <- lm(gpa~reg.lm.clep.sat)
lm.gpa.clep.alone <- lm(gpa~res.lm.clep.sat)
summary(lm.gpa.sat.mediated.via.clep)
summary(lm.gpa.clep.alone)
> res.lm.clep.sat <- resid(lm.clep.sat)
> reg.lm.clep.sat <- predict(lm.clep.sat)-mean(clep)
> lm.gpa.sat.mediated.via.clep <- lm(gpa~reg.lm.clep.sat)
> lm.gpa.clep.alone <- lm(gpa~res.lm.clep.sat)
>
> summary(lm.gpa.sat.mediated.via.clep)
Call:
lm(formula = gpa ~ reg.lm.clep.sat)
Residuals:
Min 1Q Median 3Q Max
-0.23544 -0.12184 0.00316 0.02943 0.56456
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.16000 0.07480 42.246 1.09e-10 ***
reg.lm.clep.sat 0.05673 0.01944 2.918 0.0193 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2365 on 8 degrees of freedom
Multiple R-squared: 0.5156, Adjusted R-squared: 0.455
F-statistic: 8.515 on 1 and 8 DF, p-value: 0.01935
> summary(lm.gpa.clep.alone)
Call:
lm(formula = gpa ~ res.lm.clep.sat)
Residuals:
Min 1Q Median 3Q Max
-0.34523 -0.23300 -0.04416 0.27577 0.36370
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.16000 0.09231 34.231 5.8e-10 ***
res.lm.clep.sat 0.07293 0.04326 1.686 0.13
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2919 on 8 degrees of freedom
Multiple R-squared: 0.2622, Adjusted R-squared: 0.1699
F-statistic: 2.842 on 1 and 8 DF, p-value: 0.1303
>
sat's influence mediated via clep = Multiple R-squared: 0.5156
clep's own explanatory power, with sat's influence removed = Multiple R-squared: 0.2622
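Because the predicted part (reg.lm.clep.sat) and the residual part (res.lm.clep.sat) are orthogonal by construction, these two R-squared values add up to the R-squared of the full gpa ~ clep + sat model. A quick arithmetic check, using the numbers printed above:

```r
r2_full       <- 0.7778  # gpa ~ clep + sat (full model)
r2_via_clep   <- 0.5156  # sat's influence carried through clep (gpa ~ predicted clep)
r2_clep_alone <- 0.2622  # clep's contribution with sat removed (gpa ~ residual clep)

# the two orthogonal pieces recover the full model's R-squared
r2_via_clep + r2_clep_alone  # 0.7778
```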
temp2
> #############################
> #############################
> # https://data.library.virginia.edu/introduction-to-mediation-analysis/
> # Download data online. This is a simulated dataset for this post.
> myData <- read.csv('http://static.lib.virginia.edu/statlab/materials/data/mediationData.csv')
>
> model.0 <- lm(Y ~ X, myData)
> summary(model.0)
Call:
lm(formula = Y ~ X, data = myData)
Residuals:
Min 1Q Median 3Q Max
-5.0262 -1.2340 -0.3282 1.5583 5.1622
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.8572 0.6932 4.122 7.88e-05 ***
X 0.3961 0.1112 3.564 0.000567 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.929 on 98 degrees of freedom
Multiple R-squared: 0.1147, Adjusted R-squared: 0.1057
F-statistic: 12.7 on 1 and 98 DF, p-value: 0.0005671
> # coefficient on path c = 0.396 at p = 0.0006
>
> model.M <- lm(M ~ X, myData)
> summary(model.M)
Call:
lm(formula = M ~ X, data = myData)
Residuals:
Min 1Q Median 3Q Max
-4.3046 -0.8656 0.1344 1.1344 4.6954
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.49952 0.58920 2.545 0.0125 *
X 0.56102 0.09448 5.938 4.39e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.639 on 98 degrees of freedom
Multiple R-squared: 0.2646, Adjusted R-squared: 0.2571
F-statistic: 35.26 on 1 and 98 DF, p-value: 4.391e-08
> # path a = 0.561, p < .001
>
> model.Y <- lm(Y ~ X + M, myData)
> summary(model.Y)
Call:
lm(formula = Y ~ X + M, data = myData)
Residuals:
Min 1Q Median 3Q Max
-3.7631 -1.2393 0.0308 1.0832 4.0055
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.9043 0.6055 3.145 0.0022 **
X 0.0396 0.1096 0.361 0.7187
M 0.6355 0.1005 6.321 7.92e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.631 on 97 degrees of freedom
Multiple R-squared: 0.373, Adjusted R-squared: 0.3601
F-statistic: 28.85 on 2 and 97 DF, p-value: 1.471e-10
> #################################################
> ## paths b and c' (c' rather than c, since M was added to the model)
> ## c' = 0.0396 ---
> ## b = 0.6355 ***
> #################################################
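The Delta-method (Sobel) test for the indirect effect can be computed by hand at this point. A sketch, reading the a and b estimates and their standard errors off the model.M and model.Y summaries above:

```r
a  <- 0.56102   # path a: X -> M (from model.M)
sa <- 0.09448   # its standard error
b  <- 0.6355    # path b: M -> Y (from model.Y)
sb <- 0.1005    # its standard error

ab    <- a * b                            # indirect effect
se_ab <- sqrt(b^2 * sa^2 + a^2 * sb^2)    # first-order Delta (Sobel) SE
z     <- ab / se_ab
p     <- 2 * pnorm(-abs(z))
c(ab = ab, se = se_ab, z = z, p = p)
# ab ≈ 0.357, z ≈ 4.33, p < .001
```

This matches the ab estimate lavaan reports below; lavaan's z differs slightly because its standard errors come from ML rather than OLS.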
>
> library(mediation)
> results <- mediate(model.M, model.Y,
+ treat='X',
+ mediator='M',
+ boot=TRUE, sims=500)
Running nonparametric bootstrap
> summary(results)
Causal Mediation Analysis
Nonparametric Bootstrap Confidence Intervals with the Percentile Method
Estimate 95% CI Lower 95% CI Upper p-value
ACME 0.3565 0.2174 0.54 <2e-16 ***
ADE 0.0396 -0.1712 0.31 0.74
Total Effect 0.3961 0.1890 0.64 <2e-16 ***
Prop. Mediated 0.9000 0.4707 1.69 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Sample Size Used: 100
Simulations: 500
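The quantities in the mediate() summary fit together as expected: ACME (average causal mediation effect) is the indirect effect a*b, ADE (average direct effect) is c', the Total Effect is ACME + ADE, and Prop. Mediated is ACME / Total Effect. An arithmetic check on the numbers above:

```r
acme  <- 0.3565   # indirect effect, a*b
ade   <- 0.0396   # direct effect, c'
total <- acme + ade
prop  <- acme / total
c(total = total, prop = prop)  # total ≈ 0.3961, prop ≈ 0.90
```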
>
> # OR with lavaan
> library(lavaan)
> specmod <- "
+ # path c' (direct effect)
+ Y ~ c*X
+
+ # path a
+ M ~ a*X
+
+ # path b
+ Y ~ b*M
+
+ # indirect effect (a*b)
+ # sobel test (Delta method)
+ ab := a*b
+ "
>
> # fit/estimate model
> fitmod <- sem(specmod, data=myData)
>
> # summarize/result the output
> summary(fitmod, fit.measures=TRUE, rsquare=TRUE)
lavaan 0.6-12 ended normally after 1 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 5
Number of observations 100
Model Test User Model:
Test statistic 0.000
Degrees of freedom 0
Model Test Baseline Model:
Test statistic 77.413
Degrees of freedom 3
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 1.000
Tucker-Lewis Index (TLI) 1.000
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -379.612
Loglikelihood unrestricted model (H1) -379.612
Akaike (AIC) 769.225
Bayesian (BIC) 782.250
Sample-size adjusted Bayesian (BIC) 766.459
Root Mean Square Error of Approximation:
RMSEA 0.000
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.000
P-value RMSEA <= 0.05 NA
Standardized Root Mean Square Residual:
SRMR 0.000
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Regressions:
Estimate Std.Err z-value P(>|z|)
Y ~
X (c) 0.040 0.108 0.367 0.714
M ~
X (a) 0.561 0.094 5.998 0.000
Y ~
M (b) 0.635 0.099 6.418 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.Y 2.581 0.365 7.071 0.000
.M 2.633 0.372 7.071 0.000
R-Square:
Estimate
Y 0.373
M 0.265
Defined Parameters:
Estimate Std.Err z-value P(>|z|)
ab 0.357 0.081 4.382 0.000
>
> # Resampling method
> set.seed(2019)
>
> fitmod <- sem(specmod, data=myData, se="bootstrap", bootstrap=1000)
> summary(fitmod, fit.measures=TRUE, rsquare=TRUE)
lavaan 0.6-12 ended normally after 1 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 5
Number of observations 100
Model Test User Model:
Test statistic 0.000
Degrees of freedom 0
Model Test Baseline Model:
Test statistic 77.413
Degrees of freedom 3
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 1.000
Tucker-Lewis Index (TLI) 1.000
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -379.612
Loglikelihood unrestricted model (H1) -379.612
Akaike (AIC) 769.225
Bayesian (BIC) 782.250
Sample-size adjusted Bayesian (BIC) 766.459
Root Mean Square Error of Approximation:
RMSEA 0.000
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.000
P-value RMSEA <= 0.05 NA
Standardized Root Mean Square Residual:
SRMR 0.000
Parameter Estimates:
Standard errors Bootstrap
Number of requested bootstrap draws 1000
Number of successful bootstrap draws 1000
Regressions:
Estimate Std.Err z-value P(>|z|)
Y ~
X (c) 0.040 0.124 0.319 0.750
M ~
X (a) 0.561 0.095 5.888 0.000
Y ~
M (b) 0.635 0.102 6.224 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.Y 2.581 0.334 7.731 0.000
.M 2.633 0.362 7.271 0.000
R-Square:
Estimate
Y 0.373
M 0.265
Defined Parameters:
Estimate Std.Err z-value P(>|z|)
ab 0.357 0.080 4.442 0.000
> parameterEstimates(fitmod, ci=TRUE, level=.95, boot.ci.type="perc")
lhs op rhs label est se z pvalue ci.lower ci.upper
1 Y ~ X c 0.040 0.124 0.319 0.75 -0.200 0.288
2 M ~ X a 0.561 0.095 5.888 0.00 0.391 0.765
3 Y ~ M b 0.635 0.102 6.224 0.00 0.439 0.835
4 Y ~~ Y 2.581 0.334 7.731 0.00 1.868 3.193
5 M ~~ M 2.633 0.362 7.271 0.00 1.890 3.311
6 X ~~ X 3.010 0.000 NA NA 3.010 3.010
7 ab := a*b ab 0.357 0.080 4.442 0.00 0.225 0.538
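The Monte Carlo method mentioned earlier can also be sketched by hand without refitting anything: draw a and b from normal distributions centered at their estimates (SEs taken from the bootstrap output above), and take percentiles of the simulated a*b values. The seed and draw count here are arbitrary choices:

```r
set.seed(2019)
a <- 0.561; sa <- 0.095   # path a estimate/SE from the output above
b <- 0.635; sb <- 0.102   # path b estimate/SE from the output above

# simulate the sampling distribution of the indirect effect
ab_sim <- rnorm(10000, a, sa) * rnorm(10000, b, sb)
quantile(ab_sim, c(.025, .975))
# the interval excludes zero, in line with the percentile bootstrap CI above
```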
>
> summary(lm(Y~X+M,data=myData))$r.square
[1] 0.3729992
> summary(lm(M~X,data=myData))$r.square
[1] 0.2645879
> var(myData$Y)
[1] 4.158687
> var(myData$M)
[1] 3.616566
> var(myData$X)
[1] 3.040303
>
