Table of Contents

Path Analysis

Introduction

Regressions vs. Path Analysis (or SEM)

Model Identification



out of possible 15 relationships
15 - 12 =3 (df)

$\chi^2$ $\text{CFI}$ $\text{TLI}$ $\text{RMSEA}$ $\text{SRMR}$
$p \ge .05$ $p \ge .90$ $p \ge .95$ $p \le .08$ $p \le .08$

Then what is SEM (Structural Equation Modeling)

E.g. in R

######################################################
## data file: PlannedBehavior.csv
######################################################
######################################################
install.packages("readr")
library(readr)
df <- read.csv("http://commres.net/wiki/_media/r/plannedbehavior.csv")
head(df)
str(df)

# path analysis in R using lavaan package
# install.packages("lavaan")
library(lavaan)

# Model speficiation
specmod <- "
    intention ~ attitude + norms + control
"
# Estimate model 
fitmod <- sem(specmod, data=df)

# summarize the result
summary(fitmod, fit.measures=TRUE, rsquare=TRUE)

specmod2

# Model speficiation 2
specmod2 <- "
    intention ~ attitude + norms + control
    attitude ~~ norms + control
    norms ~~ control    
"
fitmod2 <- sem(specmod2, data=df)

# summarize the result
summary(fitmod2, fit.measures=TRUE, rsquare=TRUE)

specmod3: lm

fitmod3 <- lm(intention~attitude+norms+control, data=df)
summary(fitmod3) 

specmod4

# pbt model 
specmod4 <- "
    # Directional relations (path)
    intention ~ attitude + norms + control
    behavior ~ intention
    # Covariances 
    attitude ~~ norms + control
    norms ~~ control    
"
fitmod4 <- sem(specmod4, data=df)
summary(fitmod4, fit.measures=TRUE, rsquare=TRUE)

# my own 
# pbt model 
specmod5 <- '
    # Directional relations (path)
    intention ~ a*attitude + b*norms + c*control
    behavior ~ d*intention 
    # Covariances 
    attitude ~~ norms + control
    norms ~~ control    
    ad := a*d
    bd := b*d
    cd := c*d
'
fitmod5 <- sem(specmod5, data=df)
summary(fitmod5, fit.measures=TRUE, rsquare=TRUE)

Output

> df <- read.csv("http://commres.net/wiki/_media/r/plannedbehavior.csv")
> head(df)
  attitude norms control intention behavior
1     2.31  2.31    2.03      2.50     2.62
2     4.66  4.01    3.63      3.99     3.64
3     3.85  3.56    4.20      4.35     3.83
4     4.24  2.25    2.84      1.51     2.25
5     2.91  3.31    2.40      1.45     2.00
6     2.99  2.51    2.95      2.59     2.20
> str(df)
'data.frame':	199 obs. of  5 variables:
 $ attitude : num  2.31 4.66 3.85 4.24 2.91 2.99 3.96 3.01 4.77 3.67 ...
 $ norms    : num  2.31 4.01 3.56 2.25 3.31 2.51 4.65 2.98 3.09 3.63 ...
 $ control  : num  2.03 3.63 4.2 2.84 2.4 2.95 3.77 1.9 3.83 5 ...
 $ intention: num  2.5 3.99 4.35 1.51 1.45 2.59 4.08 2.58 4.87 3.09 ...
 $ behavior : num  2.62 3.64 3.83 2.25 2 2.2 4.41 4.15 4.35 3.95 ...
> 
> # path analysis in R using lavaan package
> # install.packages("lavaan")
> library(lavaan)
This is lavaan 0.6-9
lavaan is FREE software! Please report any bugs.
Warning message:
패키지 ‘lavaan’는 R 버전 4.1.2에서 작성되었습니다 
> 
> # Model speficiation
> specmod <- "
+     intention ~ attitude + norms + control
+ "
> # Estimate model 
> fitmod <- sem(specmod, data=df)
> 
> # summarize the result
> summary(fitmod, fit.measures=TRUE, rsquare=TRUE)
lavaan 0.6-9 ended normally after 11 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         4
                                                      
  Number of observations                           199
                                                      
Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Model Test Baseline Model:

  Test statistic                                91.633
  Degrees of freedom                                 3
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    1.000
  Tucker-Lewis Index (TLI)                       1.000

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)               -219.244
  Loglikelihood unrestricted model (H1)       -219.244
                                                      
  Akaike (AIC)                                 446.489
  Bayesian (BIC)                               459.662
  Sample-size adjusted Bayesian (BIC)          446.990

Root Mean Square Error of Approximation:

  RMSEA                                          0.000
  90 Percent confidence interval - lower         0.000
  90 Percent confidence interval - upper         0.000
  P-value RMSEA <= 0.05                             NA

Standardized Root Mean Square Residual:

  SRMR                                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  intention ~                                         
    attitude          0.352    0.058    6.068    0.000
    norms             0.153    0.059    2.577    0.010
    control           0.275    0.058    4.740    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .intention         0.530    0.053    9.975    0.000

R-Square:
                   Estimate
    intention         0.369

specmod2:

> # Model speficiation 2
> specmod2 <- "
+     intention ~ attitude + norms + control
+     attitude ~~ norms + control
+     norms ~~ control    
+ "
> fitmod2 <- sem(specmod2, data=df)
> 
> # summarize the result
> summary(fitmod2, fit.measures=TRUE, rsquare=TRUE)
lavaan 0.6-9 ended normally after 17 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        10
                                                      
  Number of observations                           199
                                                      
Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Model Test Baseline Model:

  Test statistic                               136.306
  Degrees of freedom                                 6
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    1.000
  Tucker-Lewis Index (TLI)                       1.000

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -1011.828
  Loglikelihood unrestricted model (H1)      -1011.828
                                                      
  Akaike (AIC)                                2043.656
  Bayesian (BIC)                              2076.589
  Sample-size adjusted Bayesian (BIC)         2044.908

Root Mean Square Error of Approximation:

  RMSEA                                          0.000
  90 Percent confidence interval - lower         0.000
  90 Percent confidence interval - upper         0.000
  P-value RMSEA <= 0.05                             NA

Standardized Root Mean Square Residual:

  SRMR                                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  intention ~                                         
    attitude          0.352    0.058    6.068    0.000
    norms             0.153    0.059    2.577    0.010
    control           0.275    0.058    4.740    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  attitude ~~                                         
    norms             0.200    0.064    3.128    0.002
    control           0.334    0.070    4.748    0.000
  norms ~~                                            
    control           0.220    0.065    3.411    0.001

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .intention         0.530    0.053    9.975    0.000
    attitude          0.928    0.093    9.975    0.000
    norms             0.830    0.083    9.975    0.000
    control           0.939    0.094    9.975    0.000

R-Square:
                   Estimate
    intention         0.369

specmod3: lm

> fitmod3 <- lm(intention~attitude+norms+control, data=df)
> summary(fitmod3) 

Call:
lm(formula = intention ~ attitude + norms + control, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.80282 -0.52734 -0.06018  0.51228  1.85202 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.58579    0.23963   2.445   0.0154 *  
attitude     0.35232    0.05866   6.006 9.13e-09 ***
norms        0.15250    0.05979   2.550   0.0115 *  
control      0.27502    0.05862   4.692 5.09e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7356 on 195 degrees of freedom
Multiple R-squared:  0.369,	Adjusted R-squared:  0.3593 
F-statistic: 38.01 on 3 and 195 DF,  p-value: < 2.2e-16

specmod4

> 
> # pbt model 
> specmod4 <- "
+     # Directional relations (path)
+     intention ~ attitude + norms + control
+     behavior ~ intention
+     # Covariances 
+     attitude ~~ norms + control
+     norms ~~ control    
+ "
> fitmod4 <- sem(specmod4, data=df)
> summary(fitmod4, fit.measures=TRUE, rsquare=TRUE)
lavaan 0.6-9 ended normally after 17 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        12
                                                      
  Number of observations                           199

# chi-square test 
# p-value is over .05 indicating . . . .                                                      
Model Test User Model:
                                                      
  Test statistic                                 2.023
  Degrees of freedom                                 3
  P-value (Chi-square)                           0.568

Model Test Baseline Model:

  Test statistic                               182.295
  Degrees of freedom                                10
  P-value                                        0.000

# CFI >_ .90
# TLI >_ .95
# The two indicate that the model fits to the data well

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    1.000
  Tucker-Lewis Index (TLI)                       1.019

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -1258.517
  Loglikelihood unrestricted model (H1)      -1257.506
                                                      
  Akaike (AIC)                                2541.035
  Bayesian (BIC)                              2580.555
  Sample-size adjusted Bayesian (BIC)         2542.538

# RMSEA <_ .08
# 
Root Mean Square Error of Approximation:

  RMSEA                                          0.000
  90 Percent confidence interval - lower         0.000
  90 Percent confidence interval - upper         0.103
  P-value RMSEA <= 0.05                          0.735

# SRMR <_ .08 meets the standard
#
Standardized Root Mean Square Residual:

  SRMR                                           0.019

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  intention ~                                         
    attitude          0.352    0.058    6.068    0.000
    norms             0.153    0.059    2.577    0.010
    control           0.275    0.058    4.740    0.000
  behavior ~                                          
    intention         0.453    0.065    7.014    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  attitude ~~                                         
    norms             0.200    0.064    3.128    0.002
    control           0.334    0.070    4.748    0.000
  norms ~~                                            
    control           0.220    0.065    3.411    0.001

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .intention         0.530    0.053    9.975    0.000
   .behavior          0.699    0.070    9.975    0.000
    attitude          0.928    0.093    9.975    0.000
    norms             0.830    0.083    9.975    0.000
    control           0.939    0.094    9.975    0.000

R-Square:
                   Estimate
    intention         0.369
    behavior          0.198

specmod5

> specmod5 <- "
+     # Directional relations (path)
+     intention ~ attitude + norms + control
+     behavior ~ intention + norms
+     # Covariances 
+     attitude ~~ norms + control
+     norms ~~ control    
+ "
> fitmod5 <- sem(specmod5, data=df)
> summary(fitmod5, fit.measures=TRUE, rsquare=TRUE)
lavaan 0.6-12 ended normally after 18 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        13

  Number of observations                           199

Model Test User Model:
                                                      
  Test statistic                                 1.781
  Degrees of freedom                                 2
  P-value (Chi-square)                           0.410

Model Test Baseline Model:

  Test statistic                               182.295
  Degrees of freedom                                10
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    1.000
  Tucker-Lewis Index (TLI)                       1.006

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -1258.396
  Loglikelihood unrestricted model (H1)      -1257.506
                                                      
  Akaike (AIC)                                2542.792
  Bayesian (BIC)                              2585.605
  Sample-size adjusted Bayesian (BIC)         2544.421

Root Mean Square Error of Approximation:

  RMSEA                                          0.000
  90 Percent confidence interval - lower         0.000
  90 Percent confidence interval - upper         0.136
  P-value RMSEA <= 0.05                          0.569

Standardized Root Mean Square Residual:

  SRMR                                           0.018

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  intention ~                                         
    attitude          0.352    0.058    6.068    0.000
    norms             0.153    0.059    2.577    0.010
    control           0.275    0.058    4.740    0.000
  behavior ~                                          
    intention         0.443    0.068    6.525    0.000
    norms             0.034    0.068    0.493    0.622

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  attitude ~~                                         
    norms             0.200    0.064    3.128    0.002
    control           0.334    0.070    4.748    0.000
  norms ~~                                            
    control           0.220    0.065    3.411    0.001

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .intention         0.530    0.053    9.975    0.000
   .behavior          0.698    0.070    9.975    0.000
    attitude          0.928    0.093    9.975    0.000
    norms             0.830    0.083    9.975    0.000
    control           0.939    0.094    9.975    0.000

R-Square:
                   Estimate
    intention         0.369
    behavior          0.199

Lavaan in R: explanation

Path analysis in R with Lavaan (introduction)

By Mike Crowson, Ph.D.
September 17, 2019

install.packages("lavaan")
# processdata<-read.csv("path analysis dataN BinW.csv", header=TRUE, sep=",")
processdata<-read.csv("http://commres.net/wiki/_media/r/path_analysis_datan_binw.csv", 
                       header=TRUE, sep=",", fileEncoding="UTF-8-BOM")
str(processdata)
library(lavaan)
# model specification
model <- '
  #equation where interest is predicted by ses 
  # & mastery and performance goals
  interest ~ mastery + perfgoal + ses
  
  # equation where achieve is predicted by 
  # interest and anxiety
  achieve ~ anxiety + interest + mastery

  # equation where anxiety is predicted 
  # by mastery and performance goals
  anxiety ~ perfgoal + mastery

  # estimating the variances of 
  # the exogenous variables (ses, mastery,performance)
  mastery ~~ mastery
  perfgoal ~~ perfgoal
  ses ~~ ses

  # estimtating the covariances of the exogenous 
  # variables (ses, mastery,performance)
  mastery ~~ perfgoal + ses
  perfgoal ~~ ses

  # estimating the residual variances 
  # for endogenous variables (interest, anxiety, achieve)
  interest ~~ interest
  anxiety ~~ anxiety
  achieve ~~ achieve

  # estimating the covariance of residuals 
  # for interest and anxiety
  interest ~~ anxiety '
fit<-lavaan(model, data=processdata)
summary(fit, fit.measures=TRUE)
summary(fit, fit.measures=TRUE, standardized=TRUE, rsquare=TRUE)
parameterEstimates(fit)
fitMeasures(fit)
modificationIndices(fit)

# model specification

model<-'
  # equation where interest is predicted by ses & mastery and 
  # performance goals
  interest ~ mastery + perfgoal + ses

  # equation where achieve is predicted by interest and anxiety
  achieve~anxiety+interest+mastery

  #equation where anxiety is predicted by mastery and performance goals
  anxiety~perfgoal+mastery

  # estimtating the variances of the exogenous variables (ses, mastery,performance)
  mastery~~mastery
  perfgoal~~perfgoal
  ses~~ses

  # estimtating the covariances of the exogenous variables (ses, mastery,performance)
  mastery~~perfgoal+ses
  perfgoal~~ses

  # The auto.var argument when fitting the model can be used so that 
  # you do not have to directly request estimation of residual variances

  # Estimating the covariance of residuals for interest and anxiety
  interest~~anxiety'

  fit<-lavaan(model, data=processdata, auto.var=TRUE)
  summary(fit, fit.measures=TRUE, standardized=TRUE, rsquare=TRUE)
install.packages("semPlot")
library("semPlot")

semPaths(fit,what="paths",whatLabels="par",style="lisrel",layout="tree",
rotation=2)
install.packages("lavaanPlot")
library(lavaanPlot)

lavaanPlot(model = fit, 
    node_options = list(shape = "box", fontname = "Helvetica"), 
    edge_options = list(color = "grey"), 
    coefs = TRUE,
    covs = TRUE, 
    stars = c("regress"))

Resources on the use of lavaan:


Using the 'semPlot' package


Using the 'lavaanPlot' package


Raw data for all examples can be downloaded at…

A copy of the Powerpoint of the model specification can be downloaded at…

Basics of path analysis using Lavaan.txt
Displaying Basics of path analysis using Lavaan.txt.

CODING

processdata<-read.csv("http://commres.net/wiki/_media/r/path_analysis_datan_binw.csv", 
                       header=TRUE, sep=",", fileEncoding="UTF-8-BOM")
str(processdata)
library(lavaan)
model <- '
    interest ~ mastery + perfgoal + ses 
    achieve ~ anxiety + interest + mastery 
    anxiety ~ perfgoal + mastery 
    # variances
    mastery ~~ mastery
    perfgoal ~~ perfgoal
    ses ~~ ses
    
    mastery ~~ perfgoal + ses
    perfgoal ~~ ses
    
    interest ~~ interest
    anxiety ~~ anxiety
    achieve ~~ achieve
    interest~~anxiety 
'
fit <- lavaan(model, data=processdata)
fit <- sem(model, data=processdata)

summary(fit, fit.measures=TRUE)
summary(fit, fit.measures=TRUE, standardized=TRUE, rsquare=TRUE)

parameterEstimates(fit)
fitMeasures(fit)
modificationIndices(fit)   

install.packages("semPlot")
library("semPlot")

semPaths(fit,what="paths",whatLabels="par",style="lisrel",layout="tree",
rotation=2)

install.packages("lavaanPlot")
library(lavaanPlot)

lavaanPlot(
    model = fit, 
    node_options = list(shape = "box", fontname = "Helvetica"), 
    edge_options = list(color = "grey"), 
    coefs = TRUE, covs=TRUE,
    stars = c("regress"))

 

Lavaan 2

model <- '
    # labeling path from mastery to interest
    interest ~ a*mastery + perfgoal + ses

    # labeling path from interest to achieve. 
    # Adding labeled path from
    # mastery to achieve
    achieve ~ e*anxiety + b*interest + c*mastery
    
    # predicting anxiety and labeling path from mastery
    anxiety ~ perfgoal + d*mastery
    # estimtating the variances and covariances of 
    # the exogenous variables (ses, mastery,performance)
    mastery~~mastery
    perfgoal~~perfgoal
    ses~~ses

    mastery~~perfgoal+ses
    perfgoal~~ses

    # estimating the variances of residuals 
    # for endogenous variables 
    # (interest, anxiety, achieve)
    interest~~interest
    anxiety~~anxiety
    achieve~~achieve

    # estimating the covariance of residuals 
    # for interest and anxiety
    interest~~anxiety
    # calculating specific indirect effect 
    # of mastery on achieve via interest
    SIE1:=a*b
    # calculating specific indirect effect of 
    # mastery on achieve via anxiety
    SIE2:=d*e
    # calculating total indirect effect of 
    # mastery on achievement via mediators
    TIE:=SIE1+SIE2
    # calculating total effect of mastery on achieve
    TE:=TIE+c'
    
    # using naive bootstrap to obtain standard errors
    fit <- sem(model, data=processdata, se="bootstrap")
    summary(fit,fit.measures=TRUE)
    
    # using 'parameterEstimates' function will give 
    # us confidence intervals based on naive bootstrap. 
    # A standard approach to testing indirect effects.
    parameterEstimates(fit)

Lavaan 3: Testing data normality

processdata <- read.csv("http://commres.net/wiki/_media/r/path_analysis_datan_binw.csv")
str(processdata)
# install.packages("MVN")
library(MVN)
newdata <- processdata[c("achieve", "interest", "anxiety")]
str(newdata)

Use the 'mvn' function to evalue normality

Multivariate normality is evidenced by p-values associated with multivariate skewness and kurtosis statistics that are > .05. In those cases where both the skewness and kurtosis results are non-significant (p's > .05), then the data are assumed to follow a multivariate normal distribution where p > .05 (Korkmaz, Goksuluk, & Zarasiz, 2014, 2019).

You can also use plots to explore possible multivariate outliers. Moreover, you can examine univariate tests of normality (the default is Shapiro-Wilk test, but can be changed if desired). A significant test result regarding a specific variable indicates a significant departure from normality.

mvn(newdata, mvnTest="mardia")
mvn(newdata, multivariatePlot="qq")
mvn(newdata, multivariateOutlierMethod="quan")

You can generate univariate plot as well to evaluate distribution of the endogenous variables for non-normality. Skewness values approaching 2 or kurtoisis values over 7 may be considered indicative of more “significant problems” with non-normality (Curran, et al., 1996).

mvn(newdata, univariatePlot="histogram")
mvn(newdata, univariatePlot="box")

model <- '
    interest ~ mastery + perfgoal + ses 
    achieve ~ anxiety + interest + mastery 
    anxiety ~ perfgoal + mastery 
    # variances
    mastery ~~ mastery
    perfgoal ~~ perfgoal
    ses ~~ ses
    
    mastery ~~ perfgoal + ses
    perfgoal ~~ ses
    
    interest ~~ interest
    anxiety ~~ anxiety
    achieve ~~ achieve
    interest~~anxiety 
'

We will fit the model using the 'estimator' argument at set it equal to “MLM.” This will result in the Satorra-Bentler model chi-square being computed. We will also use the 'se' argument and set it to “roburst.”

fit <- sem(model, data=processdata, estimator = "MLM", se="roburst")
summary(fit,fit.measures=TRUE)

reference

see lme4 tutorial

Exercise

Using mtcars in R

?mtcars
mtcars
str(mtcars)
df <- mtcars
# model specfication
model <-'
  mpg ~ hp + gear + cyl + disp + carb + am + wt
  hp ~ cyl + disp + carb
'
# model fit
fit <- cfa(model, data = mtcars)
summary(fit, fit.measures = TRUE, standardized=T, rsquare=T)
semPaths(fit, 'std', layout = 'circle')