Regressions vs. Path Analysis (or SEM)
Model Identification
out of possible 15 relationships
15 - 12 =3 (df)
$\chi^2$ | $\text{CFI}$ | $\text{TLI}$ | $\text{RMSEA}$ | $\text{SRMR}$ |
$p \ge .05$ | $p \ge .90$ | $p \ge .95$ | $p \le .08$ | $p \le .08$ |
Then what is SEM (Structural Equation Modeling)
###################################################### ## data file: PlannedBehavior.csv ###################################################### ###################################################### install.packages("readr") library(readr) df <- read.csv("http://commres.net/wiki/_media/r/plannedbehavior.csv") head(df) str(df) # path analysis in R using lavaan package # install.packages("lavaan") library(lavaan) # Model speficiation specmod <- " intention ~ attitude + norms + control " # Estimate model fitmod <- sem(specmod, data=df) # summarize the result summary(fitmod, fit.measures=TRUE, rsquare=TRUE)
# Model speficiation 2 specmod2 <- " intention ~ attitude + norms + control attitude ~~ norms + control norms ~~ control " fitmod2 <- sem(specmod2, data=df) # summarize the result summary(fitmod2, fit.measures=TRUE, rsquare=TRUE)
fitmod3 <- lm(intention~attitude+norms+control, data=df) summary(fitmod3)
# pbt model specmod4 <- " # Directional relations (path) intention ~ attitude + norms + control behavior ~ intention # Covariances attitude ~~ norms + control norms ~~ control " fitmod4 <- sem(specmod4, data=df) summary(fitmod4, fit.measures=TRUE, rsquare=TRUE)
# my own # pbt model specmod5 <- ' # Directional relations (path) intention ~ a*attitude + b*norms + c*control behavior ~ d*intention # Covariances attitude ~~ norms + control norms ~~ control ad := a*d bd := b*d cd := c*d ' fitmod5 <- sem(specmod5, data=df) summary(fitmod5, fit.measures=TRUE, rsquare=TRUE)
> df <- read.csv("http://commres.net/wiki/_media/r/plannedbehavior.csv") > head(df) attitude norms control intention behavior 1 2.31 2.31 2.03 2.50 2.62 2 4.66 4.01 3.63 3.99 3.64 3 3.85 3.56 4.20 4.35 3.83 4 4.24 2.25 2.84 1.51 2.25 5 2.91 3.31 2.40 1.45 2.00 6 2.99 2.51 2.95 2.59 2.20 > str(df) 'data.frame': 199 obs. of 5 variables: $ attitude : num 2.31 4.66 3.85 4.24 2.91 2.99 3.96 3.01 4.77 3.67 ... $ norms : num 2.31 4.01 3.56 2.25 3.31 2.51 4.65 2.98 3.09 3.63 ... $ control : num 2.03 3.63 4.2 2.84 2.4 2.95 3.77 1.9 3.83 5 ... $ intention: num 2.5 3.99 4.35 1.51 1.45 2.59 4.08 2.58 4.87 3.09 ... $ behavior : num 2.62 3.64 3.83 2.25 2 2.2 4.41 4.15 4.35 3.95 ... > > # path analysis in R using lavaan package > # install.packages("lavaan") > library(lavaan) This is lavaan 0.6-9 lavaan is FREE software! Please report any bugs. Warning message: 패키지 ‘lavaan’는 R 버전 4.1.2에서 작성되었습니다 > > # Model speficiation > specmod <- " + intention ~ attitude + norms + control + " > # Estimate model > fitmod <- sem(specmod, data=df) > > # summarize the result > summary(fitmod, fit.measures=TRUE, rsquare=TRUE) lavaan 0.6-9 ended normally after 11 iterations Estimator ML Optimization method NLMINB Number of model parameters 4 Number of observations 199 Model Test User Model: Test statistic 0.000 Degrees of freedom 0 Model Test Baseline Model: Test statistic 91.633 Degrees of freedom 3 P-value 0.000 User Model versus Baseline Model: Comparative Fit Index (CFI) 1.000 Tucker-Lewis Index (TLI) 1.000 Loglikelihood and Information Criteria: Loglikelihood user model (H0) -219.244 Loglikelihood unrestricted model (H1) -219.244 Akaike (AIC) 446.489 Bayesian (BIC) 459.662 Sample-size adjusted Bayesian (BIC) 446.990 Root Mean Square Error of Approximation: RMSEA 0.000 90 Percent confidence interval - lower 0.000 90 Percent confidence interval - upper 0.000 P-value RMSEA <= 0.05 NA Standardized Root Mean Square Residual: SRMR 0.000 Parameter Estimates: Standard errors Standard Information Expected Information saturated (h1) model Structured Regressions: Estimate Std.Err z-value P(>|z|) intention ~ attitude 0.352 0.058 6.068 0.000 norms 0.153 0.059 2.577 0.010 control 0.275 0.058 4.740 0.000 Variances: Estimate Std.Err z-value P(>|z|) .intention 0.530 0.053 9.975 0.000 R-Square: Estimate intention 0.369
> # Model speficiation 2 > specmod2 <- " + intention ~ attitude + norms + control + attitude ~~ norms + control + norms ~~ control + " > fitmod2 <- sem(specmod2, data=df) > > # summarize the result > summary(fitmod2, fit.measures=TRUE, rsquare=TRUE) lavaan 0.6-9 ended normally after 17 iterations Estimator ML Optimization method NLMINB Number of model parameters 10 Number of observations 199 Model Test User Model: Test statistic 0.000 Degrees of freedom 0 Model Test Baseline Model: Test statistic 136.306 Degrees of freedom 6 P-value 0.000 User Model versus Baseline Model: Comparative Fit Index (CFI) 1.000 Tucker-Lewis Index (TLI) 1.000 Loglikelihood and Information Criteria: Loglikelihood user model (H0) -1011.828 Loglikelihood unrestricted model (H1) -1011.828 Akaike (AIC) 2043.656 Bayesian (BIC) 2076.589 Sample-size adjusted Bayesian (BIC) 2044.908 Root Mean Square Error of Approximation: RMSEA 0.000 90 Percent confidence interval - lower 0.000 90 Percent confidence interval - upper 0.000 P-value RMSEA <= 0.05 NA Standardized Root Mean Square Residual: SRMR 0.000 Parameter Estimates: Standard errors Standard Information Expected Information saturated (h1) model Structured Regressions: Estimate Std.Err z-value P(>|z|) intention ~ attitude 0.352 0.058 6.068 0.000 norms 0.153 0.059 2.577 0.010 control 0.275 0.058 4.740 0.000 Covariances: Estimate Std.Err z-value P(>|z|) attitude ~~ norms 0.200 0.064 3.128 0.002 control 0.334 0.070 4.748 0.000 norms ~~ control 0.220 0.065 3.411 0.001 Variances: Estimate Std.Err z-value P(>|z|) .intention 0.530 0.053 9.975 0.000 attitude 0.928 0.093 9.975 0.000 norms 0.830 0.083 9.975 0.000 control 0.939 0.094 9.975 0.000 R-Square: Estimate intention 0.369
> fitmod3 <- lm(intention~attitude+norms+control, data=df) > summary(fitmod3) Call: lm(formula = intention ~ attitude + norms + control, data = df) Residuals: Min 1Q Median 3Q Max -1.80282 -0.52734 -0.06018 0.51228 1.85202 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.58579 0.23963 2.445 0.0154 * attitude 0.35232 0.05866 6.006 9.13e-09 *** norms 0.15250 0.05979 2.550 0.0115 * control 0.27502 0.05862 4.692 5.09e-06 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.7356 on 195 degrees of freedom Multiple R-squared: 0.369, Adjusted R-squared: 0.3593 F-statistic: 38.01 on 3 and 195 DF, p-value: < 2.2e-16
> > # pbt model > specmod4 <- " + # Directional relations (path) + intention ~ attitude + norms + control + behavior ~ intention + # Covariances + attitude ~~ norms + control + norms ~~ control + " > fitmod4 <- sem(specmod4, data=df) > summary(fitmod4, fit.measures=TRUE, rsquare=TRUE) lavaan 0.6-9 ended normally after 17 iterations Estimator ML Optimization method NLMINB Number of model parameters 12 Number of observations 199 # chi-square test # p-value is over .05 indicating . . . . Model Test User Model: Test statistic 2.023 Degrees of freedom 3 P-value (Chi-square) 0.568 Model Test Baseline Model: Test statistic 182.295 Degrees of freedom 10 P-value 0.000 # CFI >_ .90 # TLI >_ .95 # The two indicate that the model fits to the data well User Model versus Baseline Model: Comparative Fit Index (CFI) 1.000 Tucker-Lewis Index (TLI) 1.019 Loglikelihood and Information Criteria: Loglikelihood user model (H0) -1258.517 Loglikelihood unrestricted model (H1) -1257.506 Akaike (AIC) 2541.035 Bayesian (BIC) 2580.555 Sample-size adjusted Bayesian (BIC) 2542.538 # RMSEA <_ .08 # Root Mean Square Error of Approximation: RMSEA 0.000 90 Percent confidence interval - lower 0.000 90 Percent confidence interval - upper 0.103 P-value RMSEA <= 0.05 0.735 # SRMR <_ .08 meets the standard # Standardized Root Mean Square Residual: SRMR 0.019 Parameter Estimates: Standard errors Standard Information Expected Information saturated (h1) model Structured Regressions: Estimate Std.Err z-value P(>|z|) intention ~ attitude 0.352 0.058 6.068 0.000 norms 0.153 0.059 2.577 0.010 control 0.275 0.058 4.740 0.000 behavior ~ intention 0.453 0.065 7.014 0.000 Covariances: Estimate Std.Err z-value P(>|z|) attitude ~~ norms 0.200 0.064 3.128 0.002 control 0.334 0.070 4.748 0.000 norms ~~ control 0.220 0.065 3.411 0.001 Variances: Estimate Std.Err z-value P(>|z|) .intention 0.530 0.053 9.975 0.000 .behavior 0.699 0.070 9.975 0.000 attitude 0.928 0.093 9.975 0.000 norms 0.830 0.083 9.975 0.000 control 0.939 0.094 9.975 0.000 R-Square: Estimate intention 0.369 behavior 0.198
> specmod5 <- " + # Directional relations (path) + intention ~ attitude + norms + control + behavior ~ intention + norms + # Covariances + attitude ~~ norms + control + norms ~~ control + " > fitmod5 <- sem(specmod5, data=df) > summary(fitmod5, fit.measures=TRUE, rsquare=TRUE) lavaan 0.6-12 ended normally after 18 iterations Estimator ML Optimization method NLMINB Number of model parameters 13 Number of observations 199 Model Test User Model: Test statistic 1.781 Degrees of freedom 2 P-value (Chi-square) 0.410 Model Test Baseline Model: Test statistic 182.295 Degrees of freedom 10 P-value 0.000 User Model versus Baseline Model: Comparative Fit Index (CFI) 1.000 Tucker-Lewis Index (TLI) 1.006 Loglikelihood and Information Criteria: Loglikelihood user model (H0) -1258.396 Loglikelihood unrestricted model (H1) -1257.506 Akaike (AIC) 2542.792 Bayesian (BIC) 2585.605 Sample-size adjusted Bayesian (BIC) 2544.421 Root Mean Square Error of Approximation: RMSEA 0.000 90 Percent confidence interval - lower 0.000 90 Percent confidence interval - upper 0.136 P-value RMSEA <= 0.05 0.569 Standardized Root Mean Square Residual: SRMR 0.018 Parameter Estimates: Standard errors Standard Information Expected Information saturated (h1) model Structured Regressions: Estimate Std.Err z-value P(>|z|) intention ~ attitude 0.352 0.058 6.068 0.000 norms 0.153 0.059 2.577 0.010 control 0.275 0.058 4.740 0.000 behavior ~ intention 0.443 0.068 6.525 0.000 norms 0.034 0.068 0.493 0.622 Covariances: Estimate Std.Err z-value P(>|z|) attitude ~~ norms 0.200 0.064 3.128 0.002 control 0.334 0.070 4.748 0.000 norms ~~ control 0.220 0.065 3.411 0.001 Variances: Estimate Std.Err z-value P(>|z|) .intention 0.530 0.053 9.975 0.000 .behavior 0.698 0.070 9.975 0.000 attitude 0.928 0.093 9.975 0.000 norms 0.830 0.083 9.975 0.000 control 0.939 0.094 9.975 0.000 R-Square: Estimate intention 0.369 behavior 0.199
Path analysis in R with Lavaan (introduction)
By Mike Crowson, Ph.D.
September 17, 2019
install.packages("lavaan")
# processdata<-read.csv("path analysis dataN BinW.csv", header=TRUE, sep=",") processdata<-read.csv("http://commres.net/wiki/_media/r/path_analysis_datan_binw.csv", header=TRUE, sep=",", fileEncoding="UTF-8-BOM")
str(processdata)
library(lavaan)
# model specification model <- ' #equation where interest is predicted by ses # & mastery and performance goals interest ~ mastery + perfgoal + ses # equation where achieve is predicted by # interest and anxiety achieve ~ anxiety + interest + mastery # equation where anxiety is predicted # by mastery and performance goals anxiety ~ perfgoal + mastery # estimating the variances of # the exogenous variables (ses, mastery,performance) mastery ~~ mastery perfgoal ~~ perfgoal ses ~~ ses # estimtating the covariances of the exogenous # variables (ses, mastery,performance) mastery ~~ perfgoal + ses perfgoal ~~ ses # estimating the residual variances # for endogenous variables (interest, anxiety, achieve) interest ~~ interest anxiety ~~ anxiety achieve ~~ achieve # estimating the covariance of residuals # for interest and anxiety interest ~~ anxiety '
fit<-lavaan(model, data=processdata)
summary(fit, fit.measures=TRUE)
summary(fit, fit.measures=TRUE, standardized=TRUE, rsquare=TRUE)
parameterEstimates(fit)
fitMeasures(fit)
modificationIndices(fit)
# model specification model<-' # equation where interest is predicted by ses & mastery and # performance goals interest ~ mastery + perfgoal + ses # equation where achieve is predicted by interest and anxiety achieve~anxiety+interest+mastery #equation where anxiety is predicted by mastery and performance goals anxiety~perfgoal+mastery # estimtating the variances of the exogenous variables (ses, mastery,performance) mastery~~mastery perfgoal~~perfgoal ses~~ses # estimtating the covariances of the exogenous variables (ses, mastery,performance) mastery~~perfgoal+ses perfgoal~~ses # The auto.var argument when fitting the model can be used so that # you do not have to directly request estimation of residual variances # Estimating the covariance of residuals for interest and anxiety interest~~anxiety' fit<-lavaan(model, data=processdata, auto.var=TRUE) summary(fit, fit.measures=TRUE, standardized=TRUE, rsquare=TRUE)
install.packages("semPlot") library("semPlot") semPaths(fit,what="paths",whatLabels="par",style="lisrel",layout="tree", rotation=2)
install.packages("lavaanPlot") library(lavaanPlot) lavaanPlot(model = fit, node_options = list(shape = "box", fontname = "Helvetica"), edge_options = list(color = "grey"), coefs = TRUE, covs = TRUE, stars = c("regress"))
Resources on the use of lavaan:
Using the 'semPlot' package
Using the 'lavaanPlot' package
Raw data for all examples can be downloaded at…
A copy of the Powerpoint of the model specification can be downloaded at…
Basics of path analysis using Lavaan.txt
Displaying Basics of path analysis using Lavaan.txt.
CODING
processdata<-read.csv("http://commres.net/wiki/_media/r/path_analysis_datan_binw.csv", header=TRUE, sep=",", fileEncoding="UTF-8-BOM") str(processdata) library(lavaan) model <- ' interest ~ mastery + perfgoal + ses achieve ~ anxiety + interest + mastery anxiety ~ perfgoal + mastery # variances mastery ~~ mastery perfgoal ~~ perfgoal ses ~~ ses mastery ~~ perfgoal + ses perfgoal ~~ ses interest ~~ interest anxiety ~~ anxiety achieve ~~ achieve interest~~anxiety ' fit <- lavaan(model, data=processdata) fit <- sem(model, data=processdata) summary(fit, fit.measures=TRUE) summary(fit, fit.measures=TRUE, standardized=TRUE, rsquare=TRUE) parameterEstimates(fit) fitMeasures(fit) modificationIndices(fit) install.packages("semPlot") library("semPlot") semPaths(fit,what="paths",whatLabels="par",style="lisrel",layout="tree", rotation=2) install.packages("lavaanPlot") library(lavaanPlot) lavaanPlot( model = fit, node_options = list(shape = "box", fontname = "Helvetica"), edge_options = list(color = "grey"), coefs = TRUE, covs=TRUE, stars = c("regress"))
model <- ' # labeling path from mastery to interest interest ~ a*mastery + perfgoal + ses # labeling path from interest to achieve. # Adding labeled path from # mastery to achieve achieve ~ e*anxiety + b*interest + c*mastery # predicting anxiety and labeling path from mastery anxiety ~ perfgoal + d*mastery # estimtating the variances and covariances of # the exogenous variables (ses, mastery,performance) mastery~~mastery perfgoal~~perfgoal ses~~ses mastery~~perfgoal+ses perfgoal~~ses # estimating the variances of residuals # for endogenous variables # (interest, anxiety, achieve) interest~~interest anxiety~~anxiety achieve~~achieve # estimating the covariance of residuals # for interest and anxiety interest~~anxiety # calculating specific indirect effect # of mastery on achieve via interest SIE1:=a*b # calculating specific indirect effect of # mastery on achieve via anxiety SIE2:=d*e # calculating total indirect effect of # mastery on achievement via mediators TIE:=SIE1+SIE2 # calculating total effect of mastery on achieve TE:=TIE+c' # using naive bootstrap to obtain standard errors fit <- sem(model, data=processdata, se="bootstrap") summary(fit,fit.measures=TRUE) # using 'parameterEstimates' function will give # us confidence intervals based on naive bootstrap. # A standard approach to testing indirect effects. parameterEstimates(fit)
processdata <- read.csv("http://commres.net/wiki/_media/r/path_analysis_datan_binw.csv") str(processdata) # install.packages("MVN") library(MVN) newdata <- processdata[c("achieve", "interest", "anxiety")] str(newdata)
Use the 'mvn' function to evalue normality
Multivariate normality is evidenced by p-values associated with multivariate skewness and kurtosis statistics that are > .05. In those cases where both the skewness and kurtosis results are non-significant (p's > .05), then the data are assumed to follow a multivariate normal distribution where p > .05 (Korkmaz, Goksuluk, & Zarasiz, 2014, 2019).
You can also use plots to explore possible multivariate outliers. Moreover, you can examine univariate tests of normality (the default is Shapiro-Wilk test, but can be changed if desired). A significant test result regarding a specific variable indicates a significant departure from normality.
mvn(newdata, mvnTest="mardia") mvn(newdata, multivariatePlot="qq") mvn(newdata, multivariateOutlierMethod="quan")
You can generate univariate plot as well to evaluate distribution of the endogenous variables for non-normality. Skewness values approaching 2 or kurtoisis values over 7 may be considered indicative of more “significant problems” with non-normality (Curran, et al., 1996).
mvn(newdata, univariatePlot="histogram") mvn(newdata, univariatePlot="box") model <- ' interest ~ mastery + perfgoal + ses achieve ~ anxiety + interest + mastery anxiety ~ perfgoal + mastery # variances mastery ~~ mastery perfgoal ~~ perfgoal ses ~~ ses mastery ~~ perfgoal + ses perfgoal ~~ ses interest ~~ interest anxiety ~~ anxiety achieve ~~ achieve interest~~anxiety '
We will fit the model using the 'estimator' argument at set it equal to “MLM.” This will result in the Satorra-Bentler model chi-square being computed. We will also use the 'se' argument and set it to “roburst.”
fit <- sem(model, data=processdata, estimator = "MLM", se="roburst") summary(fit,fit.measures=TRUE)
reference
see lme4 tutorial
Using mtcars in R
?mtcars mtcars str(mtcars) df <- mtcars
# model specfication model <-' mpg ~ hp + gear + cyl + disp + carb + am + wt hp ~ cyl + disp + carb ' # model fit fit <- cfa(model, data = mtcars) summary(fit, fit.measures = TRUE, standardized=T, rsquare=T) semPaths(fit, 'std', layout = 'circle')