Deleting columns in a data frame by names

efa.csv

> efa <- read.csv("http://commres.net/wiki/_media/r/efa.csv", header = TRUE)
> str(efa)
'data.frame':	90 obs. of  14 variables:
 $ Price              : int  4 3 4 4 5 4 3 4 5 4 ...
 $ Safety             : int  4 5 4 4 5 4 4 3 4 4 ...
 $ Exterior_Looks     : int  5 3 3 4 4 5 3 4 5 3 ...
 $ Space_comfort      : int  4 3 4 3 4 3 4 4 4 3 ...
 $ Technology         : int  3 4 5 3 5 4 3 5 3 5 ...
 $ After_Sales_Service: int  4 4 5 4 4 5 5 4 5 4 ...
 $ Resale_Value       : int  5 3 5 5 5 3 3 5 5 5 ...
 $ Fuel_Type          : int  4 4 4 5 3 4 4 4 4 5 ...
 $ Fuel_Efficiency    : int  4 3 5 4 4 3 5 4 4 4 ...
 $ Color              : int  2 4 4 4 5 2 4 4 4 5 ...
 $ Maintenance        : int  4 3 5 4 5 3 3 5 4 5 ...
 $ Test_drive         : int  2 2 4 2 5 2 5 2 2 2 ...
 $ Product_reviews    : int  4 2 4 5 5 2 2 4 4 2 ...
 $ Testimonials       : int  3 2 3 3 2 3 4 4 4 4 ...
> names(efa)
 [1] "Price"               "Safety"              "Exterior_Looks"      "Space_comfort"      
 [5] "Technology"          "After_Sales_Service" "Resale_Value"        "Fuel_Type"          
 [9] "Fuel_Efficiency"     "Color"               "Maintenance"         "Test_drive"         
[13] "Product_reviews"     "Testimonials"       

Suppose that you want to run a factor analysis on the efa data. We start with a rough first pass: no rotation and no specified number of factors, so fa() extracts a single factor by default.

> library(psych)
> efa.fa.rough <- fa(efa, rotate="none")
> efa.fa.rough
Factor Analysis using method =  minres
Call: fa(r = efa, rotate = "none")
Standardized loadings (pattern matrix) based upon correlation matrix
                      MR1     h2   u2 com
Price                0.42 0.1755 0.82   1
Safety              -0.10 0.0105 0.99   1
Exterior_Looks      -0.07 0.0048 1.00   1
Space_comfort        0.25 0.0614 0.94   1
Technology           0.22 0.0479 0.95   1
After_Sales_Service  0.41 0.1687 0.83   1
Resale_Value         0.39 0.1497 0.85   1
Fuel_Type            0.22 0.0489 0.95   1
Fuel_Efficiency      0.72 0.5163 0.48   1
Color                0.39 0.1526 0.85   1
Maintenance          0.59 0.3527 0.65   1
Test_drive           0.29 0.0846 0.92   1
Product_reviews      0.49 0.2443 0.76   1
Testimonials         0.08 0.0060 0.99   1

                MR1
SS loadings    2.02
Proportion Var 0.14

Mean item complexity =  1
Test of the hypothesis that 1 factor is sufficient.

The degrees of freedom for the null model are  91  and the objective function was  2.97 with Chi Square of  247.71
The degrees of freedom for the model are 77  and the objective function was  1.94 

The root mean square of the residuals (RMSR) is  0.13 
The df corrected root mean square of the residuals is  0.14 

The harmonic number of observations is  90 with the empirical chi square  287.57  with prob <  4.7e-26 
The total number of observations was  90  with Likelihood Chi Square =  160.6  with prob <  7.8e-08 

Tucker Lewis Index of factoring reliability =  0.361
RMSEA index =  0.118  and the 90 % confidence intervals are  0.086 0.134
BIC =  -185.88
Fit based upon off diagonal values = 0.52
Measures of factor score adequacy             
                                                   MR1
Correlation of (regression) scores with factors   0.87
Multiple R square of scores with factors          0.76
Minimum correlation of possible factor scores     0.51
> 

Then, check the eigenvalues, which are stored in the e.values element of the result.

> efa.fa.rough$e.values
 [1] 2.7550607 2.1640701 1.4645469 1.3299030 1.0402907 0.9919870 0.8063453 0.6810294 0.6013657
[10] 0.5536899 0.5136469 0.4665307 0.3566072 0.2749266

Five eigenvalues are greater than 1, which suggests up to five factors. We therefore extract 5 factors, this time with the varimax rotation method.
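
The "eigenvalue over 1" count can be checked directly. The following is a minimal sketch that copies the eigenvalues printed above into a vector (the names e_values and n_factors are made up for this illustration). The cutoff is a rule of thumb, and the sixth eigenvalue (0.99) falls just below it.

```r
# Eigenvalues copied from the efa.fa.rough$e.values output above
e_values <- c(2.7550607, 2.1640701, 1.4645469, 1.3299030, 1.0402907,
              0.9919870, 0.8063453, 0.6810294, 0.6013657, 0.5536899,
              0.5136469, 0.4665307, 0.3566072, 0.2749266)

# Count the eigenvalues greater than 1
n_factors <- sum(e_values > 1)
n_factors        # 5

# The eigenvalues of a correlation matrix add up to the number of variables
sum(e_values)    # about 14
```

With the fitted object at hand, sum(efa.fa.rough$e.values > 1) should give the same count.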

> efa.fa.5 <- fa(efa, nfactors = 5, rotate="varimax")
> efa.fa.5
Factor Analysis using method =  minres
Call: fa(r = efa, nfactors = 5, rotate = "varimax")
Standardized loadings (pattern matrix) based upon correlation matrix
                      MR1   MR2   MR3   MR5   MR4   h2    u2 com
Price                0.57  0.15 -0.05 -0.04 -0.02 0.35 0.645 1.2
Safety              -0.28  0.30 -0.16  0.11  0.05 0.21 0.790 2.9
Exterior_Looks      -0.01  0.05  0.20  0.01 -0.54 0.33 0.669 1.3
Space_comfort        0.02  0.87  0.21  0.00 -0.17 0.83 0.172 1.2
Technology           0.04  0.32  0.09  0.13  0.03 0.13 0.873 1.5
After_Sales_Service  0.06  0.37  0.06  0.88  0.04 0.92 0.085 1.4
Resale_Value         0.69 -0.22 -0.17  0.16  0.01 0.59 0.413 1.5
Fuel_Type            0.06  0.54 -0.02  0.08 -0.04 0.30 0.699 1.1
Fuel_Efficiency      0.46  0.09  0.28  0.37  0.23 0.49 0.512 3.2
Color                0.21 -0.05  0.26  0.07  0.74 0.67 0.329 1.4
Maintenance          0.61  0.07  0.07  0.07  0.25 0.45 0.549 1.4
Test_drive           0.09  0.06  0.43  0.23 -0.07 0.26 0.745 1.7
Product_reviews      0.38  0.18  0.41 -0.03  0.07 0.36 0.642 2.4
Testimonials        -0.18  0.04  0.66 -0.05  0.02 0.47 0.535 1.2

                       MR1  MR2  MR3  MR5  MR4
SS loadings           1.72 1.50 1.09 1.03 1.00
Proportion Var        0.12 0.11 0.08 0.07 0.07
Cumulative Var        0.12 0.23 0.31 0.38 0.45
Proportion Explained  0.27 0.24 0.17 0.16 0.16
Cumulative Proportion 0.27 0.51 0.68 0.84 1.00

Mean item complexity =  1.7
Test of the hypothesis that 5 factors are sufficient.

The degrees of freedom for the null model are  91  and the objective function was  2.97 with Chi Square of  247.71
The degrees of freedom for the model are 31  and the objective function was  0.34 

The root mean square of the residuals (RMSR) is  0.04 
The df corrected root mean square of the residuals is  0.06 

The harmonic number of observations is  90 with the empirical chi square  20.2  with prob <  0.93 
The total number of observations was  90  with Likelihood Chi Square =  27.44  with prob <  0.65 

Tucker Lewis Index of factoring reliability =  1.071
RMSEA index =  0  and the 90 % confidence intervals are  0 0.067
BIC =  -112.06
Fit based upon off diagonal values = 0.97
Measures of factor score adequacy             
                                                   MR1  MR2  MR3  MR5  MR4
Correlation of (regression) scores with factors   0.86 0.91 0.80 0.93 0.82
Multiple R square of scores with factors          0.75 0.82 0.64 0.87 0.67
Minimum correlation of possible factor scores     0.50 0.64 0.27 0.75 0.34
> 

Sort the loadings with fa.sort() so that the variables are grouped under the factor on which they load most strongly.

> fa.sort(efa.fa.5)
Factor Analysis using method =  minres
Call: fa(r = efa, nfactors = 5, rotate = "varimax")
Standardized loadings (pattern matrix) based upon correlation matrix
                      MR1   MR2   MR3   MR5   MR4   h2    u2 com
Resale_Value         0.69 -0.22 -0.17  0.16  0.01 0.59 0.413 1.5
Maintenance          0.61  0.07  0.07  0.07  0.25 0.45 0.549 1.4
Price                0.57  0.15 -0.05 -0.04 -0.02 0.35 0.645 1.2
Fuel_Efficiency      0.46  0.09  0.28  0.37  0.23 0.49 0.512 3.2
Space_comfort        0.02  0.87  0.21  0.00 -0.17 0.83 0.172 1.2
Fuel_Type            0.06  0.54 -0.02  0.08 -0.04 0.30 0.699 1.1
Technology           0.04  0.32  0.09  0.13  0.03 0.13 0.873 1.5
Safety              -0.28  0.30 -0.16  0.11  0.05 0.21 0.790 2.9
Testimonials        -0.18  0.04  0.66 -0.05  0.02 0.47 0.535 1.2
Test_drive           0.09  0.06  0.43  0.23 -0.07 0.26 0.745 1.7
Product_reviews      0.38  0.18  0.41 -0.03  0.07 0.36 0.642 2.4
After_Sales_Service  0.06  0.37  0.06  0.88  0.04 0.92 0.085 1.4
Color                0.21 -0.05  0.26  0.07  0.74 0.67 0.329 1.4
Exterior_Looks      -0.01  0.05  0.20  0.01 -0.54 0.33 0.669 1.3

                       MR1  MR2  MR3  MR5  MR4
SS loadings           1.72 1.50 1.09 1.03 1.00
Proportion Var        0.12 0.11 0.08 0.07 0.07
Cumulative Var        0.12 0.23 0.31 0.38 0.45
Proportion Explained  0.27 0.24 0.17 0.16 0.16
Cumulative Proportion 0.27 0.51 0.68 0.84 1.00

Mean item complexity =  1.7
Test of the hypothesis that 5 factors are sufficient.

The degrees of freedom for the null model are  91  and the objective function was  2.97 with Chi Square of  247.71
The degrees of freedom for the model are 31  and the objective function was  0.34 

The root mean square of the residuals (RMSR) is  0.04 
The df corrected root mean square of the residuals is  0.06 

The harmonic number of observations is  90 with the empirical chi square  20.2  with prob <  0.93 
The total number of observations was  90  with Likelihood Chi Square =  27.44  with prob <  0.65 

Tucker Lewis Index of factoring reliability =  1.071
RMSEA index =  0  and the 90 % confidence intervals are  0 0.067
BIC =  -112.06
Fit based upon off diagonal values = 0.97
Measures of factor score adequacy             
                                                   MR1  MR2  MR3  MR5  MR4
Correlation of (regression) scores with factors   0.86 0.91 0.80 0.93 0.82
Multiple R square of scores with factors          0.75 0.82 0.64 0.87 0.67
Minimum correlation of possible factor scores     0.50 0.64 0.27 0.75 0.34
> 

Looking at the result, we find that the h2 (communality) value of one variable, “Technology”, is under 0.2, which is considered small. We want to eliminate this variable and run the factor analysis again. Also, with five factors, one factor (MR5) consists of a single variable, After_Sales_Service. So we extract 4 factors instead of 5.

                      MR1   MR2   MR3   MR5   MR4   h2    u2 com
Resale_Value         0.69 -0.22 -0.17  0.16  0.01 0.59 0.413 1.5
Maintenance          0.61  0.07  0.07  0.07  0.25 0.45 0.549 1.4
Price                0.57  0.15 -0.05 -0.04 -0.02 0.35 0.645 1.2
Fuel_Efficiency      0.46  0.09  0.28  0.37  0.23 0.49 0.512 3.2

Space_comfort        0.02  0.87  0.21  0.00 -0.17 0.83 0.172 1.2
Fuel_Type            0.06  0.54 -0.02  0.08 -0.04 0.30 0.699 1.1
Technology           0.04  0.32  0.09  0.13  0.03 0.13 0.873 1.5 **
Safety              -0.28  0.30 -0.16  0.11  0.05 0.21 0.790 2.9

Testimonials        -0.18  0.04  0.66 -0.05  0.02 0.47 0.535 1.2
Test_drive           0.09  0.06  0.43  0.23 -0.07 0.26 0.745 1.7
Product_reviews      0.38  0.18  0.41 -0.03  0.07 0.36 0.642 2.4

After_Sales_Service  0.06  0.37  0.06  0.88  0.04 0.92 0.085 1.4

Color                0.21 -0.05  0.26  0.07  0.74 0.67 0.329 1.4
Exterior_Looks      -0.01  0.05  0.20  0.01 -0.54 0.33 0.669 1.3
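
As a check on where the h2 value comes from: a variable's communality is the sum of its squared loadings across the factors, and its uniqueness is u2 = 1 - h2. The sketch below redoes that arithmetic for “Technology” using the rounded loadings printed above, so the result only approximates the h2 and u2 columns. The variable names are made up for this illustration.

```r
# Loadings of "Technology" on the five factors, copied from the table above
technology_loadings <- c(MR1 = 0.04, MR2 = 0.32, MR3 = 0.09,
                         MR5 = 0.13, MR4 = 0.03)

# Communality: sum of squared loadings
h2 <- sum(technology_loadings^2)
round(h2, 2)   # 0.13

# Uniqueness: the part of the variance not explained by the factors
u2 <- 1 - h2
round(u2, 2)   # 0.87
```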

We can drop the variable in several ways, as shown below.

> efa_a <- subset(efa, select = (efa.fa.5$uniquenesses <= 0.8))
> names(efa_a)
 [1] "Price"               "Safety"              "Exterior_Looks"      "Space_comfort"      
 [5] "After_Sales_Service" "Resale_Value"        "Fuel_Type"           "Fuel_Efficiency"    
 [9] "Color"               "Maintenance"         "Test_drive"          "Product_reviews"    
[13] "Testimonials"    
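
The first method works because the comparison yields a logical vector with one entry per variable, and subset() keeps the columns marked TRUE. Here is a minimal sketch of that mask, using the u2 values copied from the 5-factor output instead of the fitted object (the names u2, keep, kept and dropped are made up for this illustration). It assumes the uniquenesses come in the same order as the columns of efa.

```r
# Uniquenesses (u2) copied from the 5-factor output above
u2 <- c(Price = 0.645, Safety = 0.790, Exterior_Looks = 0.669,
        Space_comfort = 0.172, Technology = 0.873,
        After_Sales_Service = 0.085, Resale_Value = 0.413,
        Fuel_Type = 0.699, Fuel_Efficiency = 0.512, Color = 0.329,
        Maintenance = 0.549, Test_drive = 0.745,
        Product_reviews = 0.642, Testimonials = 0.535)

# Logical mask: TRUE for variables to keep (u2 <= 0.8 means h2 >= 0.2)
keep <- u2 <= 0.8

kept    <- names(u2)[keep]
dropped <- names(u2)[!keep]

dropped        # "Technology"
length(kept)   # 13
```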

> efa_ab <- subset(efa, select = -(Technology))
> names(efa_a) == names(efa_ab)
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

> drops <- c("Technology")
> efa_ac <- efa[, !(names(efa) %in% drops)]
> names(efa_a) == names(efa_ac)
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> 

We check where the “Technology” variable is located in the efa data set. It is the fifth column.

> names(efa)
 [1] "Price"               "Safety"              "Exterior_Looks"      "Space_comfort"      
 [5] "Technology"          "After_Sales_Service" "Resale_Value"        "Fuel_Type"          
 [9] "Fuel_Efficiency"     "Color"               "Maintenance"         "Test_drive"         
[13] "Product_reviews"     "Testimonials"    

> efa_ad <- efa[,-5]
> names(efa_a) == names(efa_ad)
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> 
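
The methods above can be tried side by side without downloading efa.csv. The following is a small sketch on a made-up three-column data frame (toy and its values are hypothetical stand-ins for efa). It also shows one pitfall of negative indexing: when no name matches, -which(...) selects zero columns, whereas the logical form leaves the data frame untouched.

```r
# Hypothetical stand-in for the efa data frame
toy <- data.frame(Price = c(4, 3, 4),
                  Technology = c(3, 4, 5),
                  Color = c(2, 4, 4))
drops <- c("Technology")

# 1. subset() with a negated column name
by_subset <- subset(toy, select = -Technology)

# 2. logical mask built from the names
by_mask <- toy[, !(names(toy) %in% drops), drop = FALSE]

# 3. keep every name that is not in drops
by_setdiff <- toy[, setdiff(names(toy), drops), drop = FALSE]

# 4. negative position (here the second column)
by_position <- toy[, -2, drop = FALSE]

names(by_subset)   # "Price" "Color"

# Pitfall: a name that does not exist in the data frame
missing_name <- "No_such_column"
ncol(toy[, -which(names(toy) %in% missing_name), drop = FALSE])  # 0
ncol(toy[, !(names(toy) %in% missing_name), drop = FALSE])       # 3
```

The drop = FALSE argument keeps the result a data frame even when only one column is left.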

Then, we run fa() again on the efa_a data, this time with 4 factors.

> efa_a.fa.4 <- fa(efa_a, nfactors=4, rotate="varimax")
> efa_a.fa.4
Factor Analysis using method =  minres
Call: fa(r = efa_a, nfactors = 4, rotate = "varimax")
Standardized loadings (pattern matrix) based upon correlation matrix
                      MR1   MR2   MR3   MR4   h2   u2 com
Price                0.54  0.11  0.01 -0.05 0.31 0.69 1.1
Safety              -0.27  0.40 -0.16  0.07 0.26 0.74 2.2
Exterior_Looks      -0.01  0.02  0.19 -0.56 0.35 0.65 1.2
Space_comfort       -0.01  0.74  0.26 -0.22 0.67 0.33 1.4
After_Sales_Service  0.17  0.49  0.14  0.12 0.30 0.70 1.6
Resale_Value         0.72 -0.14 -0.13  0.06 0.55 0.45 1.2
Fuel_Type            0.06  0.55  0.03 -0.07 0.31 0.69 1.1
Fuel_Efficiency      0.50  0.23  0.33  0.30 0.50 0.50 3.0
Color                0.20 -0.06  0.27  0.69 0.59 0.41 1.5
Maintenance          0.60  0.02  0.12  0.22 0.43 0.57 1.4
Test_drive           0.09  0.13  0.42 -0.02 0.20 0.80 1.3
Product_reviews      0.33  0.12  0.44  0.03 0.32 0.68 2.0
Testimonials        -0.24 -0.04  0.67  0.00 0.51 0.49 1.3

                       MR1  MR2  MR3  MR4
SS loadings           1.73 1.37 1.18 1.00
Proportion Var        0.13 0.11 0.09 0.08
Cumulative Var        0.13 0.24 0.33 0.41
Proportion Explained  0.33 0.26 0.22 0.19
Cumulative Proportion 0.33 0.59 0.81 1.00

Mean item complexity =  1.6
Test of the hypothesis that 4 factors are sufficient.

The degrees of freedom for the null model are  78  and the objective function was  2.76 with Chi Square of  231.56
The degrees of freedom for the model are 32  and the objective function was  0.48 

The root mean square of the residuals (RMSR) is  0.05 
The df corrected root mean square of the residuals is  0.07 

The harmonic number of observations is  90 with the empirical chi square  29.46  with prob <  0.6 
The total number of observations was  90  with Likelihood Chi Square =  38.68  with prob <  0.19 

Tucker Lewis Index of factoring reliability =  0.889
RMSEA index =  0.06  and the 90 % confidence intervals are  0 0.097
BIC =  -105.32
Fit based upon off diagonal values = 0.95
Measures of factor score adequacy             
                                                   MR1  MR2  MR3  MR4
Correlation of (regression) scores with factors   0.86 0.85 0.81 0.80
Multiple R square of scores with factors          0.74 0.72 0.66 0.64
Minimum correlation of possible factor scores     0.49 0.44 0.32 0.28
> 
> fa.sort(efa_a.fa.4)
Factor Analysis using method =  minres
Call: fa(r = efa_a, nfactors = 4, rotate = "varimax")
Standardized loadings (pattern matrix) based upon correlation matrix
                      MR1   MR2   MR3   MR4   h2   u2 com
Resale_Value         0.72 -0.14 -0.13  0.06 0.55 0.45 1.2
Maintenance          0.60  0.02  0.12  0.22 0.43 0.57 1.4
Price                0.54  0.11  0.01 -0.05 0.31 0.69 1.1
Fuel_Efficiency      0.50  0.23  0.33  0.30 0.50 0.50 3.0
Space_comfort       -0.01  0.74  0.26 -0.22 0.67 0.33 1.4
Fuel_Type            0.06  0.55  0.03 -0.07 0.31 0.69 1.1
After_Sales_Service  0.17  0.49  0.14  0.12 0.30 0.70 1.6
Safety              -0.27  0.40 -0.16  0.07 0.26 0.74 2.2
Testimonials        -0.24 -0.04  0.67  0.00 0.51 0.49 1.3
Product_reviews      0.33  0.12  0.44  0.03 0.32 0.68 2.0
Test_drive           0.09  0.13  0.42 -0.02 0.20 0.80 1.3
Color                0.20 -0.06  0.27  0.69 0.59 0.41 1.5
Exterior_Looks      -0.01  0.02  0.19 -0.56 0.35 0.65 1.2

                       MR1  MR2  MR3  MR4
SS loadings           1.73 1.37 1.18 1.00
Proportion Var        0.13 0.11 0.09 0.08
Cumulative Var        0.13 0.24 0.33 0.41
Proportion Explained  0.33 0.26 0.22 0.19
Cumulative Proportion 0.33 0.59 0.81 1.00

Mean item complexity =  1.6
Test of the hypothesis that 4 factors are sufficient.

The degrees of freedom for the null model are  78  and the objective function was  2.76 with Chi Square of  231.56
The degrees of freedom for the model are 32  and the objective function was  0.48 

The root mean square of the residuals (RMSR) is  0.05 
The df corrected root mean square of the residuals is  0.07 

The harmonic number of observations is  90 with the empirical chi square  29.46  with prob <  0.6 
The total number of observations was  90  with Likelihood Chi Square =  38.68  with prob <  0.19 

Tucker Lewis Index of factoring reliability =  0.889
RMSEA index =  0.06  and the 90 % confidence intervals are  0 0.097
BIC =  -105.32
Fit based upon off diagonal values = 0.95
Measures of factor score adequacy             
                                                   MR1  MR2  MR3  MR4
Correlation of (regression) scores with factors   0.86 0.85 0.81 0.80
Multiple R square of scores with factors          0.74 0.72 0.66 0.64
Minimum correlation of possible factor scores     0.49 0.44 0.32 0.28
> 

We see four factors, which might be named as follows:

  • MR1: economic factor
  • MR2: convenience factor
  • MR3: information (review) factor
  • MR4: look factor

                      MR1   MR2   MR3   MR4   h2   u2 com
Resale_Value         0.72 -0.14 -0.13  0.06 0.55 0.45 1.2
Maintenance          0.60  0.02  0.12  0.22 0.43 0.57 1.4
Price                0.54  0.11  0.01 -0.05 0.31 0.69 1.1
Fuel_Efficiency      0.50  0.23  0.33  0.30 0.50 0.50 3.0
----
Space_comfort       -0.01  0.74  0.26 -0.22 0.67 0.33 1.4
Fuel_Type            0.06  0.55  0.03 -0.07 0.31 0.69 1.1
After_Sales_Service  0.17  0.49  0.14  0.12 0.30 0.70 1.6
Safety              -0.27  0.40 -0.16  0.07 0.26 0.74 2.2
----
Testimonials        -0.24 -0.04  0.67  0.00 0.51 0.49 1.3
Product_reviews      0.33  0.12  0.44  0.03 0.32 0.68 2.0
Test_drive           0.09  0.13  0.42 -0.02 0.20 0.80 1.3
----
Color                0.20 -0.06  0.27  0.69 0.59 0.41 1.5
Exterior_Looks      -0.01  0.02  0.19 -0.56 0.35 0.65 1.2
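
The grouping in the table above follows from assigning each variable to the factor on which its absolute loading is largest. Here is a minimal sketch of that rule, using the rounded 4-factor loadings copied from the output (load_mat, dominant and groups are names made up for this illustration; this is not how fa.sort() is implemented internally).

```r
# 4-factor loadings copied from the efa_a.fa.4 output above
load_mat <- matrix(
  c( 0.54,  0.11,  0.01, -0.05,
    -0.27,  0.40, -0.16,  0.07,
    -0.01,  0.02,  0.19, -0.56,
    -0.01,  0.74,  0.26, -0.22,
     0.17,  0.49,  0.14,  0.12,
     0.72, -0.14, -0.13,  0.06,
     0.06,  0.55,  0.03, -0.07,
     0.50,  0.23,  0.33,  0.30,
     0.20, -0.06,  0.27,  0.69,
     0.60,  0.02,  0.12,  0.22,
     0.09,  0.13,  0.42, -0.02,
     0.33,  0.12,  0.44,  0.03,
    -0.24, -0.04,  0.67,  0.00),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    c("Price", "Safety", "Exterior_Looks", "Space_comfort",
      "After_Sales_Service", "Resale_Value", "Fuel_Type",
      "Fuel_Efficiency", "Color", "Maintenance", "Test_drive",
      "Product_reviews", "Testimonials"),
    c("MR1", "MR2", "MR3", "MR4")))

# For each variable, the factor with the largest absolute loading
dominant <- colnames(load_mat)[apply(abs(load_mat), 1, which.max)]
names(dominant) <- rownames(load_mat)

# Variables grouped by their dominant factor
groups <- split(names(dominant), dominant)
groups
```

Note that Exterior_Looks lands in MR4 through a negative loading (-0.56), so it runs opposite to Color on that factor.
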
r/deleting_columns_in_data_frame_by_names.txt · Last modified: 2017/12/21 07:30 by hkimscil
