Differences

This shows you the differences between two versions of the page.

--- factor_analysis [2018/12/22 01:51] – hkimscil
+++ factor_analysis [2019/09/14 18:56] – [e.g. secu com finance 2007 example] hkimscil
@@ Line 91: / Line 91: @@
 The regression equation containing the factors above rests on the following assumptions.
   - $E(e_{i}) = 0, \quad Var(e_{i}) = \sigma^2_{i}$
+    * This concerns the distribution of the error terms.
     * Expected value = mean of the error terms = 0, with standard deviation = $\sigma_{i}$.
     * The errors are assumed to be scattered randomly around a mean of 0, which gives them the properties above.
   - $E(F_{j}) = 0, \quad Var(F_{j}) = 1 $
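+    * F is a hypothetical (latent) factor whose magnitude is expressed through standardized coefficients.
     * Factors are standardized with mean = 0 and standard deviation = 1. Hence, Var(F) = 1.
     * The data are assumed to be converted to standard scores before the factor coefficients are computed. The mean and standard deviation of F must therefore be 0 and 1, respectively, so the variance of F is also 1.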
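The standardization assumption can be checked numerically. The following is a minimal sketch in R using simulated scores (the variable `x`, its mean of 50 and its spread of 10 are hypothetical and not taken from this page). After `scale()`, the mean is 0 and both the standard deviation and the variance are 1.

```r
# Simulated raw scores (hypothetical data, for illustration only)
set.seed(1)
x <- rnorm(200, mean = 50, sd = 10)

# scale() subtracts the sample mean and divides by the sample sd
z <- as.numeric(scale(x))

mean_z <- mean(z)  # numerically 0
sd_z   <- sd(z)    # 1
var_z  <- var(z)   # 1, because var = sd^2

print(c(mean = mean_z, sd = sd_z, var = var_z))
```

Because the variance is the square of the standard deviation, a standard deviation of 1 implies a variance of 1, which is the statement Var(F) = 1 in the list above.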
@@ Line 1146: / Line 1148: @@
 ====== e.g., 5 ======
 {{:r:EFA.csv}}
+====== e.g. secu com finance 2007 example  ======
+{{:r:secu_com_finance_2007.csv}}
+<code>
+Sys.setlocale("LC_ALL","Korean")
+secu_com_finance_2007 <- read.csv("http://commres.net/wiki/_media/r/secu_com_finance_2007.csv")
+secu_com_finance_2007
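+# V1 : net income to total capital ratio
+# V2 : net income to equity ratio (return on equity)
+# V3 : equity ratio
+# V4 : debt ratio
+# V5 : equity turnover ratio
+# Standardization (z-score transformation)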
+secu_com_finance_2007 <- transform(secu_com_finance_2007,
+    V1_s = scale(V1),
+    V2_s = scale(V2),
+    V3_s = scale(V3),
+    V4_s = scale(V4),
+    V5_s = scale(V5))
+# Reverse the direction of the debt ratio (V4_s): max(V4_s) - V4_s
+secu_com_finance_2007 <- transform(secu_com_finance_2007, V4_s2 = max(V4_s) - V4_s)
+# variable selection
+secu_com_finance_2007_2 <- secu_com_finance_2007[,c("company", "V1_s", "V2_s", "V3_s", "V4_s2", "V5_s")]
+# Correlation analysis
+cor(secu_com_finance_2007_2[,-1])
+round(cor(secu_com_finance_2007_2[,-1]), digits=3) # rounded to 3 decimals
+# Scatter plot matrix
+plot(secu_com_finance_2007_2[,-1])
+# Scree Plot
+plot(prcomp(secu_com_finance_2007_2[,c(2:6)]), type="l", sub = "Scree Plot")
+</code>
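The scree plot above draws the variances of the principal components. For standardized variables these are the eigenvalues of the correlation matrix. The following is a minimal sketch of that relationship on simulated data (the data frame `sim`, its six variables and the two generating factors `f1` and `f2` are assumptions made for illustration, not the securities data used on this page).

```r
# Simulated data with two underlying factors (hypothetical, for illustration)
set.seed(42)
n  <- 100
f1 <- rnorm(n)
f2 <- rnorm(n)
sim <- data.frame(
  x1 = f1 + rnorm(n, sd = 0.5),
  x2 = f1 + rnorm(n, sd = 0.5),
  x3 = f1 + rnorm(n, sd = 0.5),
  x4 = f2 + rnorm(n, sd = 0.5),
  x5 = f2 + rnorm(n, sd = 0.5),
  x6 = f2 + rnorm(n, sd = 0.5))

# Standardize, as in the example above
sim_s <- scale(sim)

# Eigenvalues of the correlation matrix
ev_cor <- eigen(cor(sim_s))$values

# Component variances from prcomp(), the values a scree plot displays
ev_pca <- prcomp(sim_s)$sdev^2

print(round(ev_cor, 3))
print(round(ev_pca, 3))

# The eigenvalues sum to the number of variables
print(sum(ev_cor))

# A common rule of thumb keeps components with eigenvalue > 1
print(sum(ev_cor > 1))
```

Looking for the point where the eigenvalues level off, or counting those above 1, are two informal ways to choose the `factors` argument passed to `factanal()`.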
+<code>
+# Factor analysis (maximum likelihood factor analysis)
+# rotation = "varimax"
+secu_factanal <- factanal(secu_com_finance_2007_2[,2:6],
+    factors = 2,
+    rotation = "varimax", # "varimax", "promax", "none"
+    scores="regression") # "regression", "Bartlett"
+print(secu_factanal)
+</code>
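The `rotation` argument changes how the loadings are distributed across factors, but an orthogonal rotation such as varimax leaves each variable's communality (the row sum of squared loadings) unchanged. The following is a minimal sketch on simulated data (the data frame `sim` and its variables are hypothetical and not the securities data). It also shows that communality plus uniqueness is close to 1 for each variable in a converged fit.

```r
# Simulated data with two underlying factors (hypothetical, for illustration)
set.seed(42)
n  <- 100
f1 <- rnorm(n)
f2 <- rnorm(n)
sim <- data.frame(
  x1 = f1 + rnorm(n, sd = 0.5),
  x2 = f1 + rnorm(n, sd = 0.5),
  x3 = f1 + rnorm(n, sd = 0.5),
  x4 = f2 + rnorm(n, sd = 0.5),
  x5 = f2 + rnorm(n, sd = 0.5),
  x6 = f2 + rnorm(n, sd = 0.5))

# Same model, fitted without rotation and with varimax rotation
fit_none    <- factanal(sim, factors = 2, rotation = "none")
fit_varimax <- factanal(sim, factors = 2, rotation = "varimax")

# Communality = sum of squared loadings per variable
comm_none    <- rowSums(unclass(fit_none$loadings)^2)
comm_varimax <- rowSums(unclass(fit_varimax$loadings)^2)

# The loadings differ, the communalities do not
print(round(cbind(none = comm_none, varimax = comm_varimax), 3))

# Communality + uniqueness should be close to 1 for each variable
print(round(comm_varimax + fit_varimax$uniquenesses, 3))
```

This does not carry over directly to an oblique rotation such as `"promax"`, where the factors are allowed to correlate.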
+<code>
+print(secu_factanal$loadings, cutoff=0) # display all loadings, including small ones
+# factor scores plotting
+secu_factanal$scores
+plot(secu_factanal$scores, main="Biplot of the first 2 factors")
+# Label each observation with its company name (rownames mapping)
+text(secu_factanal$scores[,1], secu_factanal$scores[,2],
+   labels = secu_com_finance_2007$company,
+   cex = 0.7, pos = 3, col = "blue")
+# factor loadings plotting
+points(secu_factanal$loadings, pch=19, col = "red")
+text(secu_factanal$loadings[,1], secu_factanal$loadings[,2],
+   labels = rownames(secu_factanal$loadings),
+   cex = 0.8, pos = 3, col = "red")
+# plotting lines between (0,0) and (factor loadings by Var.)
+segments(0, 0, secu_factanal$loadings[,1], secu_factanal$loadings[,2]) # vectorized: one segment per variable
+</code>
 ====== etc.  ======
 <del>see http://geog.uoregon.edu/bartlein/courses/geog495/lec16.html</del>
 {{:r:boxes.csv}}
 {{:r:cities.csv}}
-{{:r:secu_com_finance_2007.csv}}
 ====== Reference ======
 {{:factor_analysis_lecture_note.pdf|Lecture Note}} from databaser