Differences

This shows you the differences between two versions of the page.

--- factor_analysis [2022/06/09 00:45] – old revision restored (2022/05/07 16:14) hkimscil
+++ factor_analysis [2025/11/13 01:23] (current) – [Factor solution among many . . .] hkimscil
@@ Line 126: / Line 126: @@
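     * The total variance of the finance (or any other exam) score equals the sum of the squared coefficients (loadings) on $F1$ and $F2$
     * plus the variance of the error.
-  * Here, the sum of the squared loadings is the part explained by the regression, and
+  * Here, the sum of the squared loadings is the part explained by the regression (the regression part, ss.reg, in regression analysis), and
-  * the error variance is the remaining part that no factor accounts for.
+  * the error variance is the remaining part that no factor accounts for (the residual part, ss.res).
   * That is, the variance of finance can be split into the part contributed by $F1$ and $F2$ and the remainder that neither of them covers. This is the same logic as explained (regression) variance versus unexplained variance in regression.
   * The sum of the squares of the two coefficients above (the factor loadings) is called the **communality**. The name is natural because it is the part of the total variance of Y that the two factors ($F1$, $F2$) contribute __in common__.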
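
The decomposition described above can be written out as a short worked equation. This is a sketch under assumptions the excerpt does not state: the two factors are uncorrelated with each other, each has unit variance, and the error term $e_1$ (a label introduced here for illustration) is uncorrelated with both factors. The symbols $\beta_{11}$, $\beta_{12}$, and $\sigma^2_1$ follow the notation the page uses in its communality and specificity table.

```latex
\begin{aligned}
Y_1 &= \beta_{11} F_1 + \beta_{12} F_2 + e_1 \\
\operatorname{Var}(Y_1)
  &= \beta_{11}^2 \operatorname{Var}(F_1)
   + \beta_{12}^2 \operatorname{Var}(F_2)
   + \operatorname{Var}(e_1) \\
  &= \underbrace{\beta_{11}^2 + \beta_{12}^2}_{\text{communality}}
   + \underbrace{\sigma_1^2}_{\text{specificity}}
\end{aligned}
```

Under these assumptions the cross terms vanish, which is why the total variance splits cleanly into an explained part and a residual part, as in regression.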
@@ Line 185: / Line 185: @@
 | Y3  | $S_{31}$  | $S_{32}$  | $S^2_{3}$  |
-The variance-covariance table computed from the actual data is shown below.
+The variance-covariance table computed from the actual data is shown below((for convenience, the variances here are divided by n rather than n-1)).
 | Variable  | Y1  | Y2  | Y3  |
@@ Line 197: / Line 197: @@
 ## for example
 fd <- read.csv("http://commres.net/wiki/_media/r/fa_explanation.csv")
+fd <- fd[, -1] # drop the leading id column
 cov(fd)
@@ Line 273: / Line 274: @@
 ====== Factor solution among many . . . ======
-| Variable, \\ Y<sub>i</sub>  |  Observed \\ variance, S<sup>2</sup><sub>i</sub>  |  Communality, \\ $\beta^2_{i1} +\beta^2_{i2} $  |
+| Variable, \\ Y<sub>i</sub>  |  Observed \\ variance, S<sup>2</sup><sub>i</sub>  |  Communality, \\ $\beta^2_{i1} +\beta^2_{i2} $  |  Specificity, \\   |
-| Finance, Y<sub>1</sub>  |  S<sup>2</sup><sub>1</sub>   |  $\beta^2_{11} +\beta^2_{12} $   |
+| Finance, Y<sub>1</sub>  |  S<sup>2</sup><sub>1</sub>   |  $\beta^2_{11} +\beta^2_{12} $   |  $ \sigma_{1}^{2} $  |
-| Marketing, Y<sub>2</sub>  |  S<sup>2</sup><sub>2</sub>  |  $\beta^2_{21} +\beta^2_{22} $  |
+| Marketing, Y<sub>2</sub>  |  S<sup>2</sup><sub>2</sub>  |  $\beta^2_{21} +\beta^2_{22} $  |  |
-| Policy, Y<sub>3</sub>   |  S<sup>2</sup><sub>3</sub>  |  $\beta^2_{31} +\beta^2_{32} $  |
+| Policy, Y<sub>3</sub>   |  S<sup>2</sup><sub>3</sub>  |  $\beta^2_{31} +\beta^2_{32} $  |  |
-| total  |  T<sub>observed</sub>  |  T<sub>total</sub>  |
+| total  |  T<sub>observed</sub>  |  T<sub>total</sub>  |  |
 The observed variance of each variable is computed using n rather than the df (i.e., n-1).
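
Because R's `cov()` and `var()` divide by n-1, reproducing variances that use n requires a rescaling step. The following is a minimal sketch of that step; the `scores` data frame is a hypothetical toy example made up for illustration and is not the page's `fa_explanation.csv` data.

```r
# Hypothetical toy scores (illustration only, not the wiki's dataset)
scores <- data.frame(
  finance   = c(3, 7, 10, 3, 10),
  marketing = c(6, 3, 9, 9, 6),
  policy    = c(5, 3, 8, 7, 5)
)
n <- nrow(scores)

# cov() uses the n - 1 denominator; rescale so the denominator is n
cov_n <- cov(scores) * (n - 1) / n

# Cross-check one diagonal entry against the definition with denominator n
manual_var <- mean((scores$finance - mean(scores$finance))^2)
all.equal(cov_n["finance", "finance"], manual_var)

cov_n
```

The same factor, (n-1)/n, applies to every entry of the matrix, so correlations and the relative sizes of the variances are unaffected by the choice of denominator.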
@@ Line 332: / Line 333: @@
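 Footnote 1) -> finance = mathematical ability = F1
 Footnotes 2), 3) -> marketing, policy = verbal ability = F2
-Footnote 4) is computed as shown below; this value is called the eigenvalue
+Footnote 6) is computed as shown below; this value is called the eigenvalue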
 <code>
@@ Line 457: / Line 458: @@
 | Economics       |  @lightgreen:0.728  |
 | Total           |  5.617  |
+===== Specificity =====
+| Variable  |  Communality  |  Specificity  |
+| Climate         |  0.795  |  @lightgray:1-0.795  |
+| Housing         |  0.518  |     |
+| Health          |  0.722  |     |
+| Crime           |  0.512  |     |
+| Transportation  |  0.51   |     |
+| Education       |  0.561  |     |
+| Arts            |  0.754  |     |
+| Recreation      |  0.517  |     |
+| Economics       |  0.728  |     |
+| Total           |  5.617  |     |
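
The first row of the table computes specificity as 1 minus communality. A minimal sketch that applies the same rule to every row is shown here, assuming each variable is standardized to unit variance (which is what the `1-0.795` entry suggests, but the excerpt does not state it for the other rows). The communality values are copied from the table; the vector names are introduced only for this illustration.

```r
# Communalities copied from the table above
communality <- c(
  Climate = 0.795, Housing = 0.518, Health = 0.722,
  Crime = 0.512, Transportation = 0.51, Education = 0.561,
  Arts = 0.754, Recreation = 0.517, Economics = 0.728
)

# Assumption: each variable has unit variance, so
# specificity = 1 - communality
specificity <- 1 - communality

# Side-by-side view, plus column totals
result <- round(cbind(communality, specificity), 3)
rbind(result, Total = colSums(result))
```

With nine standardized variables the total variance is 9, so the two column totals should add up to 9 under this assumption.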
 ====== Methods (functions) in R ======
@@ Line 468: / Line 481: @@
 <code>
-mydata <- read.csv("http://commres.net/wiki/_media/r/dataset_exploratoryfactoranalysis.csv")
+my.data <- read.csv("http://commres.net/wiki/_media/r/dataset_exploratoryfactoranalysis.csv")
 # if the data has NAs, it is better to omit them:
 my.data <- na.omit(my.data)
@@ Line 1553: / Line 1566: @@
 [[https://stats.oarc.ucla.edu/spss/seminars/introduction-to-factor-analysis/a-practical-introduction-to-factor-analysis/]]
 [[https://advstats.psychstat.org/book/factor/efa.php]]
+see exploratory factor analysis :: {{youtube>Ollp2nSQCLY}}