multiple_regression

Differences

This shows you the differences between two versions of the page.

multiple_regression [2018/11/09 07:53] hkimscil
multiple_regression [2023/10/19 08:38] – [Determining IVs' role] hkimscil
Line 44: Line 44:
 ====== e.g.====== ====== e.g.======
 Data set again.  Data set again. 
 +<code>
 +datavar <- read.csv("http://commres.net/wiki/_media/regression01-bankaccount.csv") </code>
  
 ^  DATA for regression analysis   ^^^ ^  DATA for regression analysis   ^^^
Line 150: Line 152:
 |          B    Std. Error    Beta          |  |          B    Std. Error    Beta          | 
 |  1.000    |  (Constant)    6.399    |  1.517    |      4.220    |  0.004    |  1.000    |  (Constant)    6.399    |  1.517    |      4.220    |  0.004   
-|      bankIncome  income    0.012    |  0.004    |  0.616    |  3.325    |  0.013   +|      income    0.012    |  0.004    |  0.616    |  3.325    |  0.013   
 |      bankfam    -0.545    |  0.226    |  -0.446    |  -2.406    |  0.047    |      bankfam    -0.545    |  0.226    |  -0.446    |  -2.406    |  0.047   
 | a Dependent Variable: bankbook  number of bank   ||||||| | a Dependent Variable: bankbook  number of bank   |||||||
  
-The significance test for b (the coefficients) uses a t-test. In the table above . . . .  
  
 +====== Slope test ======
 +
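 +The significance of each b (coefficient) is tested with a t-test. Because a t-test basically divides the treatment effect (the effect of the independent variable, or the difference) by the random error, i.e., the standard error, the t value for income in the table above is approximately 0.012/0.004, and the t value for bankfam is approximately -0.545 / 0.226. 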
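
As a quick check of the division described above, using only the rounded values printed in the coefficients table (the small gaps from the reported t values presumably come from rounding B and Std. Error to three decimals):

$$ t_{\text{income}} = \frac{0.012}{0.004} = 3.0 \quad (\text{reported: } 3.325), \qquad t_{\text{bankfam}} = \frac{-0.545}{0.226} \approx -2.41 \quad (\text{reported: } -2.406) $$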
 +
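 +With a single independent variable, the t value obtained this way equals, in absolute value, the square root of the regression model's F value. With two or more independent variables, the independent variables are usually correlated with one another, so the square of a t value does not necessarily equal the F value.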
 +
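The relation between t and F can be checked numerically. The sketch below is only an illustration: it uses R's built-in `mtcars` data rather than the bank-account data, and the variable choices (`mpg`, `wt`, `hp`) are assumptions made for the example.

```r
# Illustration with R's built-in mtcars data (not the bank-account data).
fit1 <- lm(mpg ~ wt, data = mtcars)             # one independent variable
s1   <- summary(fit1)
b    <- s1$coefficients["wt", "Estimate"]
se   <- s1$coefficients["wt", "Std. Error"]
t1   <- b / se                                  # t = b / standard error
F1   <- unname(s1$fstatistic["value"])          # overall F of the model
print(c(t = t1, t_squared = t1^2, F = F1))      # t^2 equals F

fit2 <- lm(mpg ~ wt + hp, data = mtcars)        # two correlated IVs
s2   <- summary(fit2)
t2   <- s2$coefficients[c("wt", "hp"), "t value"]
F2   <- unname(s2$fstatistic["value"])
print(rbind(t_squared = t2^2, overall_F = F2))  # neither t^2 matches F
```

With one predictor the slope test and the overall model test are the same test; with several predictors the overall F tests all slopes jointly, so it no longer matches any single squared t.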
 +====== Beta coefficients ======
 +See [[:beta coefficients]] or Standardized coefficients 
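
As a minimal sketch of what the Beta column reports, the standardized coefficient can be obtained either by rescaling b with the standard deviations or by refitting the model on z-scored variables. The example uses the built-in `mtcars` data as a stand-in; the variable names are assumptions for illustration only.

```r
# Illustration with built-in mtcars data: two routes to standardized betas.
fit <- lm(mpg ~ wt + hp, data = mtcars)
b   <- coef(fit)[c("wt", "hp")]

# Route 1: beta = b * sd(x) / sd(y)
beta_from_b <- b * sapply(mtcars[c("wt", "hp")], sd) / sd(mtcars$mpg)

# Route 2: refit after converting every variable to z-scores
z     <- as.data.frame(scale(mtcars[c("mpg", "wt", "hp")]))
fit_z <- lm(mpg ~ wt + hp, data = z)
beta_from_z <- coef(fit_z)[c("wt", "hp")]

print(rbind(beta_from_b, beta_from_z))   # the two rows agree
```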
  
 ====== e.g., ====== ====== e.g., ======
 DATA: \\  DATA: \\ 
-<wrap indent>{{:elemapi2.sav}}</wrap>+<wrap indent>{{:elemapi2.sav}} 
 +{{:elemapi2.csv}} 
 +</wrap>
  
 The Academic Performance Index (**API**) is a measurement of //academic performance and progress of individual schools in California, United States//. It is one of the main components of the Public Schools Accountability Act passed by the California legislature in 1999. API scores range from a low of 200 to a high of 1000. [[https://www.google.co.kr/search?q=what+is+high+school+api+|Google search]] The Academic Performance Index (**API**) is a measurement of //academic performance and progress of individual schools in California, United States//. It is one of the main components of the Public Schools Accountability Act passed by the California legislature in 1999. API scores range from a low of 200 to a high of 1000. [[https://www.google.co.kr/search?q=what+is+high+school+api+|Google search]]
Line 235: Line 246:
 |    | number of students   | -.012   | .017   | -.019   | -.724   | .469    |    | number of students   | -.012   | .017   | -.019   | -.724   | .469   
 | a. Dependent Variable: api 2000  |||||||  | a. Dependent Variable: api 2000  ||||||| 
 +
  
 ====== e.g., ====== ====== e.g., ======
Line 321: Line 333:
 </code> </code>
  
-====== The problem of which comes first ======+===== in R ===== 
 +<code>dvar <- read.csv("http://commres.net/wiki/_media/elemapi2_.csv", sep = "\t", fileEncoding="UTF-8-BOM")
 +mod <- lm(api00 ~ ell + acs_k3 + avg_ed + meals, data=dvar) 
 +summary(mod) 
 +anova(mod) 
 + 
 +</code> 
 +<code> 
 +dvar <- read.csv("http://commres.net/wiki/_media/elemapi2_.csv", fileEncoding="UTF-8-BOM")
 +> mod <- lm(api00 ~ ell + acs_k3 + avg_ed + meals, data=dvar) 
 +> summary(mod) 
 + 
 +Call: 
 +lm(formula = api00 ~ ell + acs_k3 + avg_ed + meals, data = dvar) 
 + 
 +Residuals: 
 +     Min       1Q   Median       3Q      Max  
 +-187.020  -40.358   -0.313   36.155  173.697  
 + 
 +Coefficients: 
 +            Estimate Std. Error t value Pr(>|t|)     
 +(Intercept) 709.6388    56.2401  12.618  < 2e-16 *** 
 +ell          -0.8434     0.1958  -4.307 2.12e-05 *** 
 +acs_k3        3.3884     2.3333   1.452    0.147     
 +avg_ed       29.0724     6.9243   4.199 3.36e-05 *** 
 +meals        -2.9374     0.1948 -15.081  < 2e-16 *** 
 +--- 
 +Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
 + 
 +Residual standard error: 58.63 on 374 degrees of freedom 
 +  (21 observations deleted due to missingness) 
 +Multiple R-squared:  0.8326, Adjusted R-squared:  0.8308  
 +F-statistic:   465 on 4 and 374 DF,  p-value: < 2.2e-16 
 + 
 +> anova(mod) 
 +Analysis of Variance Table 
 + 
 +Response: api00 
 +           Df  Sum Sq Mean Sq  F value    Pr(>F)     
 +ell         1 4502711 4502711 1309.762 < 2.2e-16 *** 
 +acs_k3      1  110211  110211   32.059 2.985e-08 *** 
 +avg_ed      1  998892  998892  290.561 < 2.2e-16 *** 
 +meals       1  781905  781905  227.443 < 2.2e-16 *** 
 +Residuals 374 1285740    3438                        
 +--- 
 +Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
 +>  
 +</code> 
 + 
 +<code>> mod 
 + 
 +Call: 
 +lm(formula = api00 ~ ell + acs_k3 + avg_ed + meals, data = dvar) 
 + 
 +Coefficients: 
 +(Intercept)          ell       acs_k3       avg_ed        meals   
 +   709.6388      -0.8434       3.3884      29.0724      -2.9374   
 + 
 +></code> 
 +$$ \hat{Y} = 709.6388 - 0.8434 \, \text{ell} + 3.3884 \, \text{acs_k3} + 29.0724 \, \text{avg_ed} - 2.9374 \, \text{meals} $$  
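
To show how the fitted equation is used, the sketch below plugs one school's values into the coefficients printed above. The coefficients are copied from the output; the school's values (`ell = 20`, `acs_k3 = 19`, `avg_ed = 2.5`, `meals = 60`) are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the printed lm() output.
b <- c("(Intercept)" = 709.6388, ell = -0.8434, acs_k3 = 3.3884,
       avg_ed = 29.0724, meals = -2.9374)

# Hypothetical school (made-up values, not a row of the data set).
school <- c(ell = 20, acs_k3 = 19, avg_ed = 2.5, meals = 60)

# Predicted api00 = intercept + sum of (coefficient * value)
y_hat <- unname(b["(Intercept)"] + sum(b[names(school)] * school))
print(y_hat)
```

With the fitted model object available, `predict(mod, newdata = as.data.frame(t(school)))` should give the same prediction up to rounding of the printed coefficients.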
 + 
 +Then how much explanatory power is unique to each independent variable? --> see [[:partial and semipartial correlation]] 
 + 
 + 
 +====== The problem of "which one is entered first?" ======
  
 __Insert figure about here__ __Insert figure about here__
Line 339: Line 415:
     * . . . the stepwise procedure defines an a posteriori order based solely on a statistical consideration (the statistical significance of semi-partial correlations) . . . .     * . . . the stepwise procedure defines an a posteriori order based solely on a statistical consideration (the statistical significance of semi-partial correlations) . . . .
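
Why the order of entry matters can be seen from R's sequential (Type I) sums of squares. The following is a sketch on the built-in `mtcars` data, with `wt` and `hp` standing in for two correlated IVs; it is an illustration, not the analysis from this page.

```r
# Same two IVs, entered in opposite orders.
fit_ab <- lm(mpg ~ wt + hp, data = mtcars)
fit_ba <- lm(mpg ~ hp + wt, data = mtcars)

# anova() credits shared variance to whichever IV is entered first.
ss_ab <- setNames(anova(fit_ab)[c("wt", "hp"), "Sum Sq"], c("wt", "hp"))
ss_ba <- setNames(anova(fit_ba)[c("wt", "hp"), "Sum Sq"], c("wt", "hp"))
print(rbind(wt_first = ss_ab, hp_first = ss_ba))

# The regression coefficients themselves do not depend on the order.
print(rbind(coef(fit_ab)[c("wt", "hp")], coef(fit_ba)[c("wt", "hp")]))
```

The total explained sum of squares is the same in both orders; only its split between the two IVs changes.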
 ====== Determining IVs' role ====== ====== Determining IVs' role ======
 +For a complete explanation and examples, read [[:partial  and semipartial correlation]]
 https://www.youtube.com/watch?v=-QsMvrQDxyU https://www.youtube.com/watch?v=-QsMvrQDxyU
 [{{ :partial.correlations.jpg?300 |r-squared semi-partial partial correlations }}] [{{ :partial.correlations.jpg?300 |r-squared semi-partial partial correlations }}]
  
 |  | Standard Multiple   | Sequential    comments    |  | Standard Multiple   | Sequential    comments   
-| r<sub>i</sub><sup>2</sup>  \\ squared correlation \\ **zero-order** correlation   | IV<sub>1</sub> : (a+b) / (a+b+c+d)   | IV<sub>1</sub> : (a+b) / (a+b+c+d)   | overlapped effects   +| r<sub>i</sub><sup>2</sup>  \\ squared correlation \\ squared **zero-order** correlation in SPSS  | IV<sub>1</sub> : (a+b) / (a+b+c+d)   | IV<sub>1</sub> : (a+b) / (a+b+c+d)   | overlapped effects   
 | ::: | IV<sub>2</sub> : (c+b) / (a+b+c+d)   | IV<sub>2</sub>: (c+b) / (a+b+c+d)   | ::: |  | ::: | IV<sub>2</sub> : (c+b) / (a+b+c+d)   | IV<sub>2</sub>: (c+b) / (a+b+c+d)   | ::: | 
-| sr<sub>i</sub><sup>2</sup>  \\ squared **semipartial** correlation \\ **part in spss**   | IV<sub>1</sub> : (a) / (a+b+c+d)   | IV<sub>1</sub> : (a+b) / (a+b+c+d)   | Usual setting \\ Unique contribution to Y   +| sr<sub>i</sub><sup>2</sup>  \\ squared \\ **semipartial** correlation \\ **part in spss**   | IV<sub>1</sub> : (a) / (a+b+c+d)   | IV<sub>1</sub> : (a+b) / (a+b+c+d)   | Usual setting \\ Unique contribution to Y   
 | ::: | IV<sub>2</sub> : %%(c%%) / (a+b+c+d)   | IV<sub>2</sub> : %%(c%%) / (a+b+c+d)   | ::: |  | ::: | IV<sub>2</sub> : %%(c%%) / (a+b+c+d)   | IV<sub>2</sub> : %%(c%%) / (a+b+c+d)   | ::: | 
-| pr<sub>i</sub><sup>2</sup>  \\ squared **partial** correlation \\ **partial in spss**   | IV<sub>1</sub> : (a) / (a+d)   | IV<sub>1</sub> : (a+b) / (a+b+d)   | Like adjusted r<sup>2</sup>  \\ Unique contribution to Y   +| pr<sub>i</sub><sup>2</sup>  \\ squared \\ **partial** correlation \\ **partial in spss**   | IV<sub>1</sub> : (a) / (a+d)   | IV<sub>1</sub> : (a+b) / (a+b+d)   | Like adjusted r<sup>2</sup>  \\ Unique contribution to Y   
 | ::: | IV<sub>2</sub> : %%(c%%) / (c+d)   | IV<sub>2</sub> : %%(c%%) / (c+d)   | ::: |  | ::: | IV<sub>2</sub> : %%(c%%) / (c+d)   | IV<sub>2</sub> : %%(c%%) / (c+d)   | ::: | 
 | Assuming IV<sub>1</sub> is entered before IV<sub>2</sub>   ||||  | Assuming IV<sub>1</sub> is entered before IV<sub>2</sub>   |||| 
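
The areas a, b, c, and d in the table can be recovered from three R-squared values. The sketch below covers the "Standard Multiple" column only and uses the built-in `mtcars` data (Y = `mpg`, IV1 = `wt`, IV2 = `hp`) as an assumed example; the areas are expressed as proportions of Y's variance so that a + b + c + d = 1.

```r
# Illustration with built-in mtcars data: Y = mpg, IV1 = wt, IV2 = hp.
r2_full <- summary(lm(mpg ~ wt + hp, data = mtcars))$r.squared  # a + b + c
r2_iv1  <- summary(lm(mpg ~ wt, data = mtcars))$r.squared       # a + b
r2_iv2  <- summary(lm(mpg ~ hp, data = mtcars))$r.squared       # c + b

a  <- r2_full - r2_iv2   # unique to IV1
cc <- r2_full - r2_iv1   # unique to IV2 (named cc to avoid masking c())
b  <- r2_iv1 - a         # shared by IV1 and IV2
d  <- 1 - r2_full        # unexplained

zero_order  <- c(wt = a + b,       hp = cc + b)        # r^2
semipartial <- c(wt = a,           hp = cc)            # sr^2
partial     <- c(wt = a / (a + d), hp = cc / (cc + d)) # pr^2
print(round(rbind(zero_order, semipartial, partial), 4))

# Cross-check for IV1 by residualizing on IV2
wt_res  <- resid(lm(wt ~ hp, data = mtcars))
mpg_res <- resid(lm(mpg ~ hp, data = mtcars))
print(c(sr2 = cor(mtcars$mpg, wt_res)^2, pr2 = cor(mpg_res, wt_res)^2))
```

The semipartial version removes IV2 from IV1 only, while the partial version removes IV2 from both IV1 and Y, which is why its denominator shrinks from a+b+c+d to a+d.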
Line 370: Line 447:
 Multicollinearity problem = when tolerance < .1 or when VIF > 10  Multicollinearity problem = when tolerance < .1 or when VIF > 10 
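
Tolerance and VIF can be computed by hand: regress each IV on the remaining IVs, take tolerance as 1 minus that R-squared, and take VIF as its reciprocal. The sketch below assumes the built-in `mtcars` data and three arbitrarily chosen IVs (`wt`, `hp`, `disp`) purely for illustration.

```r
# Illustration with built-in mtcars data; the IV set is an arbitrary example.
ivs <- c("wt", "hp", "disp")

tolerance <- sapply(ivs, function(v) {
  others <- setdiff(ivs, v)
  # R^2 from regressing this IV on all the other IVs
  r2_j <- summary(lm(reformulate(others, response = v), data = mtcars))$r.squared
  1 - r2_j
})
vif <- 1 / tolerance

print(round(rbind(tolerance, vif), 3))
```

If the `car` package is installed, `car::vif()` applied to the fitted model should report the same VIF values.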
  
 +====== elem e.g. again ======
 +<code>
 +dvar <- read.csv("http://commres.net/wiki/_media/elemapi2_.csv", fileEncoding="UTF-8-BOM")
 +mod <- lm(api00 ~ ell + acs_k3 + avg_ed + meals, data=dvar)
 +summary(mod)
 +anova(mod)
 +</code>
 +<code>
 +dvar <- read.csv("http://commres.net/wiki/_media/elemapi2_.csv", fileEncoding="UTF-8-BOM")
 +> mod <- lm(api00 ~ ell + acs_k3 + avg_ed + meals, data=dvar)
 +> summary(mod)
  
 +Call:
 +lm(formula = api00 ~ ell + acs_k3 + avg_ed + meals, data = dvar)
  
 +Residuals:
 +     Min       1Q   Median       3Q      Max 
 +-187.020  -40.358   -0.313   36.155  173.697 
 +
 +Coefficients:
 +            Estimate Std. Error t value Pr(>|t|)    
 +(Intercept) 709.6388    56.2401  12.618  < 2e-16 ***
 +ell          -0.8434     0.1958  -4.307 2.12e-05 ***
 +acs_k3        3.3884     2.3333   1.452    0.147    
 +avg_ed       29.0724     6.9243   4.199 3.36e-05 ***
 +meals        -2.9374     0.1948 -15.081  < 2e-16 ***
 +---
 +Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 +
 +Residual standard error: 58.63 on 374 degrees of freedom
 +  (21 observations deleted due to missingness)
 +Multiple R-squared:  0.8326, Adjusted R-squared:  0.8308 
 +F-statistic:   465 on 4 and 374 DF,  p-value: < 2.2e-16
 +
 +> anova(mod)
 +Analysis of Variance Table
 +
 +Response: api00
 +           Df  Sum Sq Mean Sq  F value    Pr(>F)    
 +ell         1 4502711 4502711 1309.762 < 2.2e-16 ***
 +acs_k3      1  110211  110211   32.059 2.985e-08 ***
 +avg_ed      1  998892  998892  290.561 < 2.2e-16 ***
 +meals       1  781905  781905  227.443 < 2.2e-16 ***
 +Residuals 374 1285740    3438                       
 +---
 +Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 +
 +</code>
 +<code>
 +# install.packages("ppcor")
 +library(ppcor)
 +myvar <- with(dvar, data.frame(api00, ell, acs_k3, avg_ed, meals))  # take the columns from dvar
 +myvar <- na.omit(myvar)
 +spcor(myvar)
 +</code>
 +
 +<code>
 +> library(ppcor)
 +> myvar <- with(dvar, data.frame(api00, ell, acs_k3, avg_ed, meals))
 +> myvar <- na.omit(myvar)
 +> spcor(myvar)
 +$estimate
 +             api00         ell      acs_k3      avg_ed      meals
 +api00   1.00000000 -0.09112026  0.03072660  0.08883450 -0.3190889
 +ell    -0.13469956  1.00000000  0.06086724 -0.06173591  0.1626061
 +acs_k3  0.07245527  0.09709299  1.00000000 -0.13288465 -0.1367842
 +avg_ed  0.12079565 -0.05678795 -0.07662825  1.00000000 -0.2028836
 +meals  -0.29972194  0.10332189 -0.05448629 -0.14014709  1.0000000
 +
 +$p.value
 +              api00        ell    acs_k3      avg_ed        meals
 +api00  0.000000e+00 0.07761805 0.5525340 0.085390280 2.403284e-10
 +ell    8.918743e-03 0.00000000 0.2390272 0.232377348 1.558141e-03
 +acs_k3 1.608778e-01 0.05998819 0.0000000 0.009891503 7.907183e-03
 +avg_ed 1.912418e-02 0.27203887 0.1380449 0.000000000 7.424903e-05
 +meals  3.041658e-09 0.04526574 0.2919775 0.006489783 0.000000e+00
 +
 +$statistic
 +           api00       ell     acs_k3    avg_ed     meals
 +api00   0.000000 -1.769543  0.5945048  1.724797 -6.511264
 +ell    -2.628924  0.000000  1.1793030 -1.196197  3.187069
 +acs_k3  1.404911  1.886603  0.0000000 -2.592862 -2.670380
 +avg_ed  2.353309 -1.100002 -1.4862899  0.000000 -4.006914
 +meals  -6.075665  2.008902 -1.0552823 -2.737331  0.000000
 +
 +$n
 +[1] 379
 +
 +$gp
 +[1] 3
 +
 +$method
 +[1] "pearson"
 +
 +
 +</code>
 +
 +<code>
 +> spcor.test(myvar$api00, myvar$meals, myvar[,c(2,3,4)])
 +    estimate      p.value statistic   n gp  Method
 +1 -0.3190889 2.403284e-10 -6.511264 379  3 pearson
 +
 +</code>
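
The semipartial correlation for meals can be cross-checked against the anova table shown earlier. Because meals is entered last in the model, its sequential sum of squares is the part of api00 that only meals explains, so dividing it by the total sum of squares should give the squared semipartial correlation. The arithmetic below uses only numbers printed in the outputs:

$$ SS_{\text{total}} = 4502711 + 110211 + 998892 + 781905 + 1285740 = 7679459 $$

$$ sr^2_{\text{meals}} = \frac{781905}{7679459} \approx 0.1018, \qquad (-0.3190889)^2 \approx 0.1018 $$

The same totals reproduce the model's R-squared: $ (7679459 - 1285740) / 7679459 \approx 0.8326 $.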
 ====== e.g., ====== ====== e.g., ======
 [[:multiple regression examples]] [[:multiple regression examples]]
Line 396: Line 574:
   * LifeSat Score on Life Satisfaction Inventory seven years after College   * LifeSat Score on Life Satisfaction Inventory seven years after College
   * Income Income seven years after College (in thousands)   * Income Income seven years after College (in thousands)
 +
 +====== exercise ======
 +{{:insurance.csv}}
 +<code>
 +dvar <- read.csv("http://commres.net/wiki/_media/insurance.csv")
 +</code>
 +
 +[[:Multiple Regression Exercise]]
  
 ====== Resources ====== ====== Resources ======
Line 417: Line 603:
   * https://www.r-bloggers.com/analysis-of-covariance-%E2%80%93-extending-simple-linear-regression/   * https://www.r-bloggers.com/analysis-of-covariance-%E2%80%93-extending-simple-linear-regression/
   * http://www.wekaleamstudios.co.uk/posts/analysis-of-covariance-extending-simple-linear-regression/   * http://www.wekaleamstudios.co.uk/posts/analysis-of-covariance-extending-simple-linear-regression/
 +
 +https://www.youtube.com/user/marinstatlectures/search?query=Multiple+Linear+Regression+
 +
 +
 {{tag> "research methods" "statistics" "regression" "multiple regression"}} {{tag> "research methods" "statistics" "regression" "multiple regression"}}
 +
multiple_regression.txt · Last modified: 2023/10/19 08:39 by hkimscil
