====== e.g. ======
Data set again.
<code>
datavar <- read.csv("http://commres.net/wiki/_media/regression01-bankaccount.csv")
</code>
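
To confirm what was loaded, a quick check (a sketch; the column names are whatever the csv file provides):
<code>
str(datavar)    # variable names and types
head(datavar)   # first few rows; should match the table below
</code>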
  
^  DATA for regression analysis   ^^^
====== e.g., ======
DATA: \\ 
<wrap indent>{{:elemapi2.sav}}
{{:elemapi2.csv}}
</wrap>
  
The Academic Performance Index (**API**) is a measurement of //academic performance and progress of individual schools in California, United States//. It is one of the main components of the Public Schools Accountability Act passed by the California legislature in 1999. API scores range from a low of 200 to a high of 1000. [[https://www.google.co.kr/search?q=what+is+high+school+api+|Google search]]
|    | number of students   | -.012   | .017   | -.019   | -.724   | .469   |
| a. Dependent Variable: api 2000  |||||||
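
Each t value in the table is the unstandardized coefficient divided by its standard error; for the number of students row, with the rounded figures shown:
$$ t = \frac{B}{SE} = \frac{-.012}{.017} \approx -0.7 $$
(the printed -.724 comes from the unrounded B and SE).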
  
====== e.g., ======
</code>
  
===== in R =====
<code>
dvar <- read.csv("http://commres.net/wiki/_media/elemapi2_.csv", fileEncoding="UTF-8-BOM")
mod <- lm(api00 ~ ell + acs_k3 + avg_ed + meals, data=dvar)
summary(mod)
anova(mod)
</code>
<code>
> dvar <- read.csv("http://commres.net/wiki/_media/elemapi2_.csv", fileEncoding="UTF-8-BOM")
> mod <- lm(api00 ~ ell + acs_k3 + avg_ed + meals, data=dvar)
> summary(mod)

Call:
lm(formula = api00 ~ ell + acs_k3 + avg_ed + meals, data = dvar)

Residuals:
     Min       1Q   Median       3Q      Max 
-187.020  -40.358   -0.313   36.155  173.697 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 709.6388    56.2401  12.618  < 2e-16 ***
ell          -0.8434     0.1958  -4.307 2.12e-05 ***
acs_k3        3.3884     2.3333   1.452    0.147    
avg_ed       29.0724     6.9243   4.199 3.36e-05 ***
meals        -2.9374     0.1948 -15.081  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 58.63 on 374 degrees of freedom
  (21 observations deleted due to missingness)
Multiple R-squared:  0.8326, Adjusted R-squared:  0.8308 
F-statistic:   465 on 4 and 374 DF,  p-value: < 2.2e-16

> anova(mod)
Analysis of Variance Table

Response: api00
           Df  Sum Sq Mean Sq  F value    Pr(>F)    
ell         1 4502711 4502711 1309.762 < 2.2e-16 ***
acs_k3      1  110211  110211   32.059 2.985e-08 ***
avg_ed      1  998892  998892  290.561 < 2.2e-16 ***
meals       1  781905  781905  227.443 < 2.2e-16 ***
Residuals 374 1285740    3438                       
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
</code>
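
Note that the Multiple R-squared reported by summary() can be recovered from the anova() table: the four sequential sums of squares together make up the regression sum of squares, and dividing by the total (regression + residual) gives R².
$$ R^2 = \frac{SS_{regression}}{SS_{total}} = \frac{4502711 + 110211 + 998892 + 781905}{4502711 + 110211 + 998892 + 781905 + 1285740} \approx 0.8326 $$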

<code>
> mod

Call:
lm(formula = api00 ~ ell + acs_k3 + avg_ed + meals, data = dvar)

Coefficients:
(Intercept)          ell       acs_k3       avg_ed        meals  
   709.6388      -0.8434       3.3884      29.0724      -2.9374  

</code>
-Multiple R-squared:  0.9554, Adjusted R-squared:  0.948  +$$ \hat{Y} =  709.6388 + -0.8434 \text{ell} + 3.3884 \text{acs_k3} + 29.0724 \text{avg_ed} + -2.9374 \text{meals} \\$$ 

Then how much explanatory power does each independent variable contribute uniquely? -- see [[:partial and semipartial correlation]]
  
  
  
|  | Standard Multiple   | Sequential   | comments   |
| r<sub>i</sub><sup>2</sup>  \\ squared correlation \\ squared **zero-order** \\ correlation in spss  | IV<sub>1</sub> : (a+b) / (a+b+c+d)   | IV<sub>1</sub> : (a+b) / (a+b+c+d)   | overlapped effects   |
| ::: | IV<sub>2</sub> : (c+b) / (a+b+c+d)   | IV<sub>2</sub> : (c+b) / (a+b+c+d)   | ::: |
| sr<sub>i</sub><sup>2</sup>  \\ squared \\ **semipartial** correlation \\ **part in spss**   | IV<sub>1</sub> : (a) / (a+b+c+d)   | IV<sub>1</sub> : (a+b) / (a+b+c+d)   | Usual setting \\ Unique contribution to Y   |
| ::: | IV<sub>2</sub> : %%(c%%) / (a+b+c+d)   | IV<sub>2</sub> : %%(c%%) / (a+b+c+d)   | ::: |
| pr<sub>i</sub><sup>2</sup>  \\ squared \\ **partial** correlation \\ **partial in spss**   | IV<sub>1</sub> : (a) / (a+d)   | IV<sub>1</sub> : (a+b) / (a+b+d)   | Like adjusted r<sup>2</sup>  \\ Unique contribution to Y   |
| ::: | IV<sub>2</sub> : %%(c%%) / (c+d)   | IV<sub>2</sub> : %%(c%%) / (c+d)   | ::: |
| Assuming IV<sub>1</sub> is entered before IV<sub>2</sub>   ||||
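
The squared semipartial correlation of an IV can also be obtained by comparing the R² of the full model with the R² of the model that drops that IV. A sketch reusing dvar and the model above (na.omit keeps both fits on the same cases):
<code>
dd <- na.omit(dvar[, c("api00", "ell", "acs_k3", "avg_ed", "meals")])
full    <- lm(api00 ~ ell + acs_k3 + avg_ed + meals, data = dd)
nomeals <- lm(api00 ~ ell + acs_k3 + avg_ed, data = dd)
summary(full)$r.squared - summary(nomeals)$r.squared   # sr^2 of meals, ≈ .10 for this data set
</code>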
Multicollinearity problem = when tolerance (= 1/VIF) < .1 or, equivalently, when VIF > 10
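
These diagnostics can be computed for the model above; a minimal sketch assuming the car package is installed (its vif() function returns one VIF per IV, and tolerance is simply 1/VIF):
<code>
# install.packages("car")
library(car)
vif(mod)       # variance inflation factor for each IV in mod
1 / vif(mod)   # tolerance
</code>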
  
====== elem e.g. again ======
<code>
dvar <- read.csv("http://commres.net/wiki/_media/elemapi2_.csv", fileEncoding="UTF-8-BOM")
mod <- lm(api00 ~ ell + acs_k3 + avg_ed + meals, data=dvar)
summary(mod)
anova(mod)
</code>
<code>
> dvar <- read.csv("http://commres.net/wiki/_media/elemapi2_.csv", fileEncoding="UTF-8-BOM")
> mod <- lm(api00 ~ ell + acs_k3 + avg_ed + meals, data=dvar)
> summary(mod)

Call:
lm(formula = api00 ~ ell + acs_k3 + avg_ed + meals, data = dvar)

Residuals:
     Min       1Q   Median       3Q      Max 
-187.020  -40.358   -0.313   36.155  173.697 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 709.6388    56.2401  12.618  < 2e-16 ***
ell          -0.8434     0.1958  -4.307 2.12e-05 ***
acs_k3        3.3884     2.3333   1.452    0.147    
avg_ed       29.0724     6.9243   4.199 3.36e-05 ***
meals        -2.9374     0.1948 -15.081  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 58.63 on 374 degrees of freedom
  (21 observations deleted due to missingness)
Multiple R-squared:  0.8326, Adjusted R-squared:  0.8308 
F-statistic:   465 on 4 and 374 DF,  p-value: < 2.2e-16

> anova(mod)
Analysis of Variance Table

Response: api00
           Df  Sum Sq Mean Sq  F value    Pr(>F)    
ell         1 4502711 4502711 1309.762 < 2.2e-16 ***
acs_k3      1  110211  110211   32.059 2.985e-08 ***
avg_ed      1  998892  998892  290.561 < 2.2e-16 ***
meals       1  781905  781905  227.443 < 2.2e-16 ***
Residuals 374 1285740    3438                       
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
</code>
<code>
# install.packages("ppcor")
library(ppcor)
# keep the same variables (and names) used in mod above
myvar <- dvar[, c("api00", "ell", "acs_k3", "avg_ed", "meals")]
myvar <- na.omit(myvar)
spcor(myvar)
</code>

<code>
> library(ppcor)
> myvar <- dvar[, c("api00", "ell", "acs_k3", "avg_ed", "meals")]
> myvar <- na.omit(myvar)
> spcor(myvar)
$estimate
             api00         ell      acs_k3      avg_ed      meals
api00   1.00000000 -0.09112026  0.03072660  0.08883450 -0.3190889
ell    -0.13469956  1.00000000  0.06086724 -0.06173591  0.1626061
acs_k3  0.07245527  0.09709299  1.00000000 -0.13288465 -0.1367842
avg_ed  0.12079565 -0.05678795 -0.07662825  1.00000000 -0.2028836
meals  -0.29972194  0.10332189 -0.05448629 -0.14014709  1.0000000

$p.value
              api00        ell    acs_k3      avg_ed        meals
api00  0.000000e+00 0.07761805 0.5525340 0.085390280 2.403284e-10
ell    8.918743e-03 0.00000000 0.2390272 0.232377348 1.558141e-03
acs_k3 1.608778e-01 0.05998819 0.0000000 0.009891503 7.907183e-03
avg_ed 1.912418e-02 0.27203887 0.1380449 0.000000000 7.424903e-05
meals  3.041658e-09 0.04526574 0.2919775 0.006489783 0.000000e+00

$statistic
           api00       ell     acs_k3    avg_ed     meals
api00   0.000000 -1.769543  0.5945048  1.724797 -6.511264
ell    -2.628924  0.000000  1.1793030 -1.196197  3.187069
acs_k3  1.404911  1.886603  0.0000000 -2.592862 -2.670380
avg_ed  2.353309 -1.100002 -1.4862899  0.000000 -4.006914
meals  -6.075665  2.008902 -1.0552823 -2.737331  0.000000

$n
[1] 379

$gp
[1] 3

$method
[1] "pearson"
</code>

<code>
> spcor.test(myvar$api00, myvar$meals, myvar[,c(2,3,4)])
    estimate      p.value statistic   n gp  Method
1 -0.3190889 2.403284e-10 -6.511264 379  3 pearson
</code>
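
The estimate of -0.319 is the semipartial (part) correlation between api00 and meals, with the other IVs partialled out of meals only; squaring it gives the share of api00's variance that meals explains uniquely:
<code>
(-0.3190889)^2   # ≈ 0.102: about 10% of the variance in api00 is unique to meals
</code>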
====== e.g., ======
[[:multiple regression examples]]
  * Income: Income seven years after College (in thousands)
  
====== exercise ======
{{:insurance.csv}}
<code>
dvar <- read.csv("http://commres.net/wiki/_media/insurance.csv")
</code>
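
Before choosing IVs, inspect what the file contains; a starter sketch (the variable names are not listed on this page, so the model formula in the comment is only a hypothetical placeholder):
<code>
str(dvar)     # variable names and types
head(dvar)
# hypothetical example once the columns are known, e.g.:
# mod.ins <- lm(charges ~ age + bmi, data = dvar)   # `charges`, `age`, `bmi` are assumed names
# summary(mod.ins)
</code>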

[[:Multiple Regression Exercise]]
  
====== Resources ======
  * https://www.r-bloggers.com/analysis-of-covariance-%E2%80%93-extending-simple-linear-regression/
  * http://www.wekaleamstudios.co.uk/posts/analysis-of-covariance-extending-simple-linear-regression/
  * https://www.youtube.com/user/marinstatlectures/search?query=Multiple+Linear+Regression+
{{tag> "research methods" "statistics" "regression" "multiple regression"}}