^  ANOVA(b)  ^^^^^^^
|  Model  ||  Sum of Squares  |  df  |  Mean Square  |  F  |  Sig.  |
|  1    |  Regression   | @white: 18.934    | @lightblue: 1    |  18.934    |  13.687    |  0.006   |
|      |  Residual   | @orange: 11.066    | @lightgreen: 8    |  1.383*    |    |    |
|      |  Total   | @yellow: 30.000    | @#eee: 9    |    |    |    |
| a Predictors: (Constant), bankIncome  income \\ b Dependent Variable: bankbook  number of bank  |||||||
  
  * 1.383 = SS<sub>res</sub> / (n-2) = MS residual = MS error (error due to random chance)
  * This MS error means the same thing as the se in t = difference / se from [[:t-test]].
  * The F value is MS<sub>regression</sub>, 18.934, divided by this MS error (MS residual),
  * and it is what we test statistically (df regression = 2 - 1; df residual = n - 2; df total = n - 1).
<WRAP clear />
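As a check, the F value in the table can be recomputed from the sums of squares alone. A minimal R sketch (the variable names are ours; the numbers come from the ANOVA(b) table above):

<code>
# Recompute the F test from the ANOVA(b) table above
ss.reg <- 18.934                      # SS regression (white)
ss.res <- 11.066                      # SS residual (orange)
n <- 10                               # df total = n - 1 = 9 (grey)
ms.reg <- ss.reg / (2 - 1)            # df regression = number of variables - 1
ms.res <- ss.res / (n - 2)            # 1.383 = MS error
f <- ms.reg / ms.res                  # 13.687
pf(f, 1, n - 2, lower.tail = FALSE)   # about 0.006, the Sig. column
</code>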
  
  
| for SS   | for degrees of freedom   |
| @white: white \\ = explained error (E) \\ = $SS_{reg}$  | @lightblue: for regression \\ (number of variables - 1) \\ = 1 (light blue) |
| @orange: orange \\ = unexplained error (U) \\ = $SS_{res}$  | @lightgreen: for residual \\ (number of cases - number of variables) \\ = 8 (light green) |
| @yellow: yellow \\ = total error $SS_{total}$ \\ = E + U \\ = $SS_{reg} + SS_{res}$ | @#eee: grey \\ = total df \\ = total sample # - 1  |
  
  
  
===== r-square =====
  * $\displaystyle r^2=\frac{SS_{total}-SS_{res}}{SS_{total}} = \frac{\text{Explained sample variability}}{\text{Total sample variability}}$
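A quick R check of this identity, using the five-case data (x = 1..5; y = 1, 1, 2, 2, 4) that appears later in this section:

<code>
# r-square = explained / total sample variability
x <- c(1, 2, 3, 4, 5)
y <- c(1, 1, 2, 2, 4)
ss.total <- sum((y - mean(y))^2)      # 6.0
ss.res <- sum(resid(lm(y ~ x))^2)     # 1.1
(ss.total - ss.res) / ss.total        # 0.8167 = Multiple R-squared
</code>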
  
  
  
===== Adjusted r-square =====
  * $\displaystyle r^2=\frac{SS_{total}-SS_{res}}{SS_{total}} = 1 - \frac{SS_{res}}{SS_{total}}$ ,
  
      * R2 value goes down -- which means
      * more (many) IVs are not always good
  * Therefore, the Adjusted r<sup>2</sup> = 1 - (.367 / 1.5) = 0.756 (green color cell)
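The same arithmetic in R (a sketch with our own variable names; k is the number of IVs):

<code>
# Adjusted r-square: divide each SS by its df before taking the ratio
x <- c(1, 2, 3, 4, 5)
y <- c(1, 1, 2, 2, 4)
n <- length(x); k <- 1
ss.res <- sum(resid(lm(y ~ x))^2)     # 1.1
ss.total <- sum((y - mean(y))^2)      # 6.0
1 - (ss.res / (n - k - 1)) / (ss.total / (n - 1))   # 1 - (.367 / 1.5) = 0.756
</code>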
  
===== Slope test =====
If we take a look at the ANOVA result:
  
| b Dependent Variable: y    |||||||
<WRAP clear />
F-test recap (checked with R's anova() after this list):
  * ANOVA, F-test, $F=\frac{MS_{between}}{MS_{within}}$
    * MS_between?
    * MS_within?
  * In regression, the counterpart of "within" is the residual:
    * $s = \sqrt{s^2} = \sqrt{\frac{SS_{res}}{n-2}} $
    * because this SS residual captures the random difference (MS<sub>within</sub>): $s^2 = \frac{SS_{res}}{n-2} $
  * MS for regression . . . obtained difference:
    * follow the same procedure as above, but for MS <del>residual</del> regression.
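The anova() output for the same five-case model reproduces this decomposition (the values in the comments are what we expect for these data):

<code>
# ANOVA table for the regression: MS regression vs. MS residual
x <- c(1, 2, 3, 4, 5)
y <- c(1, 1, 2, 2, 4)
anova(lm(y ~ x))
# x         : Sum Sq 4.9, Mean Sq 4.90000, F = 13.364
# Residuals : Sum Sq 1.1, Mean Sq 0.36667  (= MS within = MS error)
</code>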
  
  * Why do we do a t-test for the slope of the X variable? Below is a mathematical explanation.
  * Sampling distribution of the errors around the slope line b (a small simulation after the formulas below illustrates this):
    * $\displaystyle \sigma_{b_{1}} = \frac{\sigma}{\sqrt{SS_{x}}}$
    * Remember that $\displaystyle \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ ?
    * estimation of $\sigma_{b_{1}}$ : substitute sigma with s

If the errors (residuals) cluster around the slope line b and we take them out and draw their distribution curve, they will form a normal distribution with a mean of 0 and a standard deviation equal to the standard error above.

  * t-test
    * $\displaystyle t=\frac{b_{1} - \text{Hypothesized value of }\beta_{1}}{s_{b_{1}}}$
    * The hypothesized value of b (or beta) is 0; therefore, the t value is
    * $\displaystyle t=\frac{b_{1}}{s_{b_{1}}}$
    * The standard error (se) of the slope is obtained as follows:
  
\begin{eqnarray*}
s_{b_{1}} & = & \sqrt {\frac {MSE}{SS_{X}}} \\
 & = & \sqrt { \frac{1}{n-2} \times \frac{SSE}{SS_{X}} } \\
 & = & \sqrt { \frac{1}{n-2} \times \frac{ \Sigma{(Y-\hat{Y})^2} }{ \Sigma{ (X_{i} - \bar{X})^2 } } } \\
\end{eqnarray*}
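What does this sampling distribution look like in practice? A small simulation can show it. This is an illustrative sketch of our own, not part of the original example: with $\sigma = 1$ and the X values held fixed, the standard deviation of the simulated slopes should approach $\sigma / \sqrt{SS_{X}} = 1/\sqrt{10} \approx 0.316$.

<code>
# Simulate the sampling distribution of the slope b
set.seed(1)
x <- c(1, 2, 3, 4, 5)                  # SSx = 10
b.sim <- replicate(10000, {
  y <- 0.5 + 0.7 * x + rnorm(5, 0, 1)  # true slope 0.7, sigma = 1
  coef(lm(y ~ x))[2]
})
mean(b.sim)   # close to 0.7
sd(b.sim)     # close to 1 / sqrt(10) = 0.316
</code>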
  
^ X  ^ Y  ^ $X-\bar{X}$  ^ ssx  ^ sp  ^ y<sub>predicted</sub>  ^ error  ^ error<sup>2</sup>  ^
  
Regression formula: y<sub>predicted</sub> = -0.1 + 0.7 X \\
SSE = Sum of Squared Errors = $SS_{res}$ \\
The standard error of the slope beta (b) is obtained as follows:

\begin{eqnarray*}
se_{\beta} & = & \frac {\sqrt{SSE/(n-2)}}{\sqrt{SS_{X}}} \\
 & = & \frac {\sqrt{1.1/3}}{\sqrt{10}} \\
 & = & 0.1914854 \\
\end{eqnarray*}

Therefore, t = b / se = 0.7 / 0.1914854 = 3.655631.
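The same numbers can be reproduced step by step in R (variable names are ours):

<code>
# se of the slope and t, computed by hand from the table above
x <- c(1, 2, 3, 4, 5)
y <- c(1, 1, 2, 2, 4)
y.pred <- -0.1 + 0.7 * x
sse <- sum((y - y.pred)^2)            # 1.1
ssx <- sum((x - mean(x))^2)           # 10
se.b <- sqrt(sse / (length(x) - 2)) / sqrt(ssx)   # 0.1914854
0.7 / se.b                            # t = 3.655631
</code>

And summary(lm(y ~ x)) reports the same slope, standard error, and t value: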
  
  
<code>
> x <- c(1, 2, 3, 4, 5)
> y <- c(1, 1, 2, 2, 4)
> mody <- lm(y ~ x) 
> summary(mody)

Call:
lm(formula = y ~ x)

Residuals:
         1          2          3          4          5 
 4.000e-01 -3.000e-01 -3.886e-16 -7.000e-01  6.000e-01 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  -0.1000     0.6351  -0.157   0.8849  
x             0.7000     0.1915   3.656   0.0354 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6055 on 3 degrees of freedom
Multiple R-squared:  0.8167, Adjusted R-squared:  0.7556 
F-statistic: 13.36 on 1 and 3 DF,  p-value: 0.03535
</code>
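Two things worth noting in the output above: the reported Pr(>|t|) is two-tailed, and in a simple regression F equals t squared. A quick check:

<code>
t <- 3.655631
2 * pt(t, df = 3, lower.tail = FALSE)   # 0.03535, the p-value for x
t^2                                     # 13.36, the F-statistic above
</code>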
====== E.g., 4. Simple regression ======
Another example of simple regression: from {{:elemapi.sav}} \\