Adjusted R Squared

Adjusted R2 vs. R2

Below is the example from RegressionE.g. 3 Simple Regression.

DATA
x y
1 1
2 1
3 2
4 2
5 4
Model Summary (b)
Model   R             R Square      Adjusted R Square   Std. Error of the Estimate
1       0.903696114   0.816666667   0.755555556         0.605530071

r-square:

  • $\displaystyle r^2=\frac{SS_{total}-SS_{res}}{SS_{total}} = \frac{\text{Explained sample variability}}{\text{Total sample variability}}$
  • $\displaystyle r^2=\frac{SS_{total}-SS_{res}}{SS_{total}} = 1-\frac{SS_{res}}{SS_{total}} = 0.816666667 = R^2 $
  • Usually interpreted as a percentage (by multiplying $r^2$ by 100)
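A minimal sketch of this computation, assuming Python and plugging in the sums of squares from the example above ($SS_{total}$ = 6, $SS_{res}$ = 1.1):

  # r^2 = 1 - SS_res / SS_total, using the example's sums of squares
  ss_total = 6.0    # total sum of squares for y
  ss_res = 1.1      # residual sum of squares from the regression
  r_squared = 1 - ss_res / ss_total
  print(r_squared)  # 0.81666..., i.e., about 81.7% of the variability in y is explained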

Adjusted r-square:

  • $\displaystyle r^2=\frac{SS_{total}-SS_{res}}{SS_{total}} = 1 - \frac{SS_{res}}{SS_{total}} $ ,
  • This is equivalent to: $ \displaystyle 1 - \frac {Var_{res}}{Var_{total}} $
  • $\text{Var} = \text{MS} = s^{2} = \displaystyle \frac {SS}{n} $
  • If we use the following values in place of n (where n = number of samples, p = number of predictors),
    • $\displaystyle Var_{res} = \frac {SS_{res}}{n-p-1}$
    • $\displaystyle Var_{total} = \frac {SS_{total}}{n-1}$
  • then,
    • $\displaystyle \text{Adjusted } R^{2} = 1 - \displaystyle \frac {\displaystyle \frac {SS_{res}}{n-p-1}}{\displaystyle \frac {SS_{total}}{n-1}} $
  • This is the same logic as using n-1 instead of n when estimating the population standard deviation from a sample statistic.
  • Therefore, the adjusted $R^2$ = 0.755555556
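A minimal sketch of the adjusted $R^2$ formula above, assuming Python and the example values ($SS_{res}$ = 1.1, $SS_{total}$ = 6, n = 5, p = 1):

  ss_res, ss_total = 1.1, 6.0
  n, p = 5, 1
  var_res = ss_res / (n - p - 1)      # 1.1 / 3
  var_total = ss_total / (n - 1)      # 6 / 4
  adj_r_squared = 1 - var_res / var_total
  print(adj_r_squared)                # 0.75555...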

Why do we use the adjusted R squared value?

If we take a look at the ANOVA result:

ANOVA
Model            Sum of Squares   df   Mean Square   F             Sig.
1   Regression   4.9              1    4.9           13.36363636   0.035352847
    Residual     1.1              3    0.366666667
    Total        6                4
a Predictors: (Constant), x
b Dependent Variable: y
  • ANOVA, F-test, $F=\frac{MS_{between}}{MS_{within}}$
  • MS_between?
  • MS_within?
  • MS for residual
  • $s = \sqrt{s^2} = \sqrt{\frac{SS_{res}}{n-2}} $
  • random difference (MSwithin ): $s^2 = \frac{SS_{res}}{n-2} $
  • MS for regression: the obtained difference
  • Follow the same procedure as above for the residual MS,
  • but this time the degrees of freedom is k - 1 (number of variables - 1), which is 1.
  • Then what does the F value mean?
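A minimal sketch of the F ratio from the ANOVA table above, assuming Python and the example values ($SS_{regression}$ = 4.9, $SS_{res}$ = 1.1):

  ss_reg, ss_res = 4.9, 1.1
  df_reg, df_res = 1, 3              # k - 1 = 1; n - 2 = 3
  ms_reg = ss_reg / df_reg           # 4.9
  ms_res = ss_res / df_res           # 0.36666...
  f_value = ms_reg / ms_res
  print(f_value)                     # 13.3636..., the obtained difference relative to the random difference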

Then, we take another look at the coefficients result:

example
Model            Unstandardized Coefficients   Standardized Coefficients   t              Sig.          95% Confidence Interval for B
                 B        Std. Error           Beta                                                     Lower Bound     Upper Bound
1   (Constant)   -0.1     0.635085296                                      -0.157459164   0.88488398    -2.121124854   1.921124854
    x            0.7      0.191485422          0.903696114                 3.655630775    0.035352847   0.090607928    1.309392072
a Dependent Variable: y
  • Why do we do a t-test for the slope of the X variable? Below is a mathematical explanation.
  • Sampling distribution of Beta (or b):
  • $\sigma_{\beta_{1}} = \frac{\sigma}{\sqrt{SS_{xx}}}$
  • estimation of $\sigma_{\beta_{1}}$ : substitute sigma with s
  • t-test
  • $t=\frac{\beta_{1} - \text{Hypothesized value of }\beta_{1}}{s_{\beta_{1}}}$
  • The hypothesized value of beta is usually 0, so the t value is
  • $t=\frac{\beta_{1}}{s_{\beta_{1}}}$
  • $s_{\beta} = \displaystyle\frac{\sqrt{MS_{E}}}{\sqrt{SS_{X}}} = \displaystyle\frac{\sqrt{\frac{SSE}{n-2}}}{\sqrt{SS_{X}}} = \displaystyle\frac{\sqrt{\frac{\Sigma{(Y-\hat{Y})^2}}{n-2}}}{\sqrt{\Sigma{(X_{i}-\bar{X})^2}}} $
X             Y             $X-\bar{X}$   $(X-\bar{X})^2$   SP            $\hat{Y}$   error   error^2
1             1             -2            4                 2             0.6         -0.4    0.16
2             1             -1            1                 1             1.3         0.3     0.09
3             2             0             0                 0             2           0       0
4             2             1             1                 0             2.7         0.7     0.49
5             4             2             4                 4             3.4         -0.6    0.36
$\bar{X}$=3   $\bar{Y}$=2                 SSX = 10          $\Sigma$ = 7                      SSE = 1.1

Regression formula: $\hat{Y} = -0.1 + 0.7X$
SSE = Sum of Squared Errors
The standard error for the slope beta (b) is obtained as follows:
$$se_{\beta} = \frac{\sqrt{SSE/(n-2)}}{\sqrt{SSX}} = \frac{\sqrt{1.1/3}}{\sqrt{10}} = 0.191485 $$
And b = 0.7,
therefore t = b / se = 3.655631.
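A minimal sketch, assuming Python, that reproduces the standard error of the slope and the t value from the work table above (SSE = 1.1, SSX = 10, n = 5, b = 0.7):

  import math

  sse, ssx = 1.1, 10.0
  n, b = 5, 0.7
  se_b = math.sqrt(sse / (n - 2)) / math.sqrt(ssx)   # 0.191485...
  t = b / se_b                                       # 3.655631...
  print(se_b, t)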
