Adjusted R Squared
Adjusted $R^2$ vs. $R^2$
Below is the simple regression example from Regression, E.g. 3.
DATA
| x | y |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 2 |
| 5 | 4 |
Model Summary(b)
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | 0.903696114 | 0.816666667 | 0.755555556 | 0.605530071 |
r-square:
- $\displaystyle r^2=\frac{SS_{total}-SS_{res}}{SS_{total}} = \frac{\text{Explained sample variability}}{\text{Total sample variability}}$
- $\displaystyle r^2=\frac{SS_{total}-SS_{res}}{SS_{total}} = 1-\frac{SS_{res}}{SS_{total}} = 0.816666667 = R^2 $
- Usually interpreted as a percentage (by multiplying $r^2$ by 100).
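The fit and the $R^2$ value above can be checked with a minimal sketch in plain Python (no libraries), reproducing the slope, intercept, and R Square reported in the Model Summary table:

```python
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)
mx = sum(x) / n                      # mean of x
my = sum(y) / n                      # mean of y
# least-squares slope b1 = SP / SSX, intercept b0 = mean(y) - b1 * mean(x)
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
     sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx
ss_total = sum((yi - my) ** 2 for yi in y)                        # SS_total
ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # SS_res
r2 = 1 - ss_res / ss_total
print(b0, b1, r2)   # ≈ -0.1, 0.7, 0.81667
```

This matches the table: $R^2 = 1 - 1.1/6 \approx 0.8167$.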
Adjusted r-square:
- $\displaystyle r^2=\frac{SS_{total}-SS_{res}}{SS_{total}} = 1 - \frac{SS_{res}}{SS_{total}} $ ,
- This is equivalent to: $ \displaystyle 1 - \frac {Var_{res}}{Var_{total}} $
- $\text{Var} = \text{MS} = s^{2} = \displaystyle \frac {SS}{n} $
- Here, if we use the following values instead of n in each case (n = number of samples, p = number of predictors),
- $\displaystyle Var_{res} = \frac {SS_{res}}{n-p-1}$
- $\displaystyle Var_{total} = \frac {SS_{total}}{n-1}$
- Therefore,
- $\displaystyle \text{Adjusted } R^{2} = 1 - \displaystyle \frac {\displaystyle \frac {SS_{res}}{n-p-1}}{\displaystyle \frac {SS_{total}}{n-1}} $
- This is the same logic as using n-1 instead of n in order to estimate the population standard deviation from a sample statistic.
- Therefore, Adjusted $R^2$ = 0.755555556
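A minimal sketch of the Adjusted $R^2$ formula above, plugging in $SS_{res} = 1.1$, $SS_{total} = 6$, n = 5, and p = 1 from this example:

```python
# Adjusted R^2 = 1 - (SS_res / (n - p - 1)) / (SS_total / (n - 1))
ss_res, ss_total = 1.1, 6.0
n, p = 5, 1
adj_r2 = 1 - (ss_res / (n - p - 1)) / (ss_total / (n - 1))
print(adj_r2)   # ≈ 0.755556
```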
Why do we use the Adjusted R squared value?
If we take a look at the ANOVA result:
ANOVA
| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 4.9 | 1 | 4.9 | 13.36363636 | 0.035352847 |
| | Residual | 1.1 | 3 | 0.366666667 | | |
| | Total | 6 | 4 | | | |

a Predictors: (Constant), x
b Dependent Variable: y
- ANOVA, F-test, $F=\frac{MS_{between}}{MS_{within}}$
- What is $MS_{between}$?
- What is $MS_{within}$?
- MS for residual
- $s = \sqrt{s^2} = \sqrt{\frac{SS_{res}}{n-2}} $
- random difference (MSwithin ): $s^2 = \frac{SS_{res}}{n-2} $
- MS for regression . . . Obtained difference
- do the same procedure as above (in MS for residual).
- but this time the degrees of freedom is k-1 (number of variables - 1), which is 1 here.
- Then what does the F value mean? It is the ratio of the obtained (regression) variance to the random (residual) variance.
Then, we take another look at the coefficients result:
Coefficients(a)
| Model | | Unstandardized B | Std. Error | Standardized Beta | t | Sig. | 95% CI Lower Bound | 95% CI Upper Bound |
|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | -0.1 | 0.635085296 | | -0.157459164 | 0.88488398 | -2.121124854 | 1.921124854 |
| | x | 0.7 | 0.191485422 | 0.903696114 | 3.655630775 | 0.035352847 | 0.090607928 | 1.309392072 |

a Dependent Variable: y
- Why do we do a t-test for the slope of the X variable? Below is a mathematical explanation.
- Sampling distribution of Beta (or b):
- $\sigma_{\beta_{1}} = \frac{\sigma}{\sqrt{SS_{xx}}}$
- estimation of $\sigma_{\beta_{1}}$ : substitute sigma with s
- t-test
- $t=\frac{\beta_{1} - \text{Hypothesized value of }\beta_{1}}{s_{\beta_{1}}}$
- The hypothesized value of beta is usually 0, so the t value is
- $t=\frac{\beta_{1}}{s_{\beta_{1}}}$
- $s_{\beta} = \displaystyle\frac{\sqrt{MS_{E}}}{\sqrt{SS_{X}}} = \displaystyle\frac{\sqrt{\frac{SSE}{n-2}}}{\sqrt{SS_{X}}} = \displaystyle\frac{\sqrt{\frac{\Sigma{(Y-\hat{Y})^2}}{n-2}}}{\sqrt{\Sigma{(X_{i}-\bar{X})^2}}} $
| X | Y | $X-\bar{X}$ | ssx | sp | ypredicted | error ($Y-\hat{Y}$) | error² |
|---|---|---|---|---|---|---|---|
| 1 | 1 | -2 | 4 | 2 | 0.6 | 0.4 | 0.16 |
| 2 | 1 | -1 | 1 | 1 | 1.3 | -0.3 | 0.09 |
| 3 | 2 | 0 | 0 | 0 | 2 | 0 | 0 |
| 4 | 2 | 1 | 1 | 0 | 2.7 | -0.7 | 0.49 |
| 5 | 4 | 2 | 4 | 4 | 3.4 | 0.6 | 0.36 |
| $\bar{X}$ = 3 | $\bar{Y}$ = 2 | | SSX = 10 | SP = 7 | | | SSE = 1.1 |
Regression formula: ypredicted = -0.1 + 0.7X
SSE = Sum of Squared Errors
The standard error for the slope beta (b) is obtained as follows:
$$se_{\beta} = \frac {\sqrt{SSE/(n-2)}}{\sqrt{SSX}} = \frac {\sqrt{1.1/3}}{\sqrt{10}} = 0.191485 $$
And b = 0.7.
Therefore, t = b / se = 0.7 / 0.191485 = 3.655631
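The standard error and t value above can be verified with a short sketch, using SSE = 1.1, SSX = 10, and b = 0.7 from the worked table:

```python
import math

# se_beta = sqrt(SSE / (n - 2)) / sqrt(SSX), then t = b / se_beta
sse, ssx = 1.1, 10.0
n, b = 5, 0.7
se_beta = math.sqrt(sse / (n - 2)) / math.sqrt(ssx)
t = b / se_beta
print(se_beta, t)   # ≈ 0.191485, 3.655631
```

These agree with the Std. Error and t columns of the coefficients table.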
Last modified: 2016/05/11 07:35 by hkimscil