Differences

This shows you the differences between two versions of the page.

--- regression [2018/11/09 07:54] – [E.g., 1. Simple regression & F-test for goodness of fit] hkimscil
+++ regression [2019/05/20 08:40] – [e.g. Simple Regression] hkimscil
@@ Line 187: / Line 187: @@
 ^  __ prediction for y values with__ $\overline{Y}$  ^^^
-| bankaccount   | error   | error<sup>2</sup>  |
+| bankaccount   | prediction  | error   | error<sup>2</sup>  |
-| 6   | -2   | 4  |
+| 6   | 8  | -2   | 4  |
-| 5   | -3   | 9  |
+| 5   | 8  | -3   | 9  |
-| 7   | -1   | 1  |
+| 7   | 8  | -1   | 1  |
-| 7   | -1   | 1  |
+| 7   | 8  | -1   | 1  |
-| 8   | 0   | 0  |
+| 8   | 8  | 0   | 0  |
-| 10   | 2   | 4  |
+| 10   | 8  | 2   | 4  |
-| 8   | 0   | 0  |
+| 8   | 8  | 0   | 0  |
-| 11   | 3   | 9  |
+| 11   | 8  | 3   | 9  |
-| 9   | 1   | 1  |
+| 9   | 8  | 1   | 1  |
-| 9   | 1   | 1  |
+| 9   | 8  | 1   | 1  |
-|  $\overline{Y}=8$   |    |  $SS_{total} = 30$   |
+|  $\overline{Y}=8$   |   |   |  $SS_{total} = 30$   |
 <WRAP clear />
 What is the sum of the squared values above? It is 30, which is the SS (Sum of Squares). As explained earlier, this value is $SS_{total}$ and can be called the __total error__ variation.
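The same calculation can be sketched in R. This is a minimal illustration that assumes the ten bankaccount values from the table above are typed in by hand; the variable names (`bankaccount`, `y_bar`, `errors`, `ss_total`) are chosen here for illustration only.

<code r>
# bankaccount values copied by hand from the table above (illustrative)
bankaccount <- c(6, 5, 7, 7, 8, 10, 8, 11, 9, 9)

y_bar    <- mean(bankaccount)   # the mean, used as the prediction for every case
errors   <- bankaccount - y_bar # prediction error for each case
ss_total <- sum(errors^2)       # sum of squared errors = SS_total

print(y_bar)     # [1] 8
print(ss_total)  # [1] 30
</code>

Using the mean as the only prediction is what makes this sum the total variation; the regression model later splits it into SS<sub>reg</sub> and SS<sub>res</sub>.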
@@ Line 204: / Line 204: @@
 __SS<sub>res</sub> , Residual error__
 <code>
+> head(datavar)
+. . . .
+> mod <- lm(bankaccount ~ income, data = datavar)
+> summary(mod)
 Residuals:
     Min      1Q  Median      3Q     Max
@@ Line 281: / Line 286: @@
 |  Model   |      |  Sum of Squares   |  df   |  Mean Square   |  F   |  Sig.  |
 |  1.000    |  Regression   | @white: 18.934    | @grey: 1.000    |  18.934    |  13.687    |  0.006   |
-|     |  Residual   | @orange: 11.066    | @green: 8.000    |  1.383    |     |    |
+|     |  Residual   | @orange: 11.066    | @green: 8.000    |  1.383*    |     |    |
 |     |  Total   | @yellow: 30.000    |  9.000    |     |     |    |
 | a Predictors: (Constant), bankIncome  income \\ b Dependent Variable: bankbook  number of bank  |||||||
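+* 1.383 = SS<sub>res</sub> / (n-2) = MS<sub>res</sub>, the residual mean square (the estimated error variance). Its square root, about 1.176, is the standard error of the estimate.
+  * MS<sub>res</sub> plays the same role as the standard error (se) in t = difference / se from [[:t-test]]: it is the error term against which the effect is measured.
+  * Therefore, dividing MS<sub>regression</sub> (18.934) by MS<sub>res</sub> (1.383) gives the F value: 18.934 / 1.383 ≈ 13.69.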
 <WRAP clear />
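The arithmetic behind the ANOVA table can be checked in R. This sketch uses only the sums of squares and degrees of freedom printed in the table (the variable names are illustrative); `pf()` with `lower.tail = FALSE` gives the upper-tail probability of the F distribution, which corresponds to the "Sig." column.

<code r>
# Values copied from the ANOVA table above
ss_reg <- 18.934; df_reg <- 1
ss_res <- 11.066; df_res <- 8   # n - 2 = 10 - 2

ms_reg <- ss_reg / df_reg       # mean square for regression
ms_res <- ss_res / df_res       # mean square for residuals (error variance estimate)
f_val  <- ms_reg / ms_res       # F statistic
p_val  <- pf(f_val, df_reg, df_res, lower.tail = FALSE)  # upper-tail p ("Sig.")
se_est <- sqrt(ms_res)          # standard error of the estimate

print(round(c(MS_res = ms_res, F = f_val, SE = se_est), 3))
print(round(p_val, 3))
</code>

F comes out at about 13.69; the small gap from the table's 13.687 comes from the rounded sums of squares.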
 __F-test using SS<sub>total</sub>, SS<sub>reg</sub>, and SS<sub>res</sub>__
@@ Line 341: / Line 351: @@
-====== E.g., 2. Simple regression ======
+====== E.g., Simple regression ======
 data:
 {{:acidity.sav}} \\
@@ Line 579: / Line 589: @@
  r<sup>2</sup> = SS<sub>reg</sub> / SS<sub>total</sub> = 42.462 / 87.733 = .484.
+====== e.g. Simple Regression ======
+{{:AllenMursau.data.csv}}
+<code>datavar <- read.csv("http://commres.net/wiki/_media/allenmursau.data.csv")
+</code>
+<code>> mod <- lm(Y ~ X, data=datavar)
+> summary(mod)
+Call:
+lm(formula = Y ~ X, data = datavar)
+Residuals:
+    Min      1Q  Median      3Q     Max
+-250.22 -132.28   33.09  165.53  187.78
+Coefficients:
+            Estimate Std. Error t value Pr(>|t|)
+(Intercept)  300.976    229.754   1.310    0.219
+X             10.312      3.124   3.301    0.008 **
+---
+Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
+Residual standard error: 170.5 on 10 degrees of freedom
+Multiple R-squared:  0.5214,	Adjusted R-squared:  0.4736
+F-statistic:  10.9 on 1 and 10 DF,  p-value: 0.008002
+</code>
+<code>> anova(mod)
+Analysis of Variance Table
+Response: Y
+          Df Sum Sq Mean Sq F value   Pr(>F)
+X          1 316874  316874  10.896 0.008002 **
+Residuals 10 290824   29082
+---
+Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
+</code>
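The figures in `summary(mod)` can be recovered from the `anova(mod)` table. The following is a sketch that copies the printed sums of squares and the slope estimate and standard error by hand (variable names are illustrative); it also shows that, in simple regression, the squared t value of the slope equals F.

<code r>
# Sums of squares and df copied from the anova(mod) output above
ss_reg <- 316874; df_reg <- 1
ss_res <- 290824; df_res <- 10
n <- df_res + 2                               # 12 observations in simple regression

r2     <- ss_reg / (ss_reg + ss_res)          # Multiple R-squared = SS_reg / SS_total
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - 2)    # Adjusted R-squared
ms_res <- ss_res / df_res                     # Mean Sq for Residuals
f_val  <- (ss_reg / df_reg) / ms_res          # F value
rse    <- sqrt(ms_res)                        # Residual standard error

# Slope estimate and its standard error copied from summary(mod)
t_slope <- 10.312 / 3.124                     # t value of X

print(round(c(R2 = r2, adjR2 = adj_r2, F = f_val, RSE = rse), 4))
print(round(t_slope^2, 3))                    # close to F, up to rounding
</code>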
 ====== E.g., 3. Simple regression: Adjusted R squared & Slope test ======
 This is another example of regression, used here to explain the concept of adjusted R squared.