Statistical Regression Methods
Statistical regression is one of the selection methods in multiple regression. In short:
Multiple Regression
- Enter method
- Selection method
- Hierarchical regression method (sequential regression method)
- Statistical regression method
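- forward selection: among the independent variables (predictors), the one most highly correlated with the dependent variable Y is entered first and the regression is computed. Because of its high correlation, the variable entered first comes to be regarded as a theoretically important factor in explaining the dependent variable. Each subsequent variable is entered with the previously entered variables taken into account.
- backward deletion: all independent variables are entered at once and the regression is computed. Then X variables judged not to contribute statistically to the regression equation are removed one at a time, and the regression is recomputed after each removal.
- stepwise selection: the regression proceeds as in forward selection, but the explanatory power of each entered variable is evaluated to decide whether to keep or drop it. That is, a t-test on each IV is used to judge whether that IV makes a significant contribution.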
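The first step of forward selection described above can be sketched in R. This is only an illustration under assumptions: it uses the built-in swiss data with Fertility as the dependent variable, neither of which comes from the list above. With a single predictor the F statistic grows with |r|, so the predictor with the largest absolute correlation is also the one add1() reports with the smallest p-value.

```r
# Sketch: first step of forward selection on the built-in swiss data
# (dataset and choice of Fertility as Y are assumptions for illustration)
y <- swiss$Fertility
X <- swiss[, setdiff(names(swiss), "Fertility")]

# Correlation of each predictor with the dependent variable
r <- sapply(X, function(x) cor(x, y))
first <- names(which.max(abs(r)))   # predictor that would be entered first
print(round(r, 3))
print(first)

# The same choice seen through add1(): smallest p-value among single additions
null <- lm(Fertility ~ 1, data = swiss)
a1 <- add1(null, scope = reformulate(names(X)), test = "F")
print(a1)
```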
Alternatively, the methods are sometimes classified as follows:
Multiple Regression
- Enter method
- Selection method
- Hierarchical regression
- Stepwise regression: in this classification, "stepwise" refers to having the computer choose the independent variables according to preset criteria.
- forward
- backward
- both directions
See also: Stepwise regression on the NCSS site
The following is from http://www.statisticssolutions.com/selection-process-for-multiple-regression/
Forward selection begins with an empty equation. Predictors are added one at a time beginning with the predictor with the highest correlation with the dependent variable. Variables of greater theoretical importance are entered first. Once in the equation, the variable remains there.
Backward elimination (or backward deletion) is the reverse process. All the independent variables are entered into the equation first and each one is deleted one at a time if they do not contribute to the regression equation.
Stepwise regression is a combination of the forward and backward selection techniques. . . . Stepwise regression is a modification of the forward selection so that after each step in which a variable was added, all candidate variables in the model are checked to see if their significance has been reduced below the specified tolerance level. If a nonsignificant variable is found, it is removed from the model. Stepwise regression requires two significance levels: one for adding variables and one for removing variables. The cutoff probability for adding variables should be less than the cutoff probability for removing variables so that the procedure does not get into an infinite loop.
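The two-threshold procedure described above can be sketched with add1() and drop1(). This is a minimal sketch, not an established routine: stepwise_p is a hypothetical helper written for this page, the thresholds 0.05 (entry) and 0.10 (removal) are assumed values, and the built-in swiss data stands in for real data.

```r
# Sketch of p-value-based stepwise selection.
# stepwise_p() is a hypothetical helper, not part of base R;
# p_enter and p_remove are assumed cutoffs with p_enter < p_remove.
stepwise_p <- function(data, response, p_enter = 0.05, p_remove = 0.10,
                       max_iter = 50) {
  stopifnot(p_enter < p_remove)            # guards against add/remove cycling
  candidates <- setdiff(names(data), response)
  full_scope <- reformulate(candidates)    # ~ x1 + x2 + ...
  fit <- lm(reformulate("1", response), data = data)  # start from the empty model

  for (i in seq_len(max_iter)) {
    changed <- FALSE

    # Forward step: add the candidate with the smallest p-value, if below p_enter
    remaining <- setdiff(candidates, attr(terms(fit), "term.labels"))
    if (length(remaining) > 0) {
      a <- add1(fit, scope = full_scope, test = "F")
      p <- a[["Pr(>F)"]][-1]               # drop the <none> row
      if (min(p, na.rm = TRUE) < p_enter) {
        best <- rownames(a)[-1][which.min(p)]
        fit <- update(fit, as.formula(paste(". ~ . +", best)))
        changed <- TRUE
      }
    }

    # Backward check: remove the weakest term if its p-value exceeds p_remove
    if (length(attr(terms(fit), "term.labels")) > 0) {
      d <- drop1(fit, test = "F")
      p <- d[["Pr(>F)"]][-1]
      if (max(p, na.rm = TRUE) > p_remove) {
        worst <- rownames(d)[-1][which.max(p)]
        fit <- update(fit, as.formula(paste(". ~ . -", worst)))
        changed <- TRUE
      }
    }

    if (!changed) break                    # nothing added or removed: done
  }
  fit
}

fit_sw <- stepwise_p(swiss, "Fertility")
print(formula(fit_sw))
```

Because a variable that has just entered with p below 0.05 cannot immediately exceed the 0.10 removal cutoff, the same variable is not added and dropped in the same pass; max_iter is an extra safeguard.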
Sequential Regression Method of Entry:
Block-wise selection is a version of forward selection that is achieved in blocks or sets. The predictors are grouped into blocks based on psychometric consideration or theoretical reasons and a stepwise selection is applied. Each block is applied separately while the other predictor variables are ignored. Variables can be removed when they do not contribute to the prediction. In general, the predictors included in the blocks will be inter-correlated. Also, the order of entry has an impact on which variables will be selected; those that are entered in the earlier stages have a better chance of being retained than those entered at later stages.
Essentially, the multiple regression selection process enables the researcher to obtain a reduced set of variables from a larger set of predictors, eliminating unnecessary predictors, simplifying data, and enhancing predictive accuracy.
Two criteria are used to achieve the best set of predictors; these include meaningfulness to the situation and statistical significance. By entering variables into the equation in a given order, confounding variables can be investigated and variables that are highly correlated can be combined into blocks.
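Entering predictors in researcher-defined blocks can be sketched as below. This is a simplified illustration: the blocks are entered in a fixed order without any stepwise selection inside a block, and the grouping of the swiss variables into two blocks is an assumption made only for the example.

```r
# Sketch: hierarchical (block-wise) entry on the built-in swiss data.
# The block composition is an illustrative assumption.
m1 <- lm(Fertility ~ Agriculture + Catholic, data = swiss)  # block 1
m2 <- update(m1, . ~ . + Education + Examination)           # block 1 + block 2

r2 <- c(block1 = summary(m1)$r.squared, block2 = summary(m2)$r.squared)
print(r2)
print(diff(r2))        # change in R-squared when block 2 is added
print(anova(m1, m2))   # F test of whether block 2 adds explanatory power
```

Swapping the order of the blocks changes how much of the shared variance is credited to each, which is the order-of-entry effect the passage mentions.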
In R, statistical regression can be carried out with the step() function.
step() selects as the final model the one whose AIC can no longer be reduced.
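To make the AIC criterion concrete, the sketch below reproduces the value that step() reports for a linear model. It assumes the built-in swiss data and the same Infant.Mortality model used later on this page. For lm objects step() relies on extractAIC(), which computes n * log(RSS / n) + 2 * k, where k is the number of estimated coefficients.

```r
# Sketch: the AIC that step() minimises, computed by hand for an lm fit
fit <- lm(Infant.Mortality ~ ., data = swiss)

n   <- nrow(swiss)
rss <- sum(residuals(fit)^2)
k   <- length(coef(fit))                 # intercept + 5 slopes

aic_manual <- n * log(rss / n) + 2 * k
aic_step   <- extractAIC(fit)[2]         # value shown as "Start: AIC=" by step()
print(c(manual = aic_manual, extractAIC = aic_step))

# AIC() uses the full log-likelihood, so it differs from the value above
# by an additive constant for a fixed dataset; model rankings are unchanged.
print(AIC(fit))

# Let step() search in both directions without printing the trace
final <- step(fit, direction = "both", trace = 0)
print(formula(final))
```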
e.g. 1
backward elimination
Data: lowbwt.csv (the lowbwt dataset); see also https://notendur.hi.is/birgirhr/lowbwt.txt
lbw <- read.csv("http://commres.net/wiki/_media/r/lowbwt.csv", sep = ",")
names(lbw) <- tolower(names(lbw))

## Recoding
lbw <- within(lbw, {
    ## race relabeling
    race.cat <- factor(race, levels = 1:3, labels = c("White", "Black", "Other"))

    ## ftv (frequency of visit) relabeling
    ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None", "Normal", "Many"))
    ftv.cat <- relevel(ftv.cat, ref = "Normal")

    ## ptl
    preterm <- factor(ptl >= 1, levels = c(FALSE, TRUE), labels = c("0", "1+"))
})

lm.full <- lm(bwt ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat, data = lbw)
lm.null <- lm(bwt ~ 1, data = lbw)
## Note: m.full is not defined above. Judging from the Start formula in the output,
## it was presumably fitted on all columns of lbw (e.g. lm(bwt ~ ., data = lbw)), not lm.full.
> step(m.full, direction="both")
Start:  AIC=1877.43
bwt ~ id + low + age + lwt + race + smoke + ptl + ht + ui + ftv + preterm + ftv.cat + race.cat

Step:  AIC=1877.43
bwt ~ id + low + age + lwt + smoke + ptl + ht + ui + ftv + preterm + ftv.cat + race.cat

           Df Sum of Sq      RSS    AIC
- preterm   1        21  3323378 1875.4
- ptl       1       855  3324211 1875.5
- ftv.cat   2     36489  3359845 1875.5
- age       1      1417  3324774 1875.5
- smoke     1      2016  3325372 1875.5
- race.cat  2     38236  3361593 1875.6
- ftv       1     27686  3351043 1877.0
- lwt       1     30065  3353422 1877.1
<none>                   3323356 1877.4
- ui        1     42573  3365930 1877.8
- low       1     50278  3373635 1878.3
- ht        1    129875  3453231 1882.7
- id        1  29802422 33125778 2310.0

Step:  AIC=1875.43
bwt ~ id + low + age + lwt + smoke + ptl + ht + ui + ftv + ftv.cat + race.cat

           Df Sum of Sq      RSS    AIC
- ftv.cat   2     36536  3359914 1873.5
- age       1      1463  3324840 1873.5
- smoke     1      2050  3325428 1873.5
- race.cat  2     38790  3362167 1873.6
- ptl       1      6424  3329802 1873.8
- ftv       1     27666  3351044 1875.0
- lwt       1     30250  3353628 1875.1
<none>                   3323378 1875.4
- ui        1     42696  3366074 1875.8
- low       1     50815  3374192 1876.3
+ preterm   1        21  3323356 1877.4
- ht        1    129854  3453231 1880.7
- id        1  29828678 33152056 2308.2

Step:  AIC=1873.49
bwt ~ id + low + age + lwt + smoke + ptl + ht + ui + ftv + race.cat

           Df Sum of Sq      RSS    AIC
- age       1       612  3360526 1871.5
- ftv       1      1077  3360991 1871.5
- ptl       1      4035  3363949 1871.7
- smoke     1      7262  3367176 1871.9
- race.cat  2     60911  3420825 1872.9
<none>                   3359914 1873.5
- ui        1     48125  3408039 1874.2
- low       1     50652  3410566 1874.3
- lwt       1     52277  3412191 1874.4
+ ftv.cat   2     36536  3323378 1875.4
+ preterm   1        68  3359845 1875.5
- ht        1    115086  3475000 1877.9
- id        1  29807178 33167091 2304.2

Step:  AIC=1871.53
bwt ~ id + low + lwt + smoke + ptl + ht + ui + ftv + race.cat

           Df Sum of Sq      RSS    AIC
- ftv       1       835  3361361 1869.6
- ptl       1      4613  3365139 1869.8
- smoke     1      6898  3367424 1869.9
- race.cat  2     60299  3420825 1870.9
<none>                   3360526 1871.5
- ui        1     47568  3408095 1872.2
- low       1     52254  3412780 1872.4
- lwt       1     55901  3416427 1872.7
+ age       1       612  3359914 1873.5
+ ftv.cat   2     35686  3324840 1873.5
+ preterm   1        32  3360494 1873.5
- ht        1    114704  3475230 1875.9
- id        1  30083255 33443781 2303.8

Step:  AIC=1869.57
bwt ~ id + low + lwt + smoke + ptl + ht + ui + race.cat

           Df Sum of Sq      RSS    AIC
- ptl       1      4631  3365992 1867.8
- smoke     1      7150  3368510 1868.0
- race.cat  2     61709  3423070 1869.0
<none>                   3361361 1869.6
- ui        1     48299  3409660 1870.3
- low       1     51933  3413294 1870.5
- lwt       1     55081  3416442 1870.7
+ ftv       1       835  3360526 1871.5
+ age       1       370  3360991 1871.5
+ preterm   1        65  3361296 1871.6
+ ftv.cat   2      8791  3352570 1873.1
- ht        1    118142  3479503 1874.1
- id        1  30120786 33482147 2302.0

Step:  AIC=1867.83
bwt ~ id + low + lwt + smoke + ht + ui + race.cat

           Df Sum of Sq      RSS    AIC
- smoke     1      9448  3375440 1866.4
- race.cat  2     63075  3429067 1867.3
<none>                   3365992 1867.8
- low       1     48510  3414502 1868.5
- lwt       1     52839  3418831 1868.8
- ui        1     56774  3422766 1869.0
+ ptl       1      4631  3361361 1869.6
+ preterm   1      3443  3362549 1869.6
+ ftv       1       853  3365139 1869.8
+ age       1       833  3365159 1869.8
+ ftv.cat   2      8311  3357682 1871.4
- ht        1    118424  3484417 1872.4
- id        1  30313012 33679005 2301.1

Step:  AIC=1866.36
bwt ~ id + low + lwt + ht + ui + race.cat

           Df Sum of Sq      RSS    AIC
- race.cat  2     53642  3429082 1865.3
<none>                   3375440 1866.4
- low       1     48366  3423806 1867.0
- lwt       1     50518  3425958 1867.2
- ui        1     57207  3432647 1867.5
+ smoke     1      9448  3365992 1867.8
+ ptl       1      6929  3368510 1868.0
+ preterm   1      5425  3370015 1868.1
+ ftv       1      1158  3374282 1868.3
+ age       1       401  3375039 1868.3
+ ftv.cat   2     10879  3364561 1869.8
- ht        1    118943  3494383 1870.9
- id        1  31159450 34534890 2303.9

Step:  AIC=1865.34
bwt ~ id + low + lwt + ht + ui

           Df Sum of Sq      RSS    AIC
+ race      1     47716  3381366 1864.7
- lwt       1     33877  3462959 1865.2
<none>                   3429082 1865.3
- low       1     49598  3478680 1866.1
+ race.cat  2     53642  3375440 1866.4
- ui        1     58727  3487809 1866.5
+ ptl       1      5932  3423150 1867.0
+ preterm   1      5360  3423723 1867.0
+ ftv       1      2306  3426776 1867.2
+ smoke     1        16  3429067 1867.3
+ age       1         1  3429081 1867.3
+ ftv.cat   2     18489  3410593 1868.3
- ht        1    125091  3554174 1870.1
- id        1  32067628 35496710 2305.1

Step:  AIC=1864.7
bwt ~ id + low + lwt + ht + ui + race

           Df Sum of Sq      RSS    AIC
<none>                   3381366 1864.7
- lwt       1     45021  3426387 1865.2
- race      1     47716  3429082 1865.3
- low       1     47807  3429173 1865.3
- ui        1     59260  3440626 1866.0
+ smoke     1      9418  3371948 1866.2
+ ptl       1      7086  3374280 1866.3
+ race.cat  1      5927  3375440 1866.4
+ preterm   1      5490  3375877 1866.4
+ ftv       1      1056  3380310 1866.6
+ age       1      1044  3380323 1866.6
+ ftv.cat   2     10958  3370409 1868.1
- ht        1    119185  3500552 1869.2
- id        1  31511908 34893274 2303.8

Call:
lm(formula = bwt ~ id + low + lwt + ht + ui + race, data = lbw)

Coefficients:
(Intercept)           id          low          lwt           ht           ui         race
  1647.3705      11.5379      59.4079      -0.5444    -109.0059     -52.5856     -17.8063

>
summary(lm.full)
Call:
lm(formula = bwt ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat, data = lbw)

Residuals:
     Min       1Q   Median       3Q      Max
-1896.38  -445.54    53.58   466.07  1654.74

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)    2949.808    320.517   9.203  < 2e-16 ***
age              -2.928      9.674  -0.303 0.762483
lwt               4.205      1.717   2.448 0.015316 *
race.catBlack  -467.043    149.797  -3.118 0.002125 **
race.catOther  -323.144    117.411  -2.752 0.006532 **
smoke          -307.880    109.148  -2.821 0.005335 **
preterm1+      -207.757    136.364  -1.524 0.129394
ht             -568.111    200.905  -2.828 0.005225 **
ui             -494.168    137.246  -3.601 0.000412 ***
ftv.catNone     -55.975    105.373  -0.531 0.595934
ftv.catMany    -185.275    203.215  -0.912 0.363151
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 646.9 on 178 degrees of freedom
Multiple R-squared:  0.2544,  Adjusted R-squared:  0.2125
F-statistic: 6.074 on 10 and 178 DF,  p-value: 6.27e-08
drop1(lm.full, test = "F")
Single term deletions

Model:
bwt ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat
         Df Sum of Sq      RSS    AIC F value    Pr(>F)
<none>                74494960 2457.2
age       1     38343 74533303 2455.3  0.0916 0.7624834
lwt       1   2508944 77003904 2461.4  5.9949 0.0153165 *
race.cat  2   5560980 80055939 2466.8  6.6438 0.0016492 **
smoke     1   3329939 77824899 2463.4  7.9566 0.0053352 **
preterm   1    971457 75466416 2457.6  2.3212 0.1293944
ht        1   3346518 77841478 2463.5  7.9962 0.0052247 **
ui        1   5425727 79920686 2468.5 12.9644 0.0004115 ***
ftv.cat   2    380072 74875032 2454.1  0.4541 0.6357678
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
drop1(update(lm.full, ~ . -age), test = "F")
Single term deletions

Model:
bwt ~ lwt + race.cat + smoke + preterm + ht + ui + ftv.cat
         Df Sum of Sq      RSS  AIC F value  Pr(>F)
<none>                74533303 2455
lwt       1   2483344 77016647 2459    5.96 0.01557 *
race.cat  2   5607620 80140923 2465    6.73 0.00151 **
smoke     1   3295772 77829075 2461    7.92 0.00545 **
preterm   1   1052971 75586274 2456    2.53 0.11355
ht        1   3323302 77856605 2462    7.98 0.00526 **
ui        1   5390566 79923869 2466   12.95 0.00041 ***
ftv.cat   2    369667 74902970 2452    0.44 0.64224
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Now preterm is least significant at p = 0.12.
drop1(update(lm.full, ~ . -age -ftv.cat), test = "F")
Single term deletions

Model:
bwt ~ lwt + race.cat + smoke + preterm + ht + ui
         Df Sum of Sq      RSS  AIC F value  Pr(>F)
<none>                74902970 2452
lwt       1   2413556 77316526 2456    5.83 0.01673 *
race.cat  2   6248590 81151560 2463    7.55 0.00071 ***
smoke     1   3933172 78836142 2460    9.50 0.00237 **
preterm   1   1008759 75911729 2453    2.44 0.12020
ht        1   3440574 78343544 2459    8.31 0.00441 **
ui        1   5376658 80279628 2463   12.99 0.00040 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Now all variables are significant at p < 0.1
drop1(update(lm.full, ~ . -age -ftv.cat -preterm), test = "F")
Single term deletions

Model:
bwt ~ lwt + race.cat + smoke + ht + ui
         Df Sum of Sq      RSS  AIC F value  Pr(>F)
<none>                75911729 2453
lwt       1   2671613 78583342 2457    6.41 0.01223 *
race.cat  2   6674129 82585858 2465    8.00 0.00047 ***
smoke     1   4911219 80822948 2463   11.77 0.00074 ***
ht        1   3583850 79495579 2459    8.59 0.00381 **
ui        1   6327025 82238754 2466   15.17 0.00014 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Show summary for final model
summary(update(lm.full, ~ . -age -ftv.cat -preterm))
Call:
lm(formula = bwt ~ lwt + race.cat + smoke + ht + ui, data = lbw)

Residuals:
   Min     1Q Median     3Q    Max
 -1843   -433     67    461   1631

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    2837.64     243.63   11.65  < 2e-16 ***
lwt               4.24       1.68    2.53  0.01223 *
race.catBlack  -475.81     145.58   -3.27  0.00129 **
race.catOther  -350.00     112.34   -3.12  0.00213 **
smoke          -354.90     103.43   -3.43  0.00074 ***
ht             -585.11     199.61   -2.93  0.00381 **
ui             -524.44     134.65   -3.89  0.00014 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 646 on 182 degrees of freedom
Multiple R-squared:  0.24,  Adjusted R-squared:  0.215
F-statistic: 9.59 on 6 and 182 DF,  p-value: 0.00000000366
Forward selection
## ui is the most significant variable
add1(lm.null, scope = ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat, test = "F")
Single term additions

Model:
bwt ~ 1
         Df Sum of Sq      RSS  AIC F value   Pr(>F)
<none>                99917053 2493
age       1    806927 99110126 2493    1.52   0.2188
lwt       1   3448881 96468171 2488    6.69   0.0105 *
race.cat  2   5070608 94846445 2487    4.97   0.0079 **
smoke     1   3573406 96343646 2488    6.94   0.0092 **
preterm   1   4757523 95159530 2485    9.35   0.0026 **
ht        1   2132014 97785038 2491    4.08   0.0449 *
ui        1   8028747 91888305 2479   16.34 0.000077 ***
ftv.cat   2   2082321 97834732 2493    1.98   0.1410
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Now race.cat is the most significant
add1(update(lm.null, ~ . +ui), scope = ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat, test = "F")
Single term additions

Model:
bwt ~ ui
         Df Sum of Sq      RSS  AIC F value Pr(>F)
<none>                91888305 2479
age       1    472355 91415950 2480    0.96 0.3282
lwt       1   2076990 89811315 2477    4.30 0.0395 *
race.cat  2   4767394 87120911 2473    5.06 0.0072 **
smoke     1   2949940 88938365 2475    6.17 0.0139 *
preterm   1   2837049 89051257 2475    5.93 0.0159 *
ht        1   3162469 88725836 2474    6.63 0.0108 *
ftv.cat   2   1847816 90040489 2479    1.90 0.1527
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Now smoke is the most significant
add1(update(lm.null, ~ . +ui +race.cat), scope = ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat, test = "F")
Single term additions

Model:
bwt ~ ui + race.cat
        Df Sum of Sq      RSS  AIC F value  Pr(>F)
<none>               87120911 2473
age      1     57041 87063871 2475    0.12 0.72884
lwt      1   2234424 84886488 2470    4.84 0.02900 *
smoke    1   6079888 81041024 2461   13.80 0.00027 ***
preterm  1   2651610 84469302 2469    5.78 0.01724 *
ht       1   2688781 84432130 2469    5.86 0.01646 *
ftv.cat  2   1158673 85962238 2474    1.23 0.29373
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Now ht is the most significant
add1(update(lm.null, ~ . +ui +race.cat +smoke), scope = ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat, test = "F")
Single term additions

Model:
bwt ~ ui + race.cat + smoke
        Df Sum of Sq      RSS  AIC F value Pr(>F)
<none>               81041024 2461
age      1       326 81040698 2463    0.00  0.978
lwt      1   1545445 79495579 2459    3.56  0.061 .
preterm  1   1338799 79702225 2460    3.07  0.081 .
ht       1   2457682 78583342 2457    5.72  0.018 *
ftv.cat  2    331205 80709819 2464    0.37  0.689
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Now lwt is the most significant
add1(update(lm.null, ~ . +ui +race.cat +smoke +ht), scope = ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat, test = "F")
Single term additions

Model:
bwt ~ ui + race.cat + smoke + ht
        Df Sum of Sq      RSS  AIC F value Pr(>F)
<none>               78583342 2457
age      1       882 78582460 2459    0.00  0.964
lwt      1   2671613 75911729 2453    6.41  0.012 *
preterm  1   1266816 77316526 2456    2.98  0.086 .
ftv.cat  2    244671 78338671 2461    0.28  0.754
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Now no variable is significant at p < 0.1
add1(update(lm.null, ~ . +ui +race.cat +smoke +ht +lwt), scope = ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat, test = "F")
Single term additions

Model:
bwt ~ ui + race.cat + smoke + ht + lwt
        Df Sum of Sq      RSS  AIC F value Pr(>F)
<none>               75911729 2453
age      1    108807 75802922 2454    0.26   0.61
preterm  1   1008759 74902970 2452    2.44   0.12
ftv.cat  2    325455 75586274 2456    0.39   0.68
## Show summary for final model
summary(update(lm.null, ~ . +ui +race.cat +smoke +ht +lwt))
Call:
lm(formula = bwt ~ ui + race.cat + smoke + ht + lwt, data = lbw)

Residuals:
   Min     1Q Median     3Q    Max
 -1843   -433     67    461   1631

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    2837.64     243.63   11.65  < 2e-16 ***
ui             -524.44     134.65   -3.89  0.00014 ***
race.catBlack  -475.81     145.58   -3.27  0.00129 **
race.catOther  -350.00     112.34   -3.12  0.00213 **
smoke          -354.90     103.43   -3.43  0.00074 ***
ht             -585.11     199.61   -2.93  0.00381 **
lwt               4.24       1.68    2.53  0.01223 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 646 on 182 degrees of freedom
Multiple R-squared:  0.24,  Adjusted R-squared:  0.215
F-statistic: 9.59 on 6 and 182 DF,  p-value: 0.00000000366
e.g. SWISS data
A data frame with 47 observations on 6 variables, each of which is in percent, i.e., in [0, 100].

[,1] Fertility         Ig, ‘common standardized fertility measure’ (fertility rate)
[,2] Agriculture       % of males involved in agriculture as occupation
[,3] Examination       % draftees receiving highest mark on army examination
[,4] Education         % education beyond primary school for draftees.
[,5] Catholic          % ‘catholic’ (as opposed to ‘protestant’).
[,6] Infant.Mortality  live births who live less than 1 year.

All variables but ‘Fertility’ give proportions of the population.
head(swiss)
fit = lm(Infant.Mortality~., data=swiss)
summary(fit)
> head(swiss)
             Fertility Agriculture Examination Education Catholic Infant.Mortality
Courtelary        80.2        17.0          15        12     9.96             22.2
Delemont          83.1        45.1           6         9    84.84             22.2
Franches-Mnt      92.5        39.7           5         5    93.40             20.2
Moutier           85.8        36.5          12         7    33.77             20.3
Neuveville        76.9        43.5          17        15     5.16             20.6
Porrentruy        76.1        35.3           9         7    90.57             26.6
> fit = lm(Infant.Mortality~., data=swiss)
> summary(fit)

Call:
lm(formula = Infant.Mortality ~ ., data = swiss)

Residuals:
    Min      1Q  Median      3Q     Max
-8.2512 -1.2860  0.1821  1.6914  6.0937

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  8.667e+00  5.435e+00   1.595  0.11850
Fertility    1.510e-01  5.351e-02   2.822  0.00734 **
Agriculture -1.175e-02  2.812e-02  -0.418  0.67827
Examination  3.695e-02  9.607e-02   0.385  0.70250
Education    6.099e-02  8.484e-02   0.719  0.47631
Catholic     6.711e-05  1.454e-02   0.005  0.99634
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.683 on 41 degrees of freedom
Multiple R-squared:  0.2439,  Adjusted R-squared:  0.1517
F-statistic: 2.645 on 5 and 41 DF,  p-value: 0.03665

>
library(MASS)   # stepAIC() is provided by the MASS package
step <- stepAIC(fit, direction="both")

> step <- stepAIC(fit, direction="both")
Start:  AIC=98.34
Infant.Mortality ~ Fertility + Agriculture + Examination + Education + Catholic

              Df Sum of Sq    RSS     AIC
- Catholic     1     0.000 295.07  96.341
- Examination  1     1.065 296.13  96.511
- Agriculture  1     1.256 296.32  96.541
- Education    1     3.719 298.79  96.930
<none>                     295.07  98.341
- Fertility    1    57.295 352.36 104.682

Step:  AIC=96.34
Infant.Mortality ~ Fertility + Agriculture + Examination + Education

              Df Sum of Sq    RSS     AIC
- Examination  1     1.320 296.39  94.551
- Agriculture  1     1.395 296.46  94.563
- Education    1     5.774 300.84  95.252
<none>                     295.07  96.341
+ Catholic     1     0.000 295.07  98.341
- Fertility    1    72.609 367.68 104.681

Step:  AIC=94.55
Infant.Mortality ~ Fertility + Agriculture + Education

              Df Sum of Sq    RSS     AIC
- Agriculture  1     4.250 300.64  93.220
- Education    1     6.875 303.26  93.629
<none>                     296.39  94.551
+ Examination  1     1.320 295.07  96.341
+ Catholic     1     0.255 296.13  96.511
- Fertility    1    79.804 376.19 103.758

Step:  AIC=93.22
Infant.Mortality ~ Fertility + Education

              Df Sum of Sq    RSS     AIC
<none>                     300.64  93.220
- Education    1    21.902 322.54  94.525
+ Agriculture  1     4.250 296.39  94.551
+ Examination  1     4.175 296.46  94.563
+ Catholic     1     2.318 298.32  94.857
- Fertility    1    85.769 386.41 103.017
>
> summary(step)

Call:
lm(formula = Infant.Mortality ~ Fertility + Education, data = swiss)

Residuals:
    Min      1Q  Median      3Q     Max
-7.6927 -1.4049  0.2218  1.7751  6.1685

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  8.63758    3.33524   2.590 0.012973 *
Fertility    0.14615    0.04125   3.543 0.000951 ***
Education    0.09595    0.05359   1.790 0.080273 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.614 on 44 degrees of freedom
Multiple R-squared:  0.2296,  Adjusted R-squared:  0.1946
F-statistic: 6.558 on 2 and 44 DF,  p-value: 0.003215

>
How about fertility?
head(swiss)
fit = lm(Fertility ~ ., data=swiss)
summary(fit)

step <- stepAIC(fit, direction="both")
summary(step)
Then figure out the standardized coefficients, betas.
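One way to obtain the standardized coefficients is sketched below, as an assumption-laden illustration rather than the required solution. It uses step() from base R in place of stepAIC() to select the Fertility model, then computes each beta as b * sd(x) / sd(y). Refitting the same formula on z-scored data is shown as a cross-check.

```r
# Sketch: standardized coefficients (betas) for the selected Fertility model
sel <- step(lm(Fertility ~ ., data = swiss), trace = 0)

b  <- coef(sel)[-1]                        # unstandardized slopes (no intercept)
sx <- sapply(swiss[names(b)], sd)          # SD of each retained predictor
sy <- sd(swiss$Fertility)                  # SD of the dependent variable
beta <- b * sx / sy

# Cross-check: the same model fitted on z-scored variables
fit_z <- lm(formula(sel), data = as.data.frame(scale(swiss)))
print(round(cbind(beta = beta, scaled_fit = coef(fit_z)[-1]), 4))
```

The betas are unit-free, so they can be compared across predictors measured on different scales; this shortcut only applies directly to numeric predictors.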