statistical_regression_methods
Differences
This shows you the differences between two versions of the page.
statistical_regression_methods [2020/11/26 08:44] – [Statistical Regression Methods] hkimscil | statistical_regression_methods [2022/11/13 23:01] – hkimscil | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== Statistical Regression Methods ====== | + | ~~REDIRECT> |
- | One of the selection methods in multiple regression. In short, | + |
- | + | ||
- | Multiple Regression | + | |
- | - Enter method | + | |
- | - Selection method | + | |
- | - Statistical regression method | + | |
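- | - forward selection: among the predictors, the variable with the highest correlation with the dependent variable Y is entered first and the regression is computed. Because of its high correlation, the variable entered first is regarded as a theoretically important factor in explaining the dependent variable. Each subsequent variable is entered with the previously entered variables taken into account. | + |
- | - backward deletion: all independent variables are entered at once and the regression is computed. Then the X variables judged not to contribute statistically to the regression equation are removed one at a time, and the regression is recomputed after each removal. | + |
- | - stepwise selection: the regression proceeds as in forward selection, but the explanatory power of each entered variable is computed to decide whether to keep or drop it. That is, a t-test on each IV is used to judge whether that IV makes a significant contribution. | + |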
- | - Sequential regression method | + | |
- | + | ||
- | See also {{youtube>4Y7PF3Ca3Gk}} | + | |
- | See also {{http:// | + | |
- | ---- | + | |
- | The text below is from http:// | + |
- | <WRAP box 70%> | + | |
- | **Forward selection** begins with an empty equation. | + | |
- | + | ||
- | **Backward elimination** (or backward deletion) is the reverse process. | + | |
- | + | ||
- | + | ||
- | **Stepwise regression** is a combination of the forward and backward selection techniques. . . . Stepwise regression is a modification of the forward selection so that __after each step in which a variable was added, all candidate variables in the model are checked to see if their significance has been reduced below the specified tolerance level.__ If a nonsignificant variable is found, it is removed from the model. Stepwise regression requires two significance levels: one for adding variables and one for removing variables. The cutoff probability for adding variables should be less than the cutoff probability for removing variables so that the | + | |
- | procedure does not get into an infinite loop. | + | |
- | + | ||
- | + | ||
- | Sequential Regression Method of Entry: | + | |
- | + | ||
- | **Block-wise selection** is a version of forward selection that is achieved in blocks or sets. The predictors are grouped into blocks based on psychometric consideration or theoretical reasons and __a stepwise selection is applied__. | + | |
- | + | ||
- | Essentially, | + | |
- | + | ||
- | Two criteria are used to achieve the best set of predictors: meaningfulness to the situation and statistical significance. | + |
- | </ | + | |
- | + | ||
- | In R | + |
- | step() selects as the final model the one whose AIC value no longer decreases. | + |
- | ====== e.g. 1 ====== | + | |
- | ===== backward elimination ===== | + | |
- | + | ||
- | {{: | + | |
- | + | ||
- | < | + | |
- | names(lbw) <- tolower(names(lbw)) | + | |
- | </ | + | |
- | + | ||
- | < | + | |
- | lbw <- within(lbw, { | + | |
- | ## race relabeling | + | |
- | race.cat <- factor(race, | + | |
- | + | ||
- | ## ftv (frequency of visit) relabeling | + | |
- | ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c(" | + | |
- | ftv.cat <- relevel(ftv.cat, | + | |
- | + | ||
- | ## ptl | + | |
- | preterm <- factor(ptl >= 1, levels = c(F,T), labels = c(" | + | |
- | })</ | + | |
- | + | ||
- | < | + | |
- | lm.null <- lm(bwt | + | |
- | </ | + | |
- | + | ||
- | < | + | |
- | > step(m.full, | + | |
- | Start: | + | |
- | bwt ~ id + low + age + lwt + race + smoke + ptl + ht + ui + ftv + | + | |
- | preterm + ftv.cat + race.cat | + | |
- | + | ||
- | + | ||
- | Step: AIC=1877.43 | + | |
- | bwt ~ id + low + age + lwt + smoke + ptl + ht + ui + ftv + preterm + | + | |
- | ftv.cat + race.cat | + | |
- | + | ||
- | Df Sum of Sq RSS AIC | + | |
- | - preterm | + | |
- | - ptl | + | |
- | - ftv.cat | + | |
- | - age | + | |
- | - smoke | + | |
- | - race.cat | + | |
- | - ftv | + | |
- | - lwt | + | |
- | < | + | |
- | - ui 1 | + | |
- | - low | + | |
- | - ht 1 129875 | + | |
- | - id 1 29802422 33125778 2310.0 | + | |
- | + | ||
- | Step: AIC=1875.43 | + | |
- | bwt ~ id + low + age + lwt + smoke + ptl + ht + ui + ftv + ftv.cat + | + | |
- | race.cat | + | |
- | + | ||
- | Df Sum of Sq RSS AIC | + | |
- | - ftv.cat | + | |
- | - age | + | |
- | - smoke | + | |
- | - race.cat | + | |
- | - ptl | + | |
- | - ftv | + | |
- | - lwt | + | |
- | < | + | |
- | - ui 1 | + | |
- | - low | + | |
- | + preterm | + | |
- | - ht 1 129854 | + | |
- | - id 1 29828678 33152056 2308.2 | + | |
- | + | ||
- | Step: AIC=1873.49 | + | |
- | bwt ~ id + low + age + lwt + smoke + ptl + ht + ui + ftv + race.cat | + | |
- | + | ||
- | Df Sum of Sq RSS AIC | + | |
- | - age | + | |
- | - ftv | + | |
- | - ptl | + | |
- | - smoke | + | |
- | - race.cat | + | |
- | < | + | |
- | - ui 1 | + | |
- | - low | + | |
- | - lwt | + | |
- | + ftv.cat | + | |
- | + preterm | + | |
- | - ht 1 115086 | + | |
- | - id 1 29807178 33167091 2304.2 | + | |
- | + | ||
- | Step: AIC=1871.53 | + | |
- | bwt ~ id + low + lwt + smoke + ptl + ht + ui + ftv + race.cat | + | |
- | + | ||
- | Df Sum of Sq RSS AIC | + | |
- | - ftv | + | |
- | - ptl | + | |
- | - smoke | + | |
- | - race.cat | + | |
- | < | + | |
- | - ui 1 | + | |
- | - low | + | |
- | - lwt | + | |
- | + age | + | |
- | + ftv.cat | + | |
- | + preterm | + | |
- | - ht 1 114704 | + | |
- | - id 1 30083255 33443781 2303.8 | + | |
- | + | ||
- | Step: AIC=1869.57 | + | |
- | bwt ~ id + low + lwt + smoke + ptl + ht + ui + race.cat | + | |
- | + | ||
- | Df Sum of Sq RSS AIC | + | |
- | - ptl | + | |
- | - smoke | + | |
- | - race.cat | + | |
- | < | + | |
- | - ui 1 | + | |
- | - low | + | |
- | - lwt | + | |
- | + ftv | + | |
- | + age | + | |
- | + preterm | + | |
- | + ftv.cat | + | |
- | - ht 1 118142 | + | |
- | - id 1 30120786 33482147 2302.0 | + | |
- | + | ||
- | Step: AIC=1867.83 | + | |
- | bwt ~ id + low + lwt + smoke + ht + ui + race.cat | + | |
- | + | ||
- | Df Sum of Sq RSS AIC | + | |
- | - smoke | + | |
- | - race.cat | + | |
- | < | + | |
- | - low | + | |
- | - lwt | + | |
- | - ui 1 | + | |
- | + ptl | + | |
- | + preterm | + | |
- | + ftv | + | |
- | + age | + | |
- | + ftv.cat | + | |
- | - ht 1 118424 | + | |
- | - id 1 30313012 33679005 2301.1 | + | |
- | + | ||
- | Step: AIC=1866.36 | + | |
- | bwt ~ id + low + lwt + ht + ui + race.cat | + | |
- | + | ||
- | Df Sum of Sq RSS AIC | + | |
- | - race.cat | + | |
- | < | + | |
- | - low | + | |
- | - lwt | + | |
- | - ui 1 | + | |
- | + smoke | + | |
- | + ptl | + | |
- | + preterm | + | |
- | + ftv | + | |
- | + age | + | |
- | + ftv.cat | + | |
- | - ht 1 118943 | + | |
- | - id 1 31159450 34534890 2303.9 | + | |
- | + | ||
- | Step: AIC=1865.34 | + | |
- | bwt ~ id + low + lwt + ht + ui | + | |
- | + | ||
- | Df Sum of Sq RSS AIC | + | |
- | + race 1 | + | |
- | - lwt | + | |
- | < | + | |
- | - low | + | |
- | + race.cat | + | |
- | - ui 1 | + | |
- | + ptl | + | |
- | + preterm | + | |
- | + ftv | + | |
- | + smoke | + | |
- | + age | + | |
- | + ftv.cat | + | |
- | - ht 1 125091 | + | |
- | - id 1 32067628 35496710 2305.1 | + | |
- | + | ||
- | Step: AIC=1864.7 | + | |
- | bwt ~ id + low + lwt + ht + ui + race | + | |
- | + | ||
- | Df Sum of Sq RSS AIC | + | |
- | < | + | |
- | - lwt | + | |
- | - race 1 | + | |
- | - low | + | |
- | - ui 1 | + | |
- | + smoke | + | |
- | + ptl | + | |
- | + race.cat | + | |
- | + preterm | + | |
- | + ftv | + | |
- | + age | + | |
- | + ftv.cat | + | |
- | - ht 1 119185 | + | |
- | - id 1 31511908 34893274 2303.8 | + | |
- | + | ||
- | Call: | + | |
- | lm(formula = bwt ~ id + low + lwt + ht + ui + race, data = lbw) | + | |
- | + | ||
- | Coefficients: | + | |
- | (Intercept) | + | |
- | 1647.3705 | + | |
- | | + | |
- | | + | |
- | + | ||
- | > | + | |
- | </ | + | |
- | + | ||
- | + | ||
- | + | ||
- | < | + | |
- | < | + | |
- | lm(formula = bwt ~ age + lwt + race.cat + smoke + preterm + ht + | + | |
- | ui + ftv.cat, data = lbw) | + | |
- | + | ||
- | Residuals: | + | |
- | | + | |
- | -1896.38 | + | |
- | + | ||
- | Coefficients: | + | |
- | Estimate Std. Error t value Pr(> | + | |
- | (Intercept) | + | |
- | age | + | |
- | lwt 4.205 1.717 2.448 0.015316 * | + | |
- | race.catBlack -467.043 | + | |
- | race.catOther -323.144 | + | |
- | smoke | + | |
- | preterm1+ | + | |
- | ht -568.111 | + | |
- | ui -494.168 | + | |
- | ftv.catNone | + | |
- | ftv.catMany | + | |
- | --- | + | |
- | Signif. codes: | + | |
- | 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 | + | |
- | + | ||
- | Residual standard error: 646.9 on 178 degrees of freedom | + | |
- | Multiple R-squared: | + | |
- | F-statistic: | + | |
- | + | ||
- | </ | + | |
- | + | ||
- | < | + | |
- | </ | + | |
- | + | ||
- | < | + | |
- | Single term deletions | + | |
- | + | ||
- | Model: | + | |
- | bwt ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat | + | |
- | Df Sum of Sq RSS AIC F value Pr(> | + | |
- | < | + | |
- | age | + | |
- | lwt | + | |
- | race.cat | + | |
- | smoke | + | |
- | preterm | + | |
- | ht 1 | + | |
- | ui 1 | + | |
- | ftv.cat | + | |
- | --- | + | |
- | Signif. codes: | + | |
- | </ | + | |
- | + | ||
- | < | + | |
- | </ | + | |
- | + | ||
- | < | + | |
- | + | ||
- | Model: | + | |
- | bwt ~ lwt + race.cat + smoke + preterm + ht + ui + ftv.cat | + | |
- | Df Sum of Sq RSS AIC F value Pr(> | + | |
- | < | + | |
- | lwt | + | |
- | race.cat | + | |
- | smoke | + | |
- | preterm | + | |
- | ht 1 | + | |
- | ui 1 | + | |
- | ftv.cat | + | |
- | --- | + | |
- | Signif. codes: | + | |
- | </ | + | |
- | + | ||
- | < | + | |
- | drop1(update(lm.full, | + | |
- | + | ||
- | < | + | |
- | + | ||
- | Model: | + | |
- | bwt ~ lwt + race.cat + smoke + preterm + ht + ui | + | |
- | Df Sum of Sq RSS AIC F value Pr(> | + | |
- | < | + | |
- | lwt | + | |
- | race.cat | + | |
- | smoke | + | |
- | preterm | + | |
- | ht 1 | + | |
- | ui 1 | + | |
- | --- | + | |
- | Signif. codes: | + | |
- | + | ||
- | < | + | |
- | drop1(update(lm.full, | + | |
- | + | ||
- | < | + | |
- | + | ||
- | Model: | + | |
- | bwt ~ lwt + race.cat + smoke + ht + ui | + | |
- | Df Sum of Sq RSS AIC F value Pr(> | + | |
- | < | + | |
- | lwt | + | |
- | race.cat | + | |
- | smoke | + | |
- | ht 1 | + | |
- | ui 1 | + | |
- | --- | + | |
- | Signif. codes: | + | |
- | + | ||
- | < | + | |
- | summary(update(lm.full, | + | |
- | </ | + | |
- | < | + | |
- | + | ||
- | Call: | + | |
- | lm(formula = bwt ~ lwt + race.cat + smoke + ht + ui, data = lbw) | + | |
- | + | ||
- | Residuals: | + | |
- | | + | |
- | | + | |
- | + | ||
- | Coefficients: | + | |
- | Estimate Std. Error t value Pr(> | + | |
- | (Intercept) | + | |
- | lwt | + | |
- | race.catBlack | + | |
- | race.catOther | + | |
- | smoke -354.90 | + | |
- | ht | + | |
- | ui | + | |
- | --- | + | |
- | Signif. codes: | + | |
- | + | ||
- | Residual standard error: 646 on 182 degrees of freedom | + | |
- | Multiple R-squared: 0.24, | + | |
- | F-statistic: | + | |
- | </ | + | |
- | + | ||
- | ===== Forward selection ===== | + | |
- | < | + | |
- | add1(lm.null, | + | |
- | </ | + | |
- | < | + | |
- | Single term additions | + | |
- | + | ||
- | Model: | + | |
- | bwt ~ 1 | + | |
- | Df Sum of Sq RSS AIC F value | + | |
- | < | + | |
- | age | + | |
- | lwt | + | |
- | race.cat | + | |
- | smoke | + | |
- | preterm | + | |
- | ht 1 | + | |
- | ui 1 | + | |
- | ftv.cat | + | |
- | --- | + | |
- | Signif. codes: | + | |
- | </ | + | |
- | < | + | |
- | ## Now race.cat is the most significant | + | |
- | add1(update(lm.null, | + | |
- | </ | + | |
- | < | + | |
- | Single term additions | + | |
- | + | ||
- | Model: | + | |
- | bwt ~ ui | + | |
- | Df Sum of Sq RSS AIC F value Pr(> | + | |
- | < | + | |
- | age | + | |
- | lwt | + | |
- | race.cat | + | |
- | smoke | + | |
- | preterm | + | |
- | ht 1 | + | |
- | ftv.cat | + | |
- | --- | + | |
- | Signif. codes: | + | |
- | + | ||
- | </ | + | |
- | < | + | |
- | ## Now smoke is the most significant | + | |
- | add1(update(lm.null, | + | |
- | </ | + | |
- | < | + | |
- | Single term additions | + | |
- | + | ||
- | Model: | + | |
- | bwt ~ ui + race.cat | + | |
- | Df Sum of Sq RSS AIC F value Pr(> | + | |
- | < | + | |
- | age 1 57041 87063871 2475 0.12 0.72884 | + | |
- | lwt 1 | + | |
- | smoke 1 | + | |
- | preterm | + | |
- | ht | + | |
- | ftv.cat | + | |
- | --- | + | |
- | Signif. codes: | + | |
- | </ | + | |
- | < | + | |
- | ## Now ht is the most significant | + | |
- | add1(update(lm.null, | + | |
- | </ | + | |
- | < | + | |
- | Single term additions | + | |
- | + | ||
- | Model: | + | |
- | bwt ~ ui + race.cat + smoke | + | |
- | Df Sum of Sq RSS AIC F value Pr(> | + | |
- | < | + | |
- | age 1 326 81040698 2463 0.00 0.978 | + | |
- | lwt 1 | + | |
- | preterm | + | |
- | ht | + | |
- | ftv.cat | + | |
- | --- | + | |
- | Signif. codes: | + | |
- | </ | + | |
- | < | + | |
- | ## Now lwt is the most significant | + | |
- | add1(update(lm.null, | + | |
- | </ | + | |
- | < | + | |
- | Single term additions | + | |
- | + | ||
- | Model: | + | |
- | bwt ~ ui + race.cat + smoke + ht | + | |
- | Df Sum of Sq RSS AIC F value Pr(> | + | |
- | < | + | |
- | age 1 882 78582460 2459 0.00 0.964 | + | |
- | lwt 1 | + | |
- | preterm | + | |
- | ftv.cat | + | |
- | --- | + | |
- | Signif. codes: | + | |
- | + | ||
- | </ | + | |
- | < | + | |
- | ## Now no variable is significant at p < 0.1 | + | |
- | add1(update(lm.null, | + | |
- | </ | + | |
- | < | + | |
- | Single term additions | + | |
- | + | ||
- | Model: | + | |
- | bwt ~ ui + race.cat + smoke + ht + lwt | + | |
- | Df Sum of Sq RSS AIC F value Pr(>F) | + | |
- | < | + | |
- | age 1 108807 75802922 2454 0.26 | + | |
- | preterm | + | |
- | ftv.cat | + | |
- | </ | + | |
- | < | + | |
- | ## Show summary for final model | + | |
- | summary(update(lm.null, | + | |
- | </ | + | |
- | < | + | |
- | Call: | + | |
- | lm(formula = bwt ~ ui + race.cat + smoke + ht + lwt, data = lbw) | + | |
- | + | ||
- | Residuals: | + | |
- | | + | |
- | | + | |
- | + | ||
- | Coefficients: | + | |
- | Estimate Std. Error t value Pr(> | + | |
- | (Intercept) | + | |
- | ui | + | |
- | race.catBlack | + | |
- | race.catOther | + | |
- | smoke -354.90 | + | |
- | ht | + | |
- | lwt | + | |
- | --- | + | |
- | Signif. codes: | + | |
- | + | ||
- | Residual standard error: 646 on 182 degrees of freedom | + | |
- | Multiple R-squared: 0.24, | + | |
- | F-statistic: | + |
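
The two procedures walked through in the removed revision (backward elimination with `step()`, and forward selection by repeated `add1()` F-tests) can be sketched in a few lines of R. This is only an illustration under stated assumptions: it uses the built-in `mtcars` data rather than the page's `lbw` data, and the model terms, the object names `full`, `null`, `back`, `fwd`, and the entry threshold `p_enter` are hypothetical choices, not taken from the page.

```r
## Illustrative data and models (assumption: mtcars instead of the page's lbw)
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
null <- lm(mpg ~ 1, data = mtcars)

## Backward elimination: step() keeps dropping terms while AIC decreases
back <- step(full, direction = "backward", trace = 0)

## Forward selection by hand: at each round, add the candidate with the
## smallest F-test p-value, and stop when none is below the entry threshold
p_enter <- 0.1                                   # hypothetical threshold
candidates <- attr(terms(full), "term.labels")
fwd <- null
repeat {
  remaining <- setdiff(candidates, attr(terms(fwd), "term.labels"))
  if (length(remaining) == 0) break              # nothing left to add
  cand  <- add1(fwd, scope = formula(full), test = "F")
  pvals <- cand[["Pr(>F)"]][-1]                  # drop the <none> row
  names(pvals) <- rownames(cand)[-1]
  if (min(pvals) >= p_enter) break               # no significant candidate
  best <- names(pvals)[which.min(pvals)]
  fwd  <- update(fwd, as.formula(paste(". ~ . +", best)))
}

formula(back)   # model kept by backward elimination
formula(fwd)    # model built by forward selection
```

The two routes need not end at the same model: `step()` compares AIC values, while the manual loop uses a p-value cutoff, so the stopping rules differ.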
statistical_regression_methods.txt · Last modified: 2022/11/13 23:01 by hkimscil