Differences

This shows you the differences between two versions of the page.

--- statistical_regression_methods [2017/11/13 10:07] – [e.g. 1] hkimscil
+++ statistical_regression_methods [2022/11/13 23:01] (current) – hkimscil
@@ Line 1: / Line 1: @@
-====== Statistical Regression Methods ======
+~~REDIRECT>statistical regression~~
-Part of the selection methods in multiple regression. In short,
-Multiple Regression
-  - Enter method
-  - Selection method
-    - Statistical regression method
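-      - forward selection: Among the predictors, the one most highly correlated with the dependent variable Y is entered first and the regression is computed. Because of its high correlation, the variable entered first is regarded as a theoretically important factor in explaining the dependent variable. Each subsequent variable is entered with the previously entered variables taken into account.
-      - backward deletion: All independent variables are entered at once to start the regression. X variables judged not to contribute statistically to the regression equation are then removed one at a time, and the regression is recomputed after each removal.
-      - stepwise selection: The regression proceeds as in forward selection, but the explanatory power of each entered variable is evaluated to decide whether to keep or drop it. That is, a t-test on each IV is used to judge whether that IV makes a significant contribution.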
-    - Sequential regression method
-See also {{youtube>4Y7PF3Ca3Gk}}
-See also {{http://ncss.wpengine.netdna-cdn.com/wp-content/themes/ncss/pdf/Procedures/NCSS/Stepwise_Regression.pdf|Stepwise regression in NCSS site}}
-----
-The following is from http://www.statisticssolutions.com/selection-process-for-multiple-regression/
-<WRAP box 70%>
-**Forward selection** begins with an empty equation.  Predictors are added one at a time beginning with the predictor with the highest correlation with the dependent variable.  Variables of greater theoretical importance are entered first.  Once in the equation, the variable remains there.
-**Backward elimination** (or backward deletion) is the reverse process.  All the independent variables are entered into the equation first and each one is deleted one at a time if they do not contribute to the regression equation.
-**Stepwise regression** is a combination of the forward and backward selection techniques. . . . Stepwise regression is a modification of the forward selection so that __after each step in which a variable was added, all candidate variables in the model are checked to see if their significance has been reduced below the specified tolerance level.__ If a nonsignificant variable is found, it is removed from the model. Stepwise regression requires two significance levels: one for adding variables and one for removing variables. The cutoff probability for adding variables should be less than the cutoff probability for removing variables so that the procedure does not get into an infinite loop.
-Sequential Regression Method of Entry:
-**Block-wise selection** is a version of forward selection that is achieved in blocks or sets.  The predictors are grouped into blocks based on psychometric consideration or theoretical reasons and __a stepwise selection is applied__.  Each block is applied separately while the other predictor variables are ignored.  Variables can be removed when they do not contribute to the prediction.  In general, the predictors included in the blocks will be inter-correlated.  Also, the order of entry has an impact on which variables will be selected; those that are entered in the earlier stages have a better chance of being retained than those entered at later stages.
-Essentially, the multiple regression selection process enables the researcher to obtain a reduced set of variables from a larger set of predictors, eliminating unnecessary predictors, simplifying data, and enhancing predictive accuracy.
-Two criteria are used to achieve the best set of predictors: meaningfulness to the situation and statistical significance.  By entering variables into the equation in a given order, confounding variables can be investigated and variables that are highly correlated can be combined into blocks.
-</WRAP>
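-The forward-selection rule quoted above can be sketched in R as follows. This is only an illustration under stated assumptions, not the procedure of any particular package: the helper name forward.select, the entry cutoff alpha.enter = 0.05, and the use of the built-in mtcars data are all made up for the example.
-<code>## Forward selection by F-test p-value (illustrative sketch)
-## forward.select and alpha.enter are hypothetical names, not from the source
-forward.select <- function(data, response, candidates, alpha.enter = 0.05) {
-    ## upper scope: every candidate predictor, e.g. ~ x1 + x2 + ...
-    upper <- reformulate(candidates)
-    ## start from the empty (intercept-only) equation
-    current <- lm(reformulate("1", response = response), data = data)
-    remaining <- candidates
-    while (length(remaining) > 0) {
-        ## F-test for adding each term not yet in the model
-        tab <- add1(current, scope = upper, test = "F")
-        pvals <- setNames(tab[["Pr(>F)"]][-1], rownames(tab)[-1])  # drop the <none> row
-        best <- names(which.min(pvals))
-        ## stop when even the best candidate fails the entry cutoff
-        if (pvals[[best]] >= alpha.enter) break
-        current <- update(current, as.formula(paste(". ~ . +", best)))
-        remaining <- setdiff(remaining, best)
-    }
-    current
-}
-fit <- forward.select(mtcars, "mpg", c("wt", "hp", "qsec", "drat"))
-formula(fit)
-</code>
-At the first step this picks the predictor most correlated with the response, as in the description above. A stepwise variant would additionally run drop1() after each addition and remove any term whose p-value exceeds a second, larger cutoff.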
-====== e.g. 1 ======
-Data: {{:r:lowbwt.csv}}. Read [[:r:lowbwt dataset]] or see [[https://notendur.hi.is/birgirhr/lowbwt.txt]]
-<code>lbw <- read.csv("http://commres.net/wiki/_media/r/lowbwt.csv", sep=",")
-names(lbw) <- tolower(names(lbw))
-</code>
-<code>## Recoding
-lbw <- within(lbw, {
-    ## race relabeling
-    race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))
-    ## ftv (frequency of visit) relabeling
-    ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))
-    ftv.cat <- relevel(ftv.cat, ref = "Normal")
-    ## ptl
-    preterm <- factor(ptl >= 1, levels = c(FALSE, TRUE), labels = c("0","1+"))
-})</code>
-<code>lm.full <- lm(bwt ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat, data = lbw)
-lm.null <- lm(bwt ~ 1, data = lbw)
-</code>
-<code>summary(lm.full)
-Call:
-lm(formula = bwt ~ age + lwt + race.cat + smoke + preterm + ht +
-    ui + ftv.cat, data = lbw)
-Residuals:
-     Min       1Q   Median       3Q      Max
--1896.38  -445.54    53.58   466.07  1654.74
-Coefficients:
-              Estimate Std. Error t value Pr(>|t|)
-(Intercept)   2949.808    320.517   9.203  < 2e-16 ***
-age             -2.928      9.674  -0.303 0.762483
-lwt              4.205      1.717   2.448 0.015316 *
-race.catBlack -467.043    149.797  -3.118 0.002125 **
-race.catOther -323.144    117.411  -2.752 0.006532 **
-smoke         -307.880    109.148  -2.821 0.005335 **
-preterm1+     -207.757    136.364  -1.524 0.129394
-ht            -568.111    200.905  -2.828 0.005225 **
-ui            -494.168    137.246  -3.601 0.000412 ***
-ftv.catNone    -55.975    105.373  -0.531 0.595934
-ftv.catMany   -185.275    203.215  -0.912 0.363151
----
-Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
-Residual standard error: 646.9 on 178 degrees of freedom
-Multiple R-squared:  0.2544,	Adjusted R-squared:  0.2125
-F-statistic: 6.074 on 10 and 178 DF,  p-value: 6.27e-08
-</code>
-<code>drop1(lm.full, test = "F")
-Single term deletions
-Model:
-bwt ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat
-         Df Sum of Sq      RSS    AIC F value    Pr(>F)
-<none>                74494960 2457.2
-age       1     38343 74533303 2455.3  0.0916 0.7624834
-lwt       1   2508944 77003904 2461.4  5.9949 0.0153165 *
-race.cat  2   5560980 80055939 2466.8  6.6438 0.0016492 **
-smoke     1   3329939 77824899 2463.4  7.9566 0.0053352 **
-preterm   1    971457 75466416 2457.6  2.3212 0.1293944
-ht        1   3346518 77841478 2463.5  7.9962 0.0052247 **
-ui        1   5425727 79920686 2468.5 12.9644 0.0004115 ***
-ftv.cat   2    380072 74875032 2454.1  0.4541 0.6357678
----
-Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
-</code>
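-The full and null models defined earlier are the natural endpoints for an automated search. The following is a minimal sketch using step() from base R, which selects by AIC rather than by the significance cutoffs described above. To keep it self-contained it uses MASS::birthwt on the assumption that it holds the same Hosmer and Lemeshow birth weight data as lowbwt.csv; the object names bw, fit.full, fit.null, fwd, bwd and both are made up for the example.
-<code>library(MASS)  # birthwt: assumed stand-in for lowbwt.csv
-## Same recoding as above, applied to the stand-in data
-bw <- within(birthwt, {
-    race.cat <- factor(race, levels = 1:3, labels = c("White", "Black", "Other"))
-    ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None", "Normal", "Many"))
-    ftv.cat <- relevel(ftv.cat, ref = "Normal")
-    preterm <- factor(ptl >= 1, levels = c(FALSE, TRUE), labels = c("0", "1+"))
-})
-fit.full <- lm(bwt ~ age + lwt + race.cat + smoke + preterm + ht + ui + ftv.cat, data = bw)
-fit.null <- lm(bwt ~ 1, data = bw)
-## Forward: start empty, add terms up to the full model
-fwd <- step(fit.null, scope = list(lower = ~ 1, upper = formula(fit.full)),
-            direction = "forward", trace = 0)
-## Backward: start full, drop terms one at a time
-bwd <- step(fit.full, direction = "backward", trace = 0)
-## Stepwise: start empty, allow both adding and dropping at each step
-both <- step(fit.null, scope = list(lower = ~ 1, upper = formula(fit.full)),
-             direction = "both", trace = 0)
-formula(bwd)
-</code>
-Setting trace = 1 prints the term considered at each step, which can be compared with the drop1() table above.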