====== pre-assumptions in regression test ======
  * [[Linearity]] - the relationships between the predictors and the outcome variable should be linear
  * [[:Normality]] - the errors should be normally distributed; technically, normality is necessary only for the t-tests to be valid, while estimation of the coefficients requires only that the errors be identically and independently distributed
  * [[:Homoscedasticity]] - the variance of the errors should be constant
  * Independence - the errors associated with one observation are not correlated with the errors of any other observation
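The residual-based assumptions above can be checked numerically. Below is a minimal Python sketch on simulated data (the names ''income'' and ''sales'' echo the example further down; all values here are made up). It fits an OLS line and computes the Durbin-Watson statistic for the independence assumption (values near 2 indicate uncorrelated errors).

```python
import numpy as np

# Hypothetical data: income (predictor) and sales (outcome); values are simulated.
rng = np.random.default_rng(0)
income = rng.uniform(10, 50, size=30)
sales = 1.5 * income + rng.normal(0, 4, size=30)  # linear signal + i.i.d. normal errors

# Fit sales = b0 + b1*income by ordinary least squares.
X = np.column_stack([np.ones_like(income), income])
b, *_ = np.linalg.lstsq(X, sales, rcond=None)
resid = sales - X @ b

# Independence check: Durbin-Watson statistic (close to 2 when errors are uncorrelated).
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
print(round(b[1], 3), round(dw, 2))
```

A histogram or Q-Q plot of ''resid'' would cover the normality assumption in the same spirit.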
  * [[Influence]] - individual observations that exert undue influence on the coefficients
  * [[Collinearity]] or [[Singularity]] - predictors that are highly collinear, i.e. linearly related, can cause problems in estimating the regression coefficients.
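Collinearity is commonly quantified with the variance inflation factor, VIF<sub>j</sub> = 1 / (1 - R<sup>2</sup><sub>j</sub>), where R<sup>2</sup><sub>j</sub> comes from regressing predictor j on the other predictors. A small numpy sketch (simulated predictors, not from the example below; names are assumed):

```python
import numpy as np

# Hypothetical predictors: x2 is nearly a linear copy of x1, so their VIFs explode.
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)  # almost singular with x1
x3 = rng.normal(size=100)                   # unrelated predictor

def vif(X, j):
    """VIF_j = 1 / (1 - R^2) from regressing column j on the remaining columns."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    fit = A @ np.linalg.lstsq(A, y, rcond=None)[0]
    r2 = 1 - np.sum((y - fit) ** 2) / np.sum((y - y.mean()) ** 2)
    return 1 / (1 - r2)

X = np.column_stack([x1, x2, x3])
print([round(vif(X, j), 1) for j in range(3)])  # first two are huge, third is near 1
```

A common rule of thumb flags VIF values above about 10 as problematic collinearity.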
===== Outliers =====
| **Model Summary(b)** |
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| 1 | | .141 | | |
| a Predictors: (Constant), income |
| b Dependent Variable: sales |
| Coefficients(a) |
| Model | | B | Std. Error | Beta | t | Sig. |
| 1 | (Constant) | | | | | |
| | income | .527406291 | | | | |
| a Dependent Variable: sales |
<WRAP clear />
Note that R<sup>2</sup> = .141. Further, the ANOVA test shows that the model is not significant. Since the F test failed, the t-test for B also failed.
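The point that "F test failed, so the t-test for B also failed" is not a coincidence: in simple regression with one predictor, the overall F statistic equals the square of the t statistic for the slope, so the two tests are the same test. A numpy sketch on simulated data (all values assumed, chosen to give a weak fit roughly like the R<sup>2</sup> of .14 above):

```python
import numpy as np

# Simulated weak relationship between one predictor and the outcome.
rng = np.random.default_rng(2)
x = rng.normal(size=25)
y = 0.4 * x + rng.normal(size=25)

X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ b
sse = np.sum((y - yhat) ** 2)          # residual sum of squares
ssr = np.sum((yhat - y.mean()) ** 2)   # regression sum of squares
n, k = len(y), 1
F = (ssr / k) / (sse / (n - k - 1))    # ANOVA F for the whole model

se_b1 = np.sqrt(sse / (n - 2) / np.sum((x - x.mean()) ** 2))
t = b[1] / se_b1                       # t statistic for the slope B
print(round(F, 4), round(t ** 2, 4))   # identical: F = t^2 with one predictor
```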
But the result might be due to some outliers, so check for outliers by examining:
  * scatter plot: z-predicted (x) vs. z-residual (y)
  * [[Mahalanobis distance]]
  * Cook's distance
  * Leverage
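The three numeric diagnostics above can be computed by hand. Below is a numpy sketch for simple regression on simulated data with one planted outlier (everything here is hypothetical, not the page's data): leverage h<sub>ii</sub> from the hat matrix, Cook's distance from residuals and leverage, and Mahalanobis distance of each predictor value from the predictor mean.

```python
import numpy as np

# Simulated data with one obvious outlier planted at index 0.
rng = np.random.default_rng(3)
x = rng.normal(size=20)
y = 2 * x + rng.normal(size=20)
x[0], y[0] = 6.0, -10.0               # extreme x AND a y far from the trend

X = np.column_stack([np.ones_like(x), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T  # hat matrix
lev = np.diag(H)                      # leverage h_ii
b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
p = X.shape[1]
mse = np.sum(resid ** 2) / (len(y) - p)

# Cook's distance: D_i = e_i^2 * h_ii / (p * MSE * (1 - h_ii)^2)
cooks = resid ** 2 * lev / (p * mse * (1 - lev) ** 2)

# Mahalanobis distance of each x from the predictor mean (one predictor case).
md = np.abs(x - x.mean()) / x.std(ddof=1)

print(int(np.argmax(lev)), int(np.argmax(cooks)))  # both flag case 0
```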
{{regression04-outlier.jpg}}
| Casewise Diagnostics(a) |
| Case Number | Std. Residual | sales | Predicted Value | Residual |
| 10 | | | | |
Analysis after removing the two cases:
The r<sup>2</sup> value increased from 14% to 70%.
The b value of the independent variable income increased from 0.527406291 to 1.618765817 (and thus the t value also increased).
| Model Summary(b) |
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| 1 | | | | |
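The remove-and-refit step above can be sketched as follows. The data are simulated (the planted outlier at 1-based case 10 merely echoes the flagged case number on this page; no values from the page are used): dropping the flagged case sharply raises r<sup>2</sup>, as reported above.

```python
import numpy as np

# Simulated income/sales data with an outlier at index 9 (1-based case 10).
rng = np.random.default_rng(4)
income = rng.uniform(10, 50, size=25)
sales = 1.6 * income + rng.normal(0, 10, size=25)
sales[9] += 300.0                     # plant the outlying case

def fit_r2(x, y):
    """Return (slope, r^2) of an OLS fit y = b0 + b1*x."""
    X = np.column_stack([np.ones_like(x), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    return b[1], 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

b_all, r2_all = fit_r2(income, sales)
keep = np.arange(len(sales)) != 9     # drop the flagged case and refit
b_cut, r2_cut = fit_r2(income[keep], sales[keep])
print(round(r2_all, 2), round(r2_cut, 2))  # r^2 jumps once the outlier is gone
```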
pre-assumptions_of_regression_analysis.1461713426.txt.gz · Last modified: 2016/04/27 08:00 by hkimscil