====== pre-assumptions in regression test ======
  * [[Linearity]] - the relationships between the predictors and the outcome variable should be linear
  * [[:Normality]] - the errors should be normally distributed - technically, normality is necessary only for the t-tests to be valid; estimation of the coefficients only requires that the errors be identically and independently distributed
  * [[:Homoscedasticity|Homogeneity]] of variance (or [[Homoscedasticity]]) - the error variance should be constant
  * Independence - the errors associated with one observation are not correlated with the errors of any other observation
  
  * [[Influence]] - individual observations that exert undue influence on the coefficients
  * [[Collinearity]] or [[Singularity]] - predictors that are highly collinear, i.e. linearly related, can cause problems in estimating the regression coefficients.
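Several of these assumptions can be probed directly from the residuals of a fitted model. The following is a minimal sketch on simulated data (the variable names and numbers are illustrative, not this page's dataset): it fits a one-predictor OLS model and runs a normality check and a rough homoscedasticity check on the residuals.

```python
# Illustrative sketch only: simulated data, not this page's dataset.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(50, 10, 200)               # predictor (e.g., income)
y = 5 + 0.5 * x + rng.normal(0, 2, 200)   # outcome (e.g., sales)

# Fit y = b0 + b1*x by ordinary least squares
b1, b0 = np.polyfit(x, y, 1)
resid = y - (b0 + b1 * x)

# Normality of the errors: Shapiro-Wilk test on the residuals
w_stat, p_normality = stats.shapiro(resid)

# Rough homoscedasticity check: |residual| should not trend with x
r_het, p_het = stats.pearsonr(np.abs(resid), x)

print(b1, p_normality, r_het)
```

A small p_normality would flag non-normal errors; a strong correlation between |residual| and the predictor would flag heteroscedasticity.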
  
===== Outliers =====
|  **Model Summary(b)**  ||||||
|  Model  |  R  |  R Square  |  Adjusted R Square  |  Std. Error of the Estimate  |  Durbin-Watson  |
|  1  |  0.375935755  |  @yellow: 0.141327692  |  0.093623675  |  277.9593965  |  1.770202598  |
|  a Predictors: (Constant), income  ||||||
|  b Dependent Variable: sales  ||||||
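The Durbin-Watson column in the summary table checks the independence assumption. A minimal sketch of the statistic's standard definition, on simulated residuals (not this page's data):

```python
# Durbin-Watson statistic: d = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2
# Values near 2 indicate uncorrelated errors; values near 0 indicate
# positive autocorrelation, values near 4 negative autocorrelation.
import numpy as np

rng = np.random.default_rng(3)
e = rng.normal(0, 1, 500)     # simulated independent residuals
d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
print(d)
```

The table's value of about 1.77 is close to 2, so independence is not obviously violated here.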
  
|  Coefficients(a)  |||||||
|  Model  ||  Unstandardized \\ Coefficients  ||  Standardized \\ Coefficients  |  t  |  Sig.  |
|  ||  B  |  Std. Error  |  Beta  |  |  |
|  1  |  (Constant)  |  524.9368996  |  176.8956007  |  |  2.967495504  |  0.008247696  |
|  a Dependent Variable: sales  |||||||
<WRAP clear />
Note, R<sup>2</sup> = .141. Further, the ANOVA test shows that the model is not significant, which means that the IV (income) does not seem to be related to (or predict) sales. Since the F test failed, the t-test for B also failed.
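The remark that a failed F test entails a failed t-test is exact in the one-predictor case: the overall ANOVA F statistic equals the square of the slope's t statistic, so the two tests always agree. A sketch with simulated data (not this page's dataset):

```python
# Simple (one-predictor) regression: F = t^2 for the slope.
import numpy as np

rng = np.random.default_rng(1)
n = 20
x = rng.normal(0, 1, n)
y = 2 + 0.3 * x + rng.normal(0, 1, n)

b1, b0 = np.polyfit(x, y, 1)
yhat = b0 + b1 * x
ss_res = np.sum((y - yhat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
ss_reg = ss_tot - ss_res

F = (ss_reg / 1) / (ss_res / (n - 2))      # ANOVA F for the model
mse = ss_res / (n - 2)
se_b1 = np.sqrt(mse / np.sum((x - x.mean()) ** 2))
t = b1 / se_b1                             # t statistic for B (slope)

print(F, t ** 2)                           # the two values match
```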
  
But, the result might be due to some outliers. So, check outliers by examining:
  * scatter plot: (z-predicted (x), z-residual (y)). The shape should be rectangular.
  * [[Mahalanobis distance]] score
  * Cook's distance
  * Leverage
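The three numeric diagnostics above can be computed from the design matrix and residuals. A sketch for a one-predictor model, on made-up data in which the last case is a planted outlier (not this page's dataset):

```python
# Illustrative sketch: tiny made-up dataset, last case is an outlier.
import numpy as np

x = np.array([10., 12., 11., 13., 12., 40.])
y = np.array([20., 24., 23., 26., 25., 90.])

X = np.column_stack([np.ones_like(x), x])    # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Leverage: diagonal of the hat matrix H = X (X'X)^-1 X'
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)

# Cook's distance: overall influence of each case on the fitted model
p = X.shape[1]
mse = resid @ resid / (len(y) - p)
cooks = resid ** 2 / (p * mse) * h / (1 - h) ** 2

# Mahalanobis distance of each x from the predictor mean (univariate case)
d2 = (x - x.mean()) ** 2 / x.var(ddof=1)

print(np.argmax(h), np.argmax(cooks), np.argmax(d2))
```

All three diagnostics single out the same extreme case here; in practice, cases flagged by several diagnostics at once are the strongest candidates for inspection.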
{{regression04-outlier.jpg?450|scatter plot of zpre and zres}}
  
|  Casewise Diagnostics(a)  |||||
|  Case Number  |  Std. Residual  |  sales  |  Predicted Value  |  Residual  |
|  10  |  3.425856521  |  1820  |  867.7509889  |  952.2490111  |
  
Analysis after removing the two cases:
The r<sup>2</sup> value increased from 14% to 70%.
The b value of the independent variable income increased from 0.527406291 to 1.618765817 (and, accordingly, its t value also increased).
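This kind of improvement can be reproduced in spirit on simulated data. The sketch below (illustrative only; the two planted outliers stand in for the flagged cases) drops the two cases with the largest absolute residuals and refits:

```python
# Illustrative sketch: simulated data with two planted outliers.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(100, 15, 30)
y = 1.6 * x + rng.normal(0, 10, 30)
y[:2] += 400                          # contaminate two cases

def r_squared(x, y):
    b1, b0 = np.polyfit(x, y, 1)
    resid = y - (b0 + b1 * x)
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

# Flag the two cases with the largest absolute residuals, then refit
fit = np.poly1d(np.polyfit(x, y, 1))
keep = np.argsort(np.abs(y - fit(x)))[:-2]

r2_all, r2_trim = r_squared(x, y), r_squared(x[keep], y[keep])
print(r2_all, r2_trim)    # r^2 rises once the outliers are removed
```

Note that cases should only be removed with justification (data entry error, a case from a different population, etc.), not merely because removal improves the fit.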
  
|  Model Summary(b)  ||||||
|  Model  |  R  |  R Square  |  Adjusted R Square  |  Std. Error of the Estimate  |  Durbin-Watson  |
|  1  |  0.836338533  |  0.699462142  |  0.680678526  |  100.2063061  |  1.559375101  |
pre-assumptions_of_regression_analysis.1461713426.txt.gz · Last modified: 2016/04/27 08:00 by hkimscil
