beta_coefficients

Differences

This shows you the differences between two versions of the page.

beta_coefficients [2019/05/21 11:43] hkimscil
beta_coefficients [2020/12/09 18:45] – [e.g.] hkimscil
Line 1: Line 1:
-====== Beta coefficients in regression ====== +====== Beta coefficients in linear regression ====== 
-$$ \beta = b * \frac{sd(x)}{sd(y)} $$+ 
 +{{:pasted:20190521-113150.png?200}} 
 + 
 +\begin{align*} 
 +\large{\beta = b * \frac{sd(x)}{sd(y)}} \\
 +\end{align*}
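
The formula above can be motivated by a short derivation. This is a sketch under the usual assumption of an ordinary least squares fit with an intercept; the symbols $z_y$, $z_j$, $s_y$, and $s_j$ (standardized variables and sample standard deviations) are introduced here for illustration and do not appear in the original page.

```latex
\begin{align*}
y &= b_0 + \sum_j b_j x_j + e
  && \text{(fitted model)} \\
y - \bar{y} &= \sum_j b_j \,(x_j - \bar{x}_j) + e
  && \text{(since } \bar{y} = b_0 + \textstyle\sum_j b_j \bar{x}_j \text{)} \\
\frac{y - \bar{y}}{s_y} &= \sum_j b_j \,\frac{s_j}{s_y}\,\frac{x_j - \bar{x}_j}{s_j} + \frac{e}{s_y}
  && \text{(divide by } s_y \text{)} \\
z_y &= \sum_j \beta_j z_j + \frac{e}{s_y},
  \qquad \beta_j = b_j \,\frac{s_j}{s_y}
\end{align*}
```

With $s_j = sd(x_j)$ and $s_y = sd(y)$, the last line is the same relation as $\beta = b * sd(x)/sd(y)$, applied to each predictor separately.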
  
 <code> <code>
Line 10: Line 15:
 </code> </code>
  
-<code>lm.gpa.clep <- lm(gpa ~ clep, data = tests) +<code>lm.gpa.clepsat <- lm(gpa ~ clep + sat, data = tests)  
-summary(lm.gpa.clep) +summary(lm.gpa.clepsat)
-</code> +
- +
-<code>+
 Call: Call:
-lm(formula = gpa ~ clep, data = tests)+lm(formula = gpa ~ clep + sat, data = tests)
  
 Residuals: Residuals:
       Min        1Q    Median        3Q       Max        Min        1Q    Median        3Q       Max 
--0.190496 -0.141167 -0.002376  0.110847  0.225207 +-0.197888 -0.128974 -0.000528  0.131170  0.226404 
  
 Coefficients: Coefficients:
-            Estimate Std. Error t value Pr(>|t|)     +              Estimate Std. Error t value Pr(>|t|)   
-(Intercept)  1.17438    0.38946   3.015 0.016676 *   +(Intercept)  1.1607560  0.4081117   2.844   0.0249 *
-clep         0.06054    0.01177   5.144 0.000881 ***+clep         0.0729294  0.0253799   2.874   0.0239 * 
 +sat         -0.0007015  0.0012564  -0.558   0.5940  
 --- ---
 Signif. codes:   Signif. codes:  
 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
  
-Residual standard error: 0.1637 on 8 degrees of freedom +Residual standard error: 0.1713 on 7 degrees of freedom 
-Multiple R-squared:  0.7679, Adjusted R-squared:  0.7388  +Multiple R-squared:  0.7778, Adjusted R-squared:  0.7143  
-F-statistic: 26.46 on 1 and 8 DF,  p-value: 0.0008808+F-statistic: 12.25 on 2 and 7 DF,  p-value: 0.005175 
 + 
 +
 </code> </code>
  
 +<code>> sd.clep <- sd(clep)
 +> sd.sat <- sd(sat)
 +> sd.gpa <- sd(gpa)
 +> lm.gpa.clepsat <- lm(gpa ~ clep + sat, data = tests) 
 +> summary(lm.gpa.clepsat)
 +
 +Call:
 +lm(formula = gpa ~ clep + sat, data = tests)
 +
 +Residuals:
 +      Min        1Q    Median        3Q       Max 
 +-0.197888 -0.128974 -0.000528  0.131170  0.226404 
 +
 +Coefficients:
 +              Estimate Std. Error t value Pr(>|t|)  
 +(Intercept)  1.1607560  0.4081117   2.844   0.0249 *
 +clep         0.0729294  0.0253799   2.874   0.0239 *
 +sat         -0.0007015  0.0012564  -0.558   0.5940  
 +---
 +Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 +
 +Residual standard error: 0.1713 on 7 degrees of freedom
 +Multiple R-squared:  0.7778, Adjusted R-squared:  0.7143 
 +F-statistic: 12.25 on 2 and 7 DF,  p-value: 0.005175
 +
 +> b.clep <- 0.0729294
 +> b.sat <- -0.0007015
 +> beta.clep <- b.clep * (sd.clep/sd.gpa)
 +> beta.sat <- b.sat * (sd.sat/sd.gpa)
 +> lm.beta(lm.gpa.clepsat)
 +
 +Call:
 +lm(formula = gpa ~ clep + sat, data = tests)
 +
 +Standardized Coefficients::
 +(Intercept)        clep         sat 
 +  0.0000000   1.0556486  -0.2051189 
 +
 +> beta.clep
 +[1] 1.055648
 +> beta.sat
 +[1] -0.2051187
 +
 +</code>
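
In the session above, ''beta.sat'' (-0.2051187) differs from the ''lm.beta'' value (-0.2051189) in the last digit, most likely because the coefficients were retyped from the rounded summary table. A minimal sketch of the same formula that reads the coefficients from the model with ''coef()'' follows. It assumes the built-in ''mtcars'' data as a stand-in, since the page's ''tests'' data is not reproduced here; the names ''fit'', ''sd.x'', and ''sd.y'' are illustrative only.

```r
# Assumption: built-in mtcars stands in for the page's `tests` data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Take the unstandardized slopes straight from the model (drop the intercept),
# so no precision is lost by retyping rounded values.
b <- coef(fit)[-1]

# Standard deviation of each predictor, in the same order as `b`.
sd.x <- sapply(mtcars[, names(b)], sd)
sd.y <- sd(mtcars$mpg)

# beta = b * sd(x) / sd(y), applied to all predictors at once.
beta <- b * sd.x / sd.y
print(beta)
```

Because ''b'' keeps full precision, the result should agree with a fit on standardized data up to floating-point error.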
 +====== e.g. ======
 +
 +<code>
 +# get marketing data 
 +marketing <- read.csv("http://commres.net/wiki/_media/marketing_from_datarium.csv")
 +head(marketing)
 +# "- X" drops the row-index column X from the predictors
 +mod <- lm(sales ~ . - X, data=marketing)
 +summary(mod)
 +</code>
 +
 +<code>
 +> marketing <- read.csv("http://commres.net/wiki/_media/marketing_from_datarium.csv")
 +> head(marketing)
 +  X youtube facebook newspaper sales
 +1 1  276.12    45.36     83.04 26.52
 +2 2   53.40    47.16     54.12 12.48
 +3 3   20.64    55.08     83.16 11.16
 +4 4  181.80    49.56     70.20 22.20
 +5 5  216.96    12.96     70.08 15.48
 +6 6   10.44    58.68     90.00  8.64
 +# "- X" drops the row-index column X from the predictors
 +> mod <- lm(sales ~ . - X, data=marketing) 
 +> summary(mod)
 +
 +Call:
 +lm(formula = sales ~ . - X, data = marketing)
 +
 +Residuals:
 +     Min       1Q   Median       3Q      Max 
 +-10.5932  -1.0690   0.2902   1.4272   3.3951 
 +
 +Coefficients:
 +             Estimate Std. Error t value Pr(>|t|)    
 +(Intercept)  3.526667   0.374290   9.422   <2e-16 ***
 +youtube      0.045765   0.001395  32.809   <2e-16 ***
 +facebook     0.188530   0.008611  21.893   <2e-16 ***
 +newspaper   -0.001037   0.005871  -0.177     0.86    
 +---
 +Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 +
 +Residual standard error: 2.023 on 196 degrees of freedom
 +Multiple R-squared:  0.8972, Adjusted R-squared:  0.8956 
 +F-statistic: 570.3 on 3 and 196 DF,  p-value: < 2.2e-16
 +</code>
 +
 +
 +
 +<code>
 +install.packages("lm.beta")
 +library(lm.beta)
 +lm.beta(mod)
 +</code>
 +
 +<code>
 +> lm.beta(mod)
 +
 +Call:
 +lm(formula = sales ~ . - X, data = marketing)
 +
 +Standardized Coefficients::
 + (Intercept)      youtube     facebook    newspaper 
 + 0.000000000  0.753065912  0.536481550 -0.004330686 
 +
 +</code>
 +
 +The same beta coefficients can also be obtained by fitting the model to standardized data, that is, with every variable rescaled to mean 0 and standard deviation 1. 
 +
 +<code>
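 +mod.formula <- sales ~ youtube + facebook + newspaper
 +all.vars(mod.formula)
 +# scale() standardizes each column; sapply() returns a matrix
 +marketing.temp <- sapply(marketing[ , all.vars(mod.formula)], scale)
 +head(marketing.temp)
 +# convert the matrix to a data frame before passing it to lm()
 +marketing.scaled <- as.data.frame(marketing.temp)
 +mod.scaled <- lm(sales ~ ., data=marketing.scaled)
 +head(marketing.scaled)
 +coefficients(mod.scaled)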
 +</code>
 +
 +<code>> mod.formula <- sales ~ youtube + facebook + newspaper
 +> all.vars(mod.formula)
 +[1] "sales"     "youtube"   "facebook"  "newspaper"
 +> marketing.temp <- sapply(marketing[ , all.vars(mod.formula)], scale)
 +> head(marketing.temp)
 +          sales     youtube   facebook newspaper
 +[1,]  1.5481681  0.96742460  0.9790656 1.7744925
 +[2,] -0.6943038 -1.19437904  1.0800974 0.6679027
 +[3,] -0.9051345 -1.51235985  1.5246374 1.7790842
 +[4,]  0.8581768  0.05191939  1.2148065 1.2831850
 +[5,] -0.2151431  0.39319551 -0.8395070 1.2785934
 +[6,] -1.3076295 -1.61136487  1.7267010 2.0408088
 +> marketing.scaled <- as.data.frame(marketing.temp)
 +> mod.scaled <- lm(sales ~ ., data=marketing.scaled)
 +> head(marketing.scaled)
 +       sales     youtube   facebook newspaper
 +1  1.5481681  0.96742460  0.9790656 1.7744925
 +2 -0.6943038 -1.19437904  1.0800974 0.6679027
 +3 -0.9051345 -1.51235985  1.5246374 1.7790842
 +4  0.8581768  0.05191939  1.2148065 1.2831850
 +5 -0.2151431  0.39319551 -0.8395070 1.2785934
 +6 -1.3076295 -1.61136487  1.7267010 2.0408088
 +> coefficients(mod.scaled)
 +  (Intercept)       youtube      facebook     newspaper 
 +-5.034110e-16  7.530659e-01  5.364815e-01 -4.330686e-03 
 +
 +</code>
 +
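 +Note that the standardized coefficients from ''lm.beta(mod)'' match ''coefficients(mod.scaled)''; only the intercept differs, by floating-point error (-5.03e-16 versus 0).
 +
 +Because beta coefficients are on a common scale, they can be compared with one another. From ''lm.beta(mod)'', explanatory power ranks in the order ''youtube'', ''facebook'', ''newspaper''.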
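
The comparison and the ranking can also be done in code. The sketch below is illustrative only: it assumes the built-in ''mtcars'' data in place of the marketing data, and the names ''beta.formula'', ''beta.scaled'', ''same'', and ''ranking'' are made up for this example. It uses ''all.equal()'' rather than ''=='', because the two sets of coefficients agree only up to floating-point error.

```r
# Assumption: built-in mtcars stands in for the marketing data.
vars <- c("mpg", "wt", "hp", "qsec")
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Route 1: beta = b * sd(x) / sd(y), using the unstandardized slopes.
b <- coef(fit)[-1]
beta.formula <- b * sapply(mtcars[, names(b)], sd) / sd(mtcars$mpg)

# Route 2: refit on standardized data and read off the slopes.
z <- as.data.frame(scale(mtcars[, vars]))
beta.scaled <- coef(lm(mpg ~ wt + hp + qsec, data = z))[-1]

# all.equal() compares within a numerical tolerance; == would be too strict.
same <- isTRUE(all.equal(beta.formula, beta.scaled))
print(same)

# Rank predictors by the absolute size of their beta coefficient.
ranking <- names(sort(abs(beta.formula), decreasing = TRUE))
print(ranking)
```

Sorting on the absolute value matters because a large negative beta indicates as strong a relationship as a large positive one.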
  
beta_coefficients.txt · Last modified: 2020/12/09 18:47 by hkimscil
