====== Wald Test ======
Regression model의 coefficient값이 significant 한지 테스트하는 방법. 즉, regression coefficient의 t-test와 비슷한 일을 한다.
H0: Some set of predictor variables are all equal to zero.
HA: Not all predictor variables in the set are equal to zero.
#fit regression model
model <- lm(mpg ~ disp + carb + hp + cyl, data = mtcars)
#view model summary
summary(model)
Call:
lm(formula = mpg ~ disp + carb + hp + cyl, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-5.0761 -1.5752 -0.2051 1.0745 6.3047
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 34.021595 2.523397 13.482 1.65e-13 ***
disp -0.026906 0.011309 -2.379 0.0247 *
carb -0.926863 0.578882 -1.601 0.1210
hp 0.009349 0.020701 0.452 0.6551
cyl -1.048523 0.783910 -1.338 0.1922
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.973 on 27 degrees of freedom
Multiple R-squared: 0.788, Adjusted R-squared: 0.7566
F-statistic: 25.09 on 4 and 27 DF, p-value: 9.354e-09
# coefficient
coef(model)
(Intercept) disp carb hp cyl
34.021594525 -0.026906182 -0.926863291 0.009349208 -1.048522632
# term1, 2, 3, 4, 5
install.packages("aod")
library(aod)
# wald.test(Sigma, b, Terms)
#perform Wald Test to determine if 3rd and 4th predictor variables are both zero
wald.test(Sigma = vcov(model), b = coef(model), Terms = 3:4)
Wald test:
----------
Chi-squared test:
X2 = 3.6, df = 2, P(> X2) = 0.16
wald.test(Sigma, b, Terms)
* Sigma: The variance-covariance matrix of the regression model
* b: A vector of regression coefficients from the model
* Terms: A vector that specifies which coefficients to test
====== Wald test in logistic regression ======
odds <- function(p) p/(1-p)
odds.ratio <- function(p1, p2) odds(p1)/odds(p2)
logit <- function(p) log(p/(1-p))
ilogit <- function(x) exp(x)/(1+exp(x))
iter <- 10000
n <- 350
p.cancer <- 0.08
p.mutant <- 0.39
logor <- rep (NA, iter)
pp0 <- rep (NA, iter)
pp1 <- rep (NA, iter)
op0 <- rep (NA, iter)
op1 <- rep (NA, iter)
or <- rep (NA, iter)
for(i in 1:iter){
c <- runif(n, 0, 1)
canc <- ifelse(c>=p.cancer, "nocancer", "cancer")
c <- runif(n, 0, 1)
gene <- ifelse(c>=p.mutant, "norm", "mutated")
da <- data.frame(gene, canc)
tab <- table(da)
pp0[i] <- tab[1,1] / (tab[1,1] + tab[1,2])
pp1[i] <- tab[2,1] / (tab[2,1] + tab[2,2])
op0[i] <- odds(pp0[i])
op1[i] <- odds(pp1[i])
or[i] <- odds.ratio(pp0[i], pp1[i])
# stats <- c(pp0, pp1, op0, op1, ortemp)
logor[i] <- log(or[i])
}
hist(logor,breaks = 50)