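====== Wald Test ======
A method for testing whether the coefficients of a regression model are significant. It plays a role similar to the t-test on a single regression coefficient, but it can also test several coefficients jointly.
  * H0: the coefficients of some set of predictor variables are all equal to zero.
  * HA: not all of the coefficients in that set are equal to zero.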
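For reference, here is the usual form of the Wald statistic, written in standard textbook notation rather than copied from any package documentation. Let \hat\beta_S be the q tested coefficients and \hat\Sigma_{SS} the matching block of their estimated covariance matrix, which is what vcov(model) returns for the full coefficient vector:

```latex
W = \hat\beta_S^{\top}\, \hat\Sigma_{SS}^{-1}\, \hat\beta_S ,
\qquad W \overset{\text{approx.}}{\sim} \chi^2_q \ \text{under } H_0 ,
\qquad
q = 1:\ W = \left(\frac{\hat\beta_j}{\operatorname{SE}(\hat\beta_j)}\right)^2 = t_j^2 .
```

With a single coefficient the statistic is the square of its t value. That is why the Wald test does the same job as the t-test, while the q > 1 case tests a whole set of coefficients at once.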


#fit regression model
model <- lm(mpg ~ disp + carb + hp + cyl, data = mtcars)

#view model summary
summary(model)

Call:
lm(formula = mpg ~ disp + carb + hp + cyl, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.0761 -1.5752 -0.2051  1.0745  6.3047 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 34.021595   2.523397  13.482 1.65e-13 ***
disp        -0.026906   0.011309  -2.379   0.0247 *  
carb        -0.926863   0.578882  -1.601   0.1210    
hp           0.009349   0.020701   0.452   0.6551    
cyl         -1.048523   0.783910  -1.338   0.1922    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.973 on 27 degrees of freedom
Multiple R-squared:  0.788,	Adjusted R-squared:  0.7566 
F-statistic: 25.09 on 4 and 27 DF,  p-value: 9.354e-09
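The following is a minimal check of the single-coefficient case, using base R only and not aod. For one term the Wald statistic is (estimate / SE)^2, so for disp it should equal the square of the t value printed above. The names b, V, w_disp and t_disp are illustrative.

```r
# Refit the model from above (mtcars ships with base R)
model <- lm(mpg ~ disp + carb + hp + cyl, data = mtcars)

b <- coef(model)   # coefficient estimates
V <- vcov(model)   # their estimated variance-covariance matrix

# Wald statistic for the single term "disp": (estimate / SE)^2
w_disp <- unname(b["disp"]^2 / V["disp", "disp"])

# t value reported by summary() for the same term
t_disp <- summary(model)$coefficients["disp", "t value"]

w_disp                                      # same as t_disp^2 (about 5.66)
pchisq(w_disp, df = 1, lower.tail = FALSE)  # Wald p-value on the chi-square scale
```

The chi-square p-value (roughly 0.017) is a little smaller than the t-test's 0.0247 because the chi-square reference ignores the 27 residual degrees of freedom. The statistics themselves agree.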


# coefficient
coef(model)

 (Intercept)         disp         carb           hp          cyl 
34.021594525 -0.026906182 -0.926863291  0.009349208 -1.048522632 

# Terms are positions in coef(model): 1 = (Intercept), 2 = disp, 3 = carb, 4 = hp, 5 = cyl
install.packages("aod")   # only needed once
library(aod)

# wald.test(Sigma, b, Terms)
# Wald test of H0: the coefficients of carb and hp (terms 3 and 4) are both zero
wald.test(Sigma = vcov(model), b = coef(model), Terms = 3:4)

Wald test:
----------

Chi-squared test:
X2 = 3.6, df = 2, P(> X2) = 0.16
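The chi-squared statistic above can be reproduced by hand with base R. This is a sketch of the joint Wald statistic W = b_S' V_SS^{-1} b_S for terms 3:4 (carb and hp). The names terms, b_s, V_s, W and reduced are illustrative. For lm, W divided by the number of tested terms equals the partial F statistic from comparing the full model with the model that drops those terms, which gives an independent check.

```r
model <- lm(mpg ~ disp + carb + hp + cyl, data = mtcars)

b <- coef(model)
V <- vcov(model)
terms <- 3:4                 # positions in coef(model): carb and hp

b_s <- b[terms]              # tested coefficients
V_s <- V[terms, terms]       # their 2 x 2 covariance block

# W = b_s' V_s^{-1} b_s, referred to a chi-square with length(terms) df
W <- drop(t(b_s) %*% solve(V_s) %*% b_s)
p_value <- pchisq(W, df = length(terms), lower.tail = FALSE)
c(X2 = W, df = length(terms), p = p_value)   # compare with the wald.test output

# Nested-model comparison: the F column equals W / length(terms)
reduced <- lm(mpg ~ disp + cyl, data = mtcars)
anova(reduced, model)
```

wald.test's chi-square p-value is the large-sample counterpart of the exact F-test that anova() reports for a linear model.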

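Here X2 = 3.6 on 2 df gives p = 0.16, so H0 (the carb and hp coefficients are both zero) is not rejected at the 5% level.

wald.test(Sigma, b, Terms)
  * Sigma: the variance-covariance matrix of the estimated regression coefficients, e.g. vcov(model)
  * b: the vector of estimated regression coefficients, e.g. coef(model)
  * Terms: the positions in b of the coefficients to test jointly

====== Wald test in logistic regression ======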


odds       <- function(p)      p/(1-p)
odds.ratio <- function(p1, p2) odds(p1)/odds(p2)
logit      <- function(p)      log(p/(1-p))
ilogit     <- function(x)      exp(x)/(1+exp(x))
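The following is a quick sanity check of these helpers, using arbitrary example probabilities. logit and ilogit are inverses and agree with base R's qlogis and plogis. The log of an odds ratio is a difference of logits, which is the scale on which logor is computed in the simulation.

```r
odds       <- function(p)      p/(1-p)
odds.ratio <- function(p1, p2) odds(p1)/odds(p2)
logit      <- function(p)      log(p/(1-p))
ilogit     <- function(x)      exp(x)/(1+exp(x))

p <- c(0.08, 0.39, 0.5)
odds.ratio(0.2, 0.1)                 # (0.2/0.8) / (0.1/0.9) = 2.25
all.equal(logit(p), qlogis(p))       # TRUE: base R's logit
all.equal(ilogit(logit(p)), p)       # TRUE: ilogit undoes logit
all.equal(ilogit(1.5), plogis(1.5))  # TRUE: base R's inverse logit
# log odds ratio = difference of logits
all.equal(log(odds.ratio(0.2, 0.1)), logit(0.2) - logit(0.1))
```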

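iter <- 10000      # number of simulated studies
n <- 350           # sample size per study
p.cancer <- 0.08   # P(cancer), the same for everyone (no gene effect)
p.mutant <- 0.39   # P(mutated gene)

# Storage for per-iteration results
logor <- rep(NA, iter)
pp0 <- rep(NA, iter)
pp1 <- rep(NA, iter)
op0 <- rep(NA, iter)
op1 <- rep(NA, iter)
or <- rep(NA, iter)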

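# Each iteration simulates one study in which gene status and cancer are
# independent (true OR = 1) and records the sample odds ratio of cancer,
# mutated vs normal. Repeating this gives the sampling distribution of log(OR).
for (i in 1:iter) {
  u <- runif(n, 0, 1)   # use u, not c, to avoid masking base::c()
  canc <- factor(ifelse(u >= p.cancer, "nocancer", "cancer"),
                 levels = c("cancer", "nocancer"))
  u <- runif(n, 0, 1)
  gene <- factor(ifelse(u >= p.mutant, "norm", "mutated"),
                 levels = c("mutated", "norm"))

  # Fixed levels keep the table 2 x 2: rows = gene, columns = canc
  tab <- table(gene, canc)
  pp0[i] <- tab[1,1] / (tab[1,1] + tab[1,2])   # P(cancer | mutated)
  pp1[i] <- tab[2,1] / (tab[2,1] + tab[2,2])   # P(cancer | norm)
  op0[i] <- odds(pp0[i])
  op1[i] <- odds(pp1[i])
  or[i] <- odds.ratio(pp0[i], pp1[i])
  logor[i] <- log(or[i])
}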
hist(logor,breaks = 50)
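Because gene and cancer are generated independently, the histogram of logor should be centred near 0 and roughly bell-shaped. That approximate normality is what the Wald test in logistic regression relies on: z = estimate / SE is compared with N(0, 1). The sketch below assumes a single simulated data set built the same way; the seed and the names fit, log_or, se, z and p_val are illustrative. It fits glm(..., family = binomial) and checks its Wald z against a hand computation from the 2 x 2 table using Woolf's standard error, sqrt(1/a + 1/b + 1/c + 1/d). For a single binary predictor the two coincide.

```r
set.seed(1)                      # illustrative seed
n <- 350
p.cancer <- 0.08
p.mutant <- 0.39

# One simulated data set, generated as in the loop above (gene and cancer independent)
canc <- factor(ifelse(runif(n) >= p.cancer, "nocancer", "cancer"),
               levels = c("nocancer", "cancer"))   # first level = failure in glm
gene <- factor(ifelse(runif(n) >= p.mutant, "norm", "mutated"),
               levels = c("norm", "mutated"))      # "norm" is the reference level

# Logistic regression of cancer on gene; the "z value" column is the Wald z
fit <- glm(canc ~ gene, family = binomial)
summary(fit)$coefficients

# The same Wald test by hand from the 2 x 2 table
tab <- table(gene, canc)         # rows: norm, mutated; columns: nocancer, cancer
log_or <- log((tab["mutated", "cancer"] / tab["mutated", "nocancer"]) /
              (tab["norm", "cancer"]    / tab["norm", "nocancer"]))
se    <- sqrt(sum(1 / tab))      # Woolf's standard error of log(OR)
z     <- log_or / se
p_val <- 2 * pnorm(-abs(z))      # two-sided Wald p-value
c(log_or = log_or, se = se, z = z, p = p_val)
```

The coefficient genemutated is the log odds ratio of cancer for mutated vs normal, the same comparison as or in the simulation, and its reported p-value is the Wald p-value computed above.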