# Additional calculus of M-estimation examples using geex

## Preliminaries

This vignette implements examples from and is meant to read with Stefanski and Boos (2002) (SB). Examples 4-9 use the geexex data set. For information on how the data are generated, run help('geexex').

library(geex)

## The Basic Approach

### Example 4: Instrumental variables

Example 4 calculates an instumental variable estimator. The variables ($$Y_3, W_1, Z_1$$) are observed according to:

$Y_{3i} = \alpha + \beta X_{1i} + \sigma_{\epsilon}\epsilon_{1, i}$

$W_{1i} =\beta X_{1i} + \sigma_{U}\epsilon_{2, i}$

and

$Z_{1i} = \gamma + \delta X_{1i} + \sigma_{\tau}\epsilon_{3, i}.$

$$Y_3$$ is a linear function of a latent variable $$X_1$$, whose coefficient $$\beta$$ is of interest. $$W_1$$ is a measurement of the unobserved $$X_1$$, and $$Z_1$$ is an instrumental variable for $$X_1$$. The instrumental variable estimator is $$\hat{\beta}_{IV}$$ is the ratio of least squares regression slopes of $$Y_3$$ on $$Z_1$$ and $$Y_3$$ and $$W_1$$. The $$\psi$$ function below includes this estimator as $$\hat{\theta}_4 = \hat{\beta}_{IV}$$. In the example data, 100 observations are generated where $$X_1 \sim \Gamma(\text{shape} = 5, \text{rate} = 1)$$, $$\alpha = 2$$, $$\beta = 3$$, $$\gamma = 2$$, $$\delta = 1.5$$, $$\sigma_{\epsilon} = \sigma_{\tau} = 1$$, $$\sigma_U = .25$$, and $$\epsilon_{\cdot, i} \sim N(0,1)$$.

$\psi(Y_{3i}, Z_{1i}, W_{1i}, \mathbf{\theta}) = \begin{pmatrix} \theta_1 - Z_{1i} \\ \theta_2 - W_{1i} \\ (Y_{3i} - \theta_3 W_{1i})(\theta_2 - W_{1i}) \\ (Y_{3i} - \theta_4 W_{1i})(\theta_1 - Z_{1i}) \\ \end{pmatrix}$

SB4_estFUN <- function(data){
Z1 <- data$Z1; W1 <- data$W1; Y3 <- data$Y3 function(theta){ c(theta[1] - Z1, theta[2] - W1, (Y3 - (theta[3] * W1)) * (theta[2] - W1), (Y3 - (theta[4] * W1)) * (theta[1] - Z1) ) } } estimates <- m_estimate( estFUN = SB4_estFUN, data = geexex, root_control = setup_root_control(start = c(1, 1, 1, 1))) The R packages AER and ivpack can compute the IV estimator and sandwich variance estimators respectively, which is done below for comparison. ivfit <- AER::ivreg(Y3 ~ W1 | Z1, data = geexex) iv_se <- ivpack::cluster.robust.se(ivfit, clusterid = 1:nrow(geexex)) coef(ivfit)[2]  ## W1 ## 3.041264 coef(estimates)[4] ## [1] 3.041264 iv_se[2, 'Std. Error']  ## [1] 0.01049669 sqrt(vcov(estimates)[4, 4]) ## [1] 0.01039119 ## Connections to the Influence Curve ### Example 5: Hodges-Lehmann estimator This example shows the link between the influence curves and the Hodges-Lehman estimator. $\psi(Y_{2i}, \theta) = \begin{pmatrix} IC_{\hat{\theta}_{HL}}(Y_2; \theta_0) - (\theta_1 - \theta_0) \\ \end{pmatrix}$ The functions used in SB5_estFUN are defined below: F0 <- function(y, theta0, distrFUN = pnorm){ distrFUN(y - theta0, mean = 0) } f0 <- function(y, densFUN){ densFUN(y, mean = 0) } integrand <- function(y, densFUN = dnorm){ f0(y, densFUN = densFUN)^2 } IC_denom <- integrate(integrand, lower = -Inf, upper = Inf)$value
SB5_estFUN <- function(data){
Yi <- data[['Y2']]
function(theta){

(1/IC_denom) * (F0(Yi, theta[1]) - 0.5)
}
}
estimates <- m_estimate(
estFUN = SB5_estFUN,
data  = geexex,
root_control = setup_root_control(start = 2))

The hc.loc of the ICSNP package computes the Hodges-Lehman estimator and SB gave closed form estimators for the variance.

theta_cls <- ICSNP::hl.loc(geexex$Y2) Sigma_cls <- 1/(12 * IC_denom^2) / nrow(geexex) ##$geex
## $geex$parameters
## [1] 2.026376
##
## $geex$vcov
##            [,1]
## [1,] 0.01040586
##
##
## $cls ##$cls$parameters ## [1] 2.024129 ## ##$cls$vcov ## [1] 0.01047198 ## Non-smooth $$\psi$$ functions ### Example 6: Huber center of symmetry estimator This example illustrates estimation with nonsmooth $$\psi$$ function for computing the Huber (1964) estimator of the center of symmetric distributions. $\psi_k(Y_{1i}, \theta) = \begin{cases} (Y_{1i} - \theta) & \text{when } |(Y_{1i} - \theta)| \leq k \\ k * sgn(Y_{1i} - \theta) & \text{when } |(Y_{1i} - \theta)| > k\\ \end{cases}$ SB6_estFUN <- function(data, k = 1.5){ Y1 <- data$Y1
function(theta){
x <- Y1 - theta[1]
if(abs(x) <= k) x else sign(x) * k
}
}
estimates <- m_estimate(
estFUN = SB6_estFUN,
data  = geexex,
root_control = setup_root_control(start = 3))

The point estimate from geex is compared to the estimate obtained from the huber function in the MASS package. The variance estimate is compared to an estimator suggested by SB: $$A_m = \sum_i [-\psi'(Y_i - \hat{\theta})]/m$$ and $$B_m = \sum_i \psi_k^2(Y_i - \hat{\theta})/m$$, where $$\psi_k'$$ is the derivative of $$\psi$$ where is exists.

theta_cls <- MASS::huber(geexex$Y1, k = 1.5, tol = 1e-10)$mu

psi_k <- function(x, k = 1.5){
if(abs(x) <= k) x else sign(x) * k
}

A <- mean(unlist(lapply(geexex$Y1, function(y){ x <- y - theta_cls -numDeriv::grad(psi_k, x = x) }))) B <- mean(unlist(lapply(geexex$Y1, function(y){
x <- y - theta_cls
psi_k(x = x)^2
})))

## closed form covariance
Sigma_cls <- matrix(1/A * B * 1/A / nrow(geexex))
results <- list(geex = list(parameters = coef(estimates), vcov = vcov(estimates)),
cls = list(parameters = theta_cls, vcov = Sigma_cls))
results
## $geex ##$geex$parameters ## [1] 4.82061 ## ##$geex$vcov ## [,1] ## [1,] 0.08356179 ## ## ##$cls
## $cls$parameters
## [1] 4.999386
##
## $cls$vcov
##           [,1]
## [1,] 0.0928935

### Example 7: Sample quantiles (approximation of $$\psi$$)

Approximation of $$\psi$$ with geex is EXPERIMENTAL.

Example 7 illustrates calculation of sample quantiles using M-estimation and approximation of the $$\psi$$ function.

$\psi(Y_{1i}, \theta) = \begin{pmatrix} 0.5 - I(Y_{1i} \leq \theta_1) \\ \end{pmatrix}$

SB7_estFUN <- function(data){
Y1 <- data$Y1 function(theta){ 0.5 - (Y1 <= theta[1]) } } Note that $$\psi$$ is not differentiable; however, geex includes the ability to approximate the $$\psi$$ function via an approx_control object. The FUN in an approx_control must be a function that takes in the $$\psi$$ function, modifies it, and returns a function of theta. For this example, I approximate $$\psi$$ with a spline function. The eval_theta argument is used to modify the basis of the spline. spline_approx <- function(psi, eval_theta){ y <- Vectorize(psi)(eval_theta) f <- splinefun(x = eval_theta, y = y) function(theta) f(theta) } estimates <- m_estimate( estFUN = SB7_estFUN, data = geexex, root_control = setup_root_control(start = 4.7), approx_control = setup_approx_control(FUN = spline_approx, eval_theta = seq(3, 6, by = .05))) A comparison of the variance is not obvious, so no comparison is made. ##$geex
## $geex$parameters
## [1] 4.7
##
## $geex$vcov
##           [,1]
## [1,] 0.1773569
##
##
## $cls ##$cls$parameters ## [1] 4.708489 ## ##$cls$vcov ## [1] NA ## Regression M-estimators ### Example 8: Robust regression Example 8 demonstrates robust regression for estimating $$\beta$$ from 100 observations generated from $$Y_4 = 0.1 + 0.1 X_{1i} + 0.5 X_{2i} + \epsilon_i$$, where $$\epsilon_i \sim N(0, 1)$$, $$X_1$$ is defined as above, and the first half of the observation have $$X_{2i} = 1$$ and the others have $$X_{2i} = 0$$. $\psi_k(Y_{4i}, \theta) = \begin{pmatrix} \psi_k(Y_{4i} - \mathbf{x}_i^T \beta) \mathbf{x}_i \end{pmatrix}$ psi_k <- function(x, k = 1.345){ if(abs(x) <= k) x else sign(x) * k } SB8_estFUN <- function(data){ Yi <- data$Y4
xi <- model.matrix(Y4 ~ X1 + X2, data = data)
function(theta){
r <- Yi - xi %*% theta
c(psi_k(r) %*% xi)
}
}
estimates <- m_estimate(
estFUN = SB8_estFUN,
data  = geexex,
root_control = setup_root_control(start = c(0, 0, 0)))
m <- MASS::rlm(Y4 ~ X1 + X2, data = geexex, method = 'M')
theta_cls <- coef(m)
Sigma_cls <- vcov(m)
results <- list(geex = list(parameters = coef(estimates), vcov = vcov(estimates)),
cls = list(parameters = theta_cls, vcov = Sigma_cls))
results
## $geex ##$geex$parameters ## [1] -0.04050369 0.14530196 0.30181589 ## ##$geex$vcov ## [,1] [,2] [,3] ## [1,] 0.05871932 -0.0101399730 -0.0133230841 ## [2,] -0.01013997 0.0021854268 0.0003386202 ## [3,] -0.01332308 0.0003386202 0.0447117633 ## ## ##$cls
## $cls$parameters
## (Intercept)          X1          X2
##  -0.0377206   0.1441181   0.2988842
##
## $cls$vcov
##             (Intercept)            X1            X2
## (Intercept)  0.07309914 -0.0103060747 -0.0241724792
## X1          -0.01030607  0.0020579145  0.0005364106
## X2          -0.02417248  0.0005364106  0.0431120686

### Example 9: Generalized linear models

Example 9 illustrates estimation of a generalized linear model.

$\psi(Y_i, \theta) = \begin{pmatrix} D_i(\beta)\frac{Y_i - \mu_i(\beta)}{V_i(\beta) \tau} \end{pmatrix}$

SB9_estFUN <- function(data){
Y <- data$Y5 X <- model.matrix(Y5 ~ X1 + X2, data = data, drop = FALSE) function(theta){ lp <- X %*% theta mu <- plogis(lp) D <- t(X) %*% dlogis(lp) V <- mu * (1 - mu) D %*% solve(V) %*% (Y - mu) } } estimates <- m_estimate( estFUN = SB9_estFUN, data = geexex, root_control = setup_root_control(start = c(.1, .1, .5))) Compare point estimates to glm coefficients and covariance matrix to sandwich. m9 <- glm(Y5 ~ X1 + X2, data = geexex, family = binomial(link = 'logit')) theta_cls <- coef(m9) Sigma_cls <- sandwich::sandwich(m9) results <- list(geex = list(parameters = coef(estimates), vcov = vcov(estimates)), cls = list(parameters = theta_cls, vcov = Sigma_cls)) results ##$geex
## $geex$parameters
## [1] -1.1256071  0.3410619 -0.1148368
##
## $geex$vcov
##             [,1]         [,2]         [,3]
## [1,]  0.35202094 -0.058906883 -0.101528787
## [2,] -0.05890688  0.012842435  0.004357355
## [3,] -0.10152879  0.004357355  0.185455144
##
##
## $cls ##$cls$parameters ## (Intercept) X1 X2 ## -1.1256070 0.3410619 -0.1148368 ## ##$cls$vcov ## (Intercept) X1 X2 ## (Intercept) 0.35201039 -0.058903546 -0.101534539 ## X1 -0.05890355 0.012841392 0.004358926 ## X2 -0.10153454 0.004358926 0.185456314 ## Application to test statistics ### Example 10: Testing equality of success probabilities Example 10 illustrates testing equality of success probablities. $\psi(Y_i, n_i, \theta) = \begin{pmatrix} \frac{(Y_i - n_i \theta_2)^2}{n_i \theta_2( 1 - \theta_2 )} - \theta_1 \\ Y_i - n_i \theta_2 \end{pmatrix}$ SB10_estFUN <- function(data){ Y <- data$ft_made
n <- data$ft_attp function(theta){ p <- theta[2] c(((Y - (n * p))^2)/(n * p * (1 - p)) - theta[1], Y - n * p) } } estimates <- m_estimate( estFUN = SB10_estFUN, data = shaq, units = 'game', root_control = setup_root_control(start = c(.5, .5))) V11 <- function(p) { k <- nrow(shaq) sumn <- sum(shaq$ft_attp)
sumn_inv <- sum(1/shaq$ft_attp) term2_n <- 1 - (6 * p) + (6 * p^2) term2_d <- p * (1 - p) term2 <- term2_n/term2_d term3 <- ((1 - (2 * p))^2) / ((sumn/k) * p * (1 - p)) 2 + (term2 * (1/k) * sumn_inv) - term3 } p_tilde <- sum(shaq$ft_made)/sum(shaq\$ft_attp)
V11_hat <- V11(p_tilde)/23

# Compare variance estimates
V11_hat
## [1] 0.0783097
vcov(estimates)[1, 1]
## [1] 0.1929791
# Note the differences in the p-values
pnorm(35.51/23, mean  = 1, sd = sqrt(V11_hat), lower.tail = FALSE)
## [1] 0.02596785
pnorm(coef(estimates)[1],
mean = 1,
sd = sqrt(vcov(estimates)[1, 1]),
lower.tail = FALSE)
## [1] 0.1078138

This example shows that the empircal sandwich variance estimator may be different from other sandwich variance estimators that make assumptions about the structure of the $$A$$ and $$B$$ matrices.

## References

Huber, Peter J. 1964. Robust Estimation of a Location Parameter 35.

Stefanski, Len A., and Dennis D. Boos. 2002. The Calculus of M-Estimation 56. doi:10.1198/000313002753631330.