Statistical Modelling 4 (2004), 317
A protective estimator for linear regression with nonignorably
missing Gaussian outcomes
Stuart R. Lipsitz
Department of Biometry and Epidemiology
Medical University of South Carolina
USA
eMail: lipsitzs@musc.edu
Geert Molenberghs
Center for Statistics
Limburgs Universitair Centrum
Belgium
Garrett M. Fitzmaurice
Department of Biostatistics
Harvard School of Public Health
Boston, MA 02115
USA
Joseph G. Ibrahim
Department of Biostatistics, Harvard School of Public Health
and Division of Biostatistical Science, Dana-Farber Cancer Institute
Boston, MA 02115
USA
Abstract:
We propose a method for estimating the regression parameters in a
linear regression model for Gaussian data when the outcome variable is
missing for some subjects and missingness is thought to be
nonignorable. Throughout, we assume that missingness is restricted to
the outcome variable and that the covariates are fully observed.
Although maximum likelihood estimation of the regression parameters is
possible once joint models for the outcome variable and the
nonignorable missing data mechanism have been specified, these models
are fundamentally non-identifiable unless unverifiable modeling
assumptions are imposed. In this paper, rather than explicitly
modeling the nonignorable missingness mechanism, we consider the use of
a "protective" estimator of the regression parameters (Brown,
1990). To implement the proposed method, it is necessary to assume
that the outcome variable and one of the covariates have an approximate
bivariate normal distribution, conditional on the remaining
covariates. In addition, it is assumed that the missing data mechanism
is conditionally independent of this covariate, given the outcome
variable and the remaining covariates; the latter is referred to as the
"protective" assumption. A method of moments approach is used to
obtain the protective estimator of the regression parameters; the
jackknife (Quenouille, 1956) is used to estimate the variance. The
method is illustrated using data on the persistence of maternal smoking
from the Six Cities study of the health effects of air pollution (Ware
et al., 1984). The results of a simulation study are presented that
examine the magnitude of any finite-sample bias.
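
As an illustration only, the following Python sketch shows one way a moment-based protective estimator and its jackknife variance could be computed in the simplest setting, with a single covariate x and no remaining covariates. It is not the SAS code distributed with the paper and is not claimed to reproduce the paper's exact formulas; the function names (protective_fit, jackknife_variance, simulate) and the simulated missingness model are assumptions made for this example. The idea is that, under the protective assumption, the complete-case regression of x on y is valid, and its parameters can be combined with the mean and variance of x from all subjects to recover the regression of y on x.

```python
import math
import random


def _mean(values):
    return sum(values) / len(values)


def protective_fit(x, y):
    """Moment-based fit of E[y | x] = b0 + b1 * x; y[i] is None when missing.

    Illustrative sketch for one covariate, assuming (y, x) is bivariate
    normal and missingness depends on y but not on x given y.
    """
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    xo = [p[0] for p in obs]
    yo = [p[1] for p in obs]
    mxo, myo = _mean(xo), _mean(yo)

    # Step 1: complete-case regression of x on y, which is valid under
    # the protective assumption.
    syy = sum((v - myo) ** 2 for v in yo)
    sxy = sum((u - mxo) * (v - myo) for u, v in obs)
    slope_xy = sxy / syy
    icpt_xy = mxo - slope_xy * myo
    resid_var = sum((u - icpt_xy - slope_xy * v) ** 2 for u, v in obs) / len(obs)

    # Step 2: marginal mean and variance of x from all subjects.
    mx = _mean(x)
    vx = sum((u - mx) ** 2 for u in x) / len(x)

    # Step 3: solve the moment equations
    #   E[x] = icpt_xy + slope_xy * E[y]
    #   Var[x] = slope_xy**2 * Var[y] + resid_var
    mu_y = (mx - icpt_xy) / slope_xy
    var_y = (vx - resid_var) / slope_xy ** 2

    # Step 4: convert to the regression of y on x.
    b1 = slope_xy * var_y / vx
    b0 = mu_y - b1 * mx
    return b0, b1


def jackknife_variance(x, y, fit=protective_fit):
    """Delete-one jackknife variance for each component returned by fit."""
    n = len(x)
    loo = [fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(n)]
    variances = []
    for k in range(len(loo[0])):
        vals = [est[k] for est in loo]
        m = _mean(vals)
        variances.append((n - 1) / n * sum((v - m) ** 2 for v in vals))
    return tuple(variances)


def simulate(n, seed=0):
    """Hypothetical data: y = 1 + 2x + noise, y observed with a
    probability that depends on y itself (nonignorable missingness)."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        xi = rng.gauss(0.0, 1.0)
        yi = 1.0 + 2.0 * xi + rng.gauss(0.0, 1.0)
        p_obs = 1.0 / (1.0 + math.exp(-(1.0 - 0.5 * yi)))
        x.append(xi)
        y.append(yi if rng.random() < p_obs else None)
    return x, y


if __name__ == "__main__":
    x, y = simulate(300, seed=1)
    b0, b1 = protective_fit(x, y)
    v0, v1 = jackknife_variance(x, y)
    print("intercept:", b0, "jackknife s.e.:", math.sqrt(v0))
    print("slope:", b1, "jackknife s.e.:", math.sqrt(v1))
```

The inputs are plain Python lists, with None marking a missing outcome. With further covariates, each of the moment steps would have to be carried out conditionally on those covariates, which this sketch does not attempt.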
Keywords:
EM-algorithm; method of moments; non-ignorable missing data; ordinary least squares.
Downloads:
Data and SAS code in zipped archive