Statistical Modelling 4 (2004), 3–17

A protective estimator for linear regression with nonignorably missing Gaussian outcomes

Stuart R. Lipsitz
Department of Biometry and Epidemiology
Medical University of South Carolina
USA
Email: lipsitzs@musc.edu

Geert Molenberghs
Center for Statistics
Limburgs Universitair Centrum
Belgium

Garrett M. Fitzmaurice
Department of Biostatistics
Harvard School of Public Health
Boston, MA 02115
USA

Joseph G. Ibrahim
Department of Biostatistics, Harvard School of Public Health
and Division of Biostatistical Science, Dana-Farber Cancer Institute
Boston, MA 02115
USA

Abstract:

We propose a method for estimating the regression parameters in a linear regression model for Gaussian data when the outcome variable is missing for some subjects and missingness is thought to be nonignorable. Throughout, we assume that missingness is restricted to the outcome variable and that the covariates are fully observed. Although maximum likelihood estimation of the regression parameters is possible once joint models for the outcome variable and the nonignorable missing data mechanism have been specified, these models are fundamentally non-identifiable unless unverifiable modeling assumptions are imposed. In this paper, rather than explicitly modeling the nonignorable missingness mechanism, we consider the use of a "protective" estimator of the regression parameters (Brown, 1990). To implement the proposed method, it is necessary to assume that the outcome variable and one of the covariates have an approximate bivariate normal distribution, conditional on the remaining covariates. In addition, it is assumed that the missing data mechanism is conditionally independent of this covariate, given the outcome variable and the remaining covariates; this conditional independence is referred to as the "protective" assumption. A method of moments approach is used to obtain the protective estimator of the regression parameters; the jackknife (Quenouille, 1956) is used to estimate the variance. The method is illustrated using data on the persistence of maternal smoking from the Six Cities study of the health effects of air pollution (Ware et al., 1984). A simulation study is presented that examines the magnitude of any finite-sample bias.
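
To make the idea concrete, the following is a minimal Python sketch of a protective-type method of moments estimator with a delete-one jackknife standard error. It is not the authors' implementation (the paper's download is SAS code). It is a reconstruction under stated assumptions: a single covariate x with no remaining covariates, (x, y) bivariate normal, and missingness of y depending on y only. Under those assumptions the regression of x on y is undistorted among complete cases, and the moments of x come from all subjects. The function names, the simulated data, and the logistic missingness model are hypothetical choices made for illustration.

```python
import numpy as np

def protective_estimate(x, y):
    """Moment-based sketch: x fully observed, y is NaN where missing.

    Returns (intercept, slope) for the regression of y on x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    obs = ~np.isnan(y)
    xc, yc = x[obs], y[obs]

    # Complete cases: regress x on y. If missingness depends only on y,
    # the conditional distribution of x given y is unaffected by it.
    b = np.cov(xc, yc, ddof=1)[0, 1] / np.var(yc, ddof=1)
    a = xc.mean() - b * yc.mean()
    tau2 = np.mean((xc - a - b * yc) ** 2)  # residual variance of x given y

    # All subjects: marginal moments of the fully observed covariate.
    mu_x = x.mean()
    s2_x = x.var()

    # Solve mu_x = a + b*mu_y and s2_x = b^2*s2_y + tau2 for the moments of y.
    mu_y = (mu_x - a) / b
    s2_y = (s2_x - tau2) / b ** 2

    slope = b * s2_y / s2_x  # cov(x, y) / var(x)
    intercept = mu_y - slope * mu_x
    return np.array([intercept, slope])

def jackknife_se(x, y):
    """Delete-one jackknife standard errors for (intercept, slope)."""
    n = len(x)
    reps = np.array([
        protective_estimate(np.delete(x, i), np.delete(y, i))
        for i in range(n)
    ])
    centered = reps - reps.mean(axis=0)
    return np.sqrt((n - 1) / n * (centered ** 2).sum(axis=0))

# Simulated illustration (assumed values): true intercept 1, true slope 2.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y_full = 1.0 + 2.0 * x + rng.normal(size=n)

# Nonignorable missingness: the chance of observing y depends on y itself.
p_obs = 1.0 / (1.0 + np.exp(-(0.5 - y_full)))
y = np.where(rng.uniform(size=n) < p_obs, y_full, np.nan)

intercept, slope = protective_estimate(x, y)
se = jackknife_se(x, y)
print("protective estimate:", intercept, slope)
print("jackknife SE:", se)
```

Note that the sketch never models the missingness probability; it only uses the conditional independence of missingness and x given y. Handling the remaining covariates described in the abstract would require conditioning every moment on them, which this sketch does not attempt.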

Keywords:

EM-algorithm; method of moments; non-ignorable missing data; ordinary least squares.
 

Downloads:

Data and SAS code in zipped archive

