Statistical Modelling 3 (2003), 215–232

Correcting for covariate measurement error in logistic regression using nonparametric maximum likelihood estimation

Sophia Rabe-Hesketh,
Department of Biostatistics and Computing,
Institute of Psychiatry, King's College,
DeCrespigny Park,
London SE5 8AF,
UK
Email: spaksrh@iop.kcl.ac.uk

Andrew Pickles,
School of Epidemiology and Health Sciences and CCSR,
The University of Manchester,
Manchester,
UK

Anders Skrondal,
Division of Epidemiology,
Norwegian Institute of Public Health,
Oslo,
Norway

Abstract:

When covariates are measured with error, inference based on conventional generalized linear models can yield biased estimates of regression parameters. This problem can potentially be rectified by using generalized linear latent and mixed models (GLLAMMs), which include a measurement model for the relationship between observed and true covariates. However, the models are typically estimated under the assumption that both the true covariates and the measurement errors are normally distributed, although skewed covariate distributions are often observed in practice. In this article, we relax the normality assumption for the true covariates by developing nonparametric maximum likelihood estimation (NPMLE) for GLLAMMs. The methodology is applied to estimating the effect of dietary fibre intake on coronary heart disease. We also assess the performance of estimation of regression parameters and empirical Bayes prediction of the true covariate. Normal as well as skewed covariate distributions are simulated, and inference is performed based on both maximum likelihood assuming normality and NPMLE. Both estimators are unbiased and have similar root mean square errors when the true covariate is normal. With a skewed covariate, the conventional estimator is biased but has smaller mean square error than the NPMLE. NPMLE produces substantially improved empirical Bayes predictions of the true covariate when its distribution is skewed.

Keywords:

Empirical Bayes prediction; factor model; generalized linear model; GLLAMM; logistic regression; measurement error; nonparametric maximum likelihood estimation
 

Downloads:

Data and program in zipped archive.

