Statistical Modelling 10 (2010), 391–419

Arbitrariness of models for augmented and coarse data, with emphasis on incomplete data and random effects models

Geert Verbeke
Interuniversity Institute for Biostatistics and Statistical Bioinformatics,
Katholieke Universiteit Leuven
and
Universiteit Hasselt
Belgium

Geert Molenberghs
Interuniversity Institute for Biostatistics and Statistical Bioinformatics,
Katholieke Universiteit Leuven,
and
Universiteit Hasselt
Agoralaan 1,
B–Diepenbeek
Belgium
Email: geert.molenberghs@uhasselt.be

Abstract:

Statistical models often extend beyond the data available. First, in coarse data, what is actually observed is less detailed than what might be, owing to incompleteness, censoring, grouping, or a combination thereof. Second, in augmented data, the observed data are hypothetically supplemented with random effects, latent variables/classes, or component membership in mixture distributions. The two settings together will be referred to as enriched data. Reasons for modelling enriched data encompass mathematical and computational convenience, advantages in interpretation, and substantive plausibility. Models for enriched data combine evidence coming from empirical data with unverifiable model components, resting entirely on assumptions.

This has acute consequences for enriched data, but knowledge about the issue is somewhat scattered. We provide a unified framework for enriched data and show, both in general and with a focus on incomplete-data models on the one hand and random-effects models on the other, that to any given model an entire class of models can be assigned, all of whose members produce the same fit to the observed data but are arbitrary regarding the unobservable parts of the enriched data. The concepts developed are illustrated using a clinical trial in toenail dermatophyte onychomycosis and a developmental toxicity study conducted in mice.

Keywords:

enriched data; exponential random effects; gamma random effects; missing data model; linear mixed model

Downloads:

Example data and code in zipped archive