Statistical Modelling 5 (2005), 243–267
Mixture of linear mixed models for clustering gene
expression profiles from repeated microarray
experiments
Gilles Celeux
Department of Mathematics
University Paris-Sud
Paris
France
Olivier Martin
INRA
Unité Protéomique
2, Place Viala
34060 Montpellier Cédex 1
France
eMail:
martinol@ensam.inra.fr
Christian Lavergne
Institut de Mathématiques et de Modélisation de Montpellier
Montpellier
France
Abstract:
Data variability can be important in microarray data
analysis. Thus, when clustering gene
expression profiles, it could be judicious to make
use of repeated data. In this paper, the problem of
analysing repeated data in the model-based cluster
analysis context is considered. Linear mixed models are
chosen to take into account data variability and
mixtures of these models are considered. This leads to a
large range of possible models depending on the assumptions
made on both the covariance structure of the
observations and the mixture model. The maximum likelihood
estimation of this family of models through
the EM algorithm is presented. The problem of selecting a
particular mixture of linear mixed models is
considered using penalized likelihood criteria. Illustrative
Monte Carlo experiments are presented and an
application to the clustering of gene expression profiles
is detailed. These experiments highlight the
value of mixtures of linear mixed models for taking into account
data variability in a cluster analysis context.
Keywords:
cluster analysis; gene expression profile;
linear model; mixture model; penalized likelihood
criteria; random effect
Downloads:
Example data and R code in zipped archive
See http://www.r-project.org
for information on the R Project for Statistical Computing.