Statistical Modelling 5 (2005), 243–267

Mixture of linear mixed models for clustering gene expression profiles from repeated microarray experiments

Gilles Celeux
Department of Mathematics
University Paris-Sud
Paris
France

Olivier Martin
INRA
Unité Protéomique
2, Place Viala
34060 Montpellier Cédex 1
France
Email: martinol@ensam.inra.fr

Christian Lavergne
Institut de Mathématiques et de Modélisation de Montpellier
Montpellier
France

Abstract:

Data variability can be substantial in microarray data analysis. Thus, when clustering gene expression profiles, it can be judicious to make use of repeated data. In this paper, the problem of analysing repeated data in the model-based cluster analysis context is considered. Linear mixed models are chosen to take data variability into account, and mixtures of these models are considered. This leads to a large range of possible models, depending on the assumptions made about both the covariance structure of the observations and the mixture model. Maximum likelihood estimation for this family of models through the EM algorithm is presented. The problem of selecting a particular mixture of linear mixed models is addressed using penalized likelihood criteria. Illustrative Monte Carlo experiments are presented, and an application to the clustering of gene expression profiles is detailed. All these experiments highlight the value of mixtures of linear mixed models for taking data variability into account in a cluster analysis context.

Keywords:

cluster analysis; gene expression profile; linear model; mixture model; penalized likelihood criteria; random effect
 

Downloads:

Example data and R code in zipped archive

See http://www.r-project.org for information on the R Project for Statistical Computing.

