Statistical Modelling 8 (2008), 81–96

Multivariate mixtures of Polya trees for modeling ROC data

Timothy E Hanson
Division of Biostatistics,
University of Minnesota School of Public Health,
A460 Mayo Building MMC 303,
420 Delaware Street S.E.,
Minneapolis, MN 55455
USA
Email: hanson@biostat.umn.edu

Adam J Branscum
Departments of Biostatistics, Statistics, and Epidemiology,
University of Kentucky,
USA

Ian A Gardner
Department of Medicine and Epidemiology,
University of California at Davis,
USA

Abstract:

Receiver operating characteristic (ROC) curves provide a graphical measure of diagnostic test accuracy. Because ROC curves are determined using the distributions of diagnostic test outcomes for noninfected and infected populations, there is an increasing trend to develop flexible models for these component distributions. We present methodology for joint nonparametric estimation of several ROC curves from multivariate serologic data. We develop an empirical Bayes approach that allows for arbitrary noninfected and infected component distributions that are modelled using Bayesian multivariate mixtures of finite Polya tree priors. Robust, data-driven inferences for ROC curves and the area under the curve are obtained, and a straightforward method for testing a Dirichlet process versus a more general Polya tree model is presented. Computational challenges can arise when using Polya trees to model large multivariate data sets that exhibit clustering. We discuss and implement practical procedures for addressing these obstacles, which are applied to bivariate data used to evaluate the performance of two ELISA tests for detection of Johne's disease.

Keywords:

Bayesian nonparametrics; diagnostic test evaluation; empirical Bayes