Statistical Modelling 15 (2) (2015), 134–158

Bayesian-multiplicative treatment of count zeros in compositional data sets

Josep-Antoni Martin-Fernández
Department of Computer Science,
Applied Mathematics and Statistics,
University of Girona,
Girona,
Spain
e-mail: josepantoni.martin@udg.edu

Matthias Templ
Department of Statistics and Probability Theory,
Vienna University of Technology,
Austria

and

Department of Methodology,
Statistics Austria,
Austria

and

Department of Geoinformatics,
Faculty of Science,
Palacký University,
Czech Republic


Peter Filzmoser
Department of Statistics and Probability Theory,
Vienna University of Technology,
Austria

and

Department of Geoinformatics,
Faculty of Science,
Palacký University,
Czech Republic


Javier Palarea-Albaladejo
Biomathematics & Statistics Scotland,
UK


Abstract:

Compositional count data are discrete vectors representing the numbers of outcomes falling into each of several mutually exclusive categories. Compositional techniques based on the log-ratio methodology are appropriate in those cases where the total sum of the vector elements is not of interest. Such compositional count data sets can contain zero values, which are often the result of insufficiently large samples. That is, they refer to unobserved positive values that might have been observed with a larger number of trials or with a different sampling design. Because log-ratio transformations require strictly positive data, any statistical analysis of count compositions must be preceded by a proper replacement of the zeros. A Bayesian-multiplicative treatment has been proposed for addressing this count zero problem in several case studies. This treatment involves the Dirichlet prior distribution as the conjugate distribution of the multinomial distribution and a multiplicative modification of the non-zero values. Different parameterizations of the prior distribution provide different zero replacement results, whose coherence with the vector space structure of the simplex is stated. Their performance is evaluated from both the theoretical and the computational points of view.
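
The two ingredients named above, a Dirichlet posterior estimate for the zero parts and a multiplicative modification of the non-zero parts, can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions rather than the exact procedure evaluated in the paper. The function name `bayes_mult_replace`, its arguments and its defaults (uniform prior expectation 1/D with strength s = D, that is, a Dirichlet(1, ..., 1) prior) are hypothetical choices made for this example.

```python
from typing import List, Optional, Sequence


def bayes_mult_replace(
    counts: Sequence[int],
    strength: Optional[float] = None,
    prior: Optional[Sequence[float]] = None,
) -> List[float]:
    """Replace count zeros in one count vector; return proportions summing to 1."""
    D = len(counts)
    n = sum(counts)
    if n <= 0:
        raise ValueError("the count vector must have a positive total")

    # Assumed defaults (illustrative only): prior expectation t_j = 1/D and
    # strength s = D, i.e. a Dirichlet(1, ..., 1) prior.
    t = list(prior) if prior is not None else [1.0 / D] * D
    s = float(D) if strength is None else float(strength)

    # Zero parts: posterior mean under a Dirichlet(s * t) prior combined with a
    # multinomial likelihood, (c_j + s * t_j) / (n + s) with c_j = 0.
    zero_fill = [s * t[j] / (n + s) if counts[j] == 0 else 0.0 for j in range(D)]

    # Non-zero parts: shrink the observed proportions by one common factor, so
    # the result sums to 1 and ratios between non-zero parts are unchanged.
    shrink = 1.0 - sum(zero_fill)
    return [
        zero_fill[j] if counts[j] == 0 else (counts[j] / n) * shrink
        for j in range(D)
    ]


counts = [12, 0, 5, 3, 0]
replaced = bayes_mult_replace(counts)
print([round(v, 3) for v in replaced])  # [0.552, 0.04, 0.23, 0.138, 0.04]
```

With these assumed defaults, each zero receives 1/25 = 0.04, the observed proportions are multiplied by 0.92, and the ratio between the first and third parts remains 12/5. Passing a different `strength` or `prior` corresponds to a different parameterization of the Dirichlet prior and therefore to a different replacement.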

Keywords:

Dirichlet distribution; discrete composition; log-ratio transformations; posterior estimate; zero replacement.