Statistical Modelling 9 (2009), 151–171

Extended truncated Inverse Gaussian–Poisson model

Xavier Puig
Department of Statistics and O.R.,
Technical University of Catalonia
Spain

Josep Ginebra
Dep. d'Estadística, E.T.S.E.I.B.,
Universitat Politècnica de Catalunya,
Avgda. Diagonal 647, 6a Planta
E–08028 Barcelona
Spain
Email: josep.ginebra@upc.edu

Marta Perez-Casany
Department of Applied Math 2 and Dama-UPC,
Technical University of Catalonia
Spain

Abstract:

The inverse Gaussian–Poisson mixture model is widely used to model highly skewed non-negative integer data in fields as diverse as linguistics, ecology, market research, bibliometry, engineering and insurance. When this model is applied to word or species frequency data, one typically truncates its sample space at zero to account for the unknown number of words or species that are not observed. In this paper, we show that truncating the sample space of the inverse Gaussian–Poisson model allows its parameter space to be extended, which improves the fit when the frequency of one is larger and the right tail is heavier than the unextended model allows. By fitting the extended model to word frequency count data, we find many instances where the maximum likelihood estimates fall in the extension of the parameter space.
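
As an illustration of the zero truncation described in the abstract, the following is a minimal sketch and not the authors' code. It assumes Sichel's (alpha, theta) parametrization with alpha > 0 and 0 < theta < 1, under which the probability generating function is exp(alpha * (sqrt(1 - theta) - sqrt(1 - theta * z))). It covers only the standard, unextended parameter space and does not attempt the extension proposed in the paper. The function names `igp_pmf` and `zero_truncated` are hypothetical.

```python
import math


def igp_pmf(alpha, theta, r_max):
    """Inverse Gaussian-Poisson probabilities p(0), ..., p(r_max).

    Assumes Sichel's parametrization with alpha > 0 and 0 < theta < 1.
    Uses the three-term recurrence implied by the Bessel-function
    recurrence, which avoids evaluating Bessel functions directly.
    """
    if not (alpha > 0 and 0 < theta < 1):
        raise ValueError("sketch covers only alpha > 0 and 0 < theta < 1")
    # p(0) is the generating function evaluated at z = 0.
    p = [math.exp(alpha * (math.sqrt(1.0 - theta) - 1.0))]
    if r_max >= 1:
        p.append(0.5 * alpha * theta * p[0])
    for r in range(1, r_max):
        p.append(
            theta * (2 * r - 1) / (2 * (r + 1)) * p[r]
            + (0.5 * alpha * theta) ** 2 / (r * (r + 1)) * p[r - 1]
        )
    return p


def zero_truncated(p):
    """Condition on counts >= 1: q(r) = p(r) / (1 - p(0)) for r = 1, 2, ..."""
    scale = 1.0 - p[0]
    return [pr / scale for pr in p[1:]]


# Example with arbitrary illustrative parameter values (not from the paper).
p = igp_pmf(alpha=2.0, theta=0.5, r_max=200)
q = zero_truncated(p)  # q[0] is the truncated probability of a count of one
```

The truncated probabilities depend on the untruncated ones only through the ratio p(r) / (1 - p(0)); zero counts are never observed in word or species frequency data, so the likelihood is built from `q` and not from `p`.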

Keywords:

Distribution of vocabulary; Poisson mixture; Sichel model; species frequency; stylometry; textual data

Downloads:

Data in zipped archive

