Statistical Modelling 9 (2009), 151–171
Extended truncated Inverse Gaussian–Poisson model
Xavier Puig
Department of Statistics and O.R.,
Technical University of Catalonia
Spain
Josep Ginebra
Dep. d'Estadística, E.T.S.E.I.B.,
Universitat Politècnica de Catalunya,
Avgda. Diagonal 647, 6a Planta
E08028 Barcelona
Spain
E-mail: josep.ginebra@upc.edu
Marta Perez-Casany
Department of Applied Math 2 and Dama-UPC,
Technical University of Catalonia
Spain
Abstract:
The inverse Gaussian–Poisson mixture model is very useful when modelling
highly skewed non-negative integer data in fields as diverse as
linguistics, ecology, market research, bibliometry, engineering and
insurance. When this model is applied to word or species frequency
data, its sample space is typically truncated at zero to account for the
unknown number of words or species that are not observed. In this paper,
we show that truncating the sample space of the inverse Gaussian–Poisson
model makes it possible to extend its parameter space, and thereby to
improve its fit when the frequency of one is larger and the right tail is
heavier than the unextended model allows. By fitting the extended model
to word frequency
count data, we find many instances where the maximum likelihood estimates
fall in the extension of the parameter space.
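
As a rough illustration of the zero truncation described above, the following Python sketch computes the probabilities of an (unextended) inverse Gaussian–Poisson mixture and then renormalises them over the positive integers. It is only a sketch under stated assumptions: the (mean, shape) parameterisation of the inverse Gaussian mixing distribution and the names `pig_pmf` and `zero_truncated` are chosen here for illustration and are not taken from the paper, and the extension of the parameter space that the paper proposes is not implemented.

```python
import math


def pig_pmf(k_max, mean, shape):
    """Return [P(X=0), ..., P(X=k_max)] for a Poisson whose rate follows an
    inverse Gaussian distribution with the given mean and shape (both > 0).

    Assumption: this (mean, shape) parameterisation is illustrative only;
    it is not necessarily the one used in the paper.
    """
    a = 1.0 + shape / (2.0 * mean ** 2)
    b = shape / 2.0
    # P(X=0) is the Laplace transform of the mixing density evaluated at 1.
    p = [math.exp(shape / mean - 2.0 * math.sqrt(a * b))]
    if k_max >= 1:
        p.append(math.sqrt(b / a) * p[0])
    # Three-term recursion that follows from the recurrence relation of the
    # modified Bessel functions of the second kind (all terms are positive).
    for k in range(1, k_max):
        p.append((b / a) * p[k - 1] / (k * (k + 1))
                 + (2 * k - 1) / (2.0 * a * (k + 1)) * p[k])
    return p


def zero_truncated(p):
    """Drop the zero class and renormalise: P(X=k | X>0) = p_k / (1 - p_0)."""
    scale = 1.0 - p[0]
    return [pk / scale for pk in p[1:]]


if __name__ == "__main__":
    probs = pig_pmf(400, mean=2.0, shape=1.0)
    truncated = zero_truncated(probs)  # truncated[0] is P(X=1 | X>0)
    print(truncated[:5])
```

The truncated probabilities depend on the untruncated ones only through the ratio p_k / (1 - p_0) for k = 1, 2, ..., which is the quantity that needs to remain a valid probability once the zero class is discarded.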
Keywords:
Distribution of vocabulary; Poisson mixture; Sichel model; species frequency;
stylometry; textual data
Downloads: Data in zipped archive