In many fields of study, scientists are interested in estimating the number of unobserved classes. A biologist may want to find the number of rare species in an animal population in order to conserve, manage, and monitor biodiversity; a library manager may want to know how many non-circulating items are present in a library system; or a clinical investigator may want to determine the number of unseen disease occurrences. A traditional way of estimating an unknown number of classes is to use a negative binomial model (Fisher, Corbet, and Williams 1943). The negative binomial model assumes that the numbers of individuals from each class are independent Poisson samples, and that the means of these Poisson random variables follow a Gamma distribution. This thesis proposes a parametric model in which the distribution of the mean class frequencies is a finite mixture of exponential distributions. The proposed model has the following advantages: model simplicity, efficient computation using the EM algorithm, and a straightforward interpretation of the fitted model. In addition, the model can be assessed with a chi-squared goodness-of-fit procedure, a benefit this parametric model has over other commonly used non-parametric methods.

A main accomplishment of this thesis is an efficient computation of maximum likelihood (ML) estimates for the proposed model. Without the EM algorithm, finding ML estimates for this model can be difficult and time-consuming. The likelihood function is complicated because of its high dimensionality and the non-identifiability of certain parameters. Within the M step of our algorithm we embed another EM, which readily maximizes over the parameters of the finite mixture. We refer to the algorithm as a nested EM. Aitken's acceleration is used to increase the speed of the algorithm.

Microbial samples from the coast of Massachusetts Bay near Nahant, Massachusetts, are used to demonstrate data analysis with three different numbers of components in the finite mixture of the model described. It is shown that the model produces reasonable estimates and fits the data satisfactorily. This model was recently introduced in species richness estimation (Hong et al. 2006), and its many advantages show promise for continued use in estimating the number of unobserved classes.
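
The model described above can be illustrated with a short sketch. If a class's count is Poisson with mean lambda, and lambda is exponential with mean theta, then integrating out lambda gives a geometric count distribution, P(X = k) = theta^k / (1 + theta)^(k + 1). A finite mixture of exponentials therefore gives a finite mixture of geometrics. The Python below is a minimal sketch under that standard result, not the thesis's implementation: the function names, the two-component weights and means, and the observed class count of 200 are all hypothetical, and the total is obtained by dividing the observed count by the fitted probability of being seen at least once, which is one common way to turn a fitted count model into a class-number estimate and is assumed here rather than taken from the thesis.

```python
def mixture_pmf(k, weights, means):
    """P(X = k) for a Poisson count whose mean follows a finite
    mixture of exponentials with the given weights and means."""
    total = 0.0
    for w, t in zip(weights, means):
        # An exponential component with mean t integrates to a
        # geometric pmf: t**k / (1 + t)**(k + 1). It is written as
        # a ratio raised to k to avoid overflow for large k.
        total += w * (t / (1.0 + t)) ** k / (1.0 + t)
    return total


def estimate_total_classes(n_observed, weights, means):
    """Scale the observed class count by the probability that a
    class is seen at least once (assumed estimator, see lead-in)."""
    p_unseen = mixture_pmf(0, weights, means)
    return n_observed / (1.0 - p_unseen)


# Hypothetical fitted parameters, for illustration only.
weights = [0.7, 0.3]   # mixing proportions, sum to 1
means = [0.5, 4.0]     # exponential means of the two components
total = estimate_total_classes(200, weights, means)
unseen = total - 200   # estimated number of unobserved classes
```

In practice the weights and means would come from the fitted model (for example, from the nested EM), and the estimate is sensitive to the component with the smallest mean, since that component contributes most of the probability of a zero count.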