We study the problem of topic modeling in corpora whose documents are organized in a multi-level hierarchy. We explore a parametric approach to this problem, assuming that the number of topics is known or can be estimated by cross-validation. The models we consider can be viewed as special (finite-dimensional) instances of hierarchical Dirichlet processes (HDPs). For these models we show that there exists a simple variational approximation for probabilistic inference. The approximation relies on a previously unexploited inequality that handles the conditional dependence between Dirichlet latent variables in adjacent levels of the model's hierarchy. We compare our approach to existing implementations of nonparametric HDPs. On several benchmarks we find that our approach is faster than Gibbs sampling and able to learn more predictive models than existing variational methods. Finally, we demonstrate the large-scale viability of our approach on two newly available corpora from researchers in computer security---one with 350,000 documents and over 6,000 internal subcategories, the other with a five-level deep hierarchy.