IEEE Transactions on Signal Processing

Unsupervised Learning of Parsimonious Mixtures on Large Spaces With Integrated Feature and Component Selection


Abstract

Estimating the number of components (the order) in a mixture model is often addressed using criteria such as the Bayesian information criterion (BIC) and minimum message length. However, when the feature space is very large, use of these criteria may grossly underestimate the order. Here, it is suggested that this failure is not mainly attributable to the criterion (e.g., BIC), but rather to the lack of "structure" in standard mixtures--these models trade off data fitness and model complexity only by varying the order. The authors of the present paper propose mixtures with a richer set of tradeoffs. The proposed model allows each component its own informative feature subset, with all other features explained by a common model (shared by all components). Parameter sharing greatly reduces complexity at a given order. Since the space of these parsimonious modeling solutions is vast, this space is searched in an efficient manner, integrating the component and feature selection within the generalized expectation-maximization (GEM) learning for the mixture parameters. The quality of the proposed (unsupervised) solutions is evaluated using both classification error and test set data likelihood. On text data, the proposed multinomial version--learned without labeled examples, without knowing the "true" number of topics, and without feature preprocessing--compares quite favorably with both alternative unsupervised methods and with a supervised naive Bayes classifier. A Gaussian version compares favorably with a recent method introducing "feature saliency" in mixtures.
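To make the model structure described above concrete, here is a minimal NumPy sketch (not the authors' implementation) of the likelihood of such a parsimonious multinomial mixture: each component scores its own informative feature subset with component-specific parameters and scores all remaining features with a single shared model. All names are illustrative, and the sketch assumes the component-specific and shared word probabilities have already been jointly normalized per component, a detail the full GEM procedure must handle.

```python
import numpy as np
from scipy.special import logsumexp

def parsimonious_mixture_ll(X, log_alpha, log_theta, log_theta_shared, mask):
    """Per-document log-likelihood under a feature-selecting multinomial mixture.

    X                : (N, D) matrix of word counts.
    log_alpha        : (K,) log mixture weights.
    log_theta        : (K, D) component-specific log word probabilities.
    log_theta_shared : (D,) log word probabilities of the shared model.
    mask             : (K, D) boolean; mask[k, j] is True when feature j is in
                       component k's informative subset, False when feature j
                       is explained by the shared model.
    """
    # Each component uses its own parameters on its selected features and
    # the shared parameters everywhere else.
    log_p = np.where(mask, log_theta, log_theta_shared)   # (K, D)
    comp_ll = X @ log_p.T + log_alpha                     # (N, K)
    return logsumexp(comp_ll, axis=1)                     # (N,)

def n_free_params(mask):
    """Free parameters: component-specific ones only where the mask selects a
    feature, plus the D shared parameters and K-1 free mixture weights."""
    K, D = mask.shape
    return int(mask.sum()) + D + (K - 1)

def bic(ll_per_doc, mask):
    """Standard BIC score (lower is better) for this model on N documents."""
    N = ll_per_doc.shape[0]
    return -2.0 * ll_per_doc.sum() + n_free_params(mask) * np.log(N)
```

The parameter count in `n_free_params` shows why sharing matters for order selection: a standard mixture pays for roughly K*D component-specific parameters, whereas here the count grows only with the sizes of the informative subsets, so a BIC-style penalty at a given order is far smaller and the criterion is less prone to underestimating the order on large feature spaces.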