Journal: IEEE Transactions on Emerging Topics in Computing

Label Correlation Mixture Model: A Supervised Generative Approach to Multilabel Spoken Document Categorization



Abstract

Multilabel categorization, which is more difficult but also more practical than conventional binary and multiclass categorization, has received a great deal of attention in recent years. This paper proposes a novel probabilistic generative model, the label correlation mixture model (LCMM), to describe multiply labeled documents; it can be used for multilabel spoken document categorization as well as multilabel text categorization. In LCMM, labels and topics have a one-to-one correspondence. The LCMM consists of two important components: 1) a label correlation model and 2) a multilabel conditioned document model. The label correlation model formulates the generating process of labels, taking the dependences between labels into account. We also propose an efficient algorithm for calculating the probability of generating an arbitrary subset of labels. The multilabel conditioned document model can be regarded as a supervised label mixture model, in which the labels for a document are known. Each label is characterized by distributions over words. For parameter learning in the multilabel conditioned document model, a discriminative approach based on minimum classification error rate training is proposed in addition to maximum-likelihood estimation. To evaluate LCMM, extensive multilabel categorization experiments are conducted on a spoken document data set and three standard text data sets. Comparisons with other competitive methods demonstrate the effectiveness of LCMM.
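The two-component structure described in the abstract can be illustrated with a small sketch. This is not the LCMM itself: the abstract gives no equations, so everything below is an assumption made for illustration. The label prior is a plain empirical frequency over label subsets seen in training (a stand-in for the paper's label correlation model), the document model mixes per-label word distributions with uniform weights, and the word distributions come from a fractional-count heuristic with add-one smoothing rather than true maximum-likelihood or discriminative training. All function names (`train`, `score`, `predict`) and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Fit the toy model. docs: list of (tokens, labels), labels a non-empty set."""
    vocab = {w for tokens, _ in docs for w in tokens}
    word_counts = defaultdict(Counter)  # label -> fractional word counts
    subset_counts = Counter()           # label subset -> number of documents
    for tokens, labels in docs:
        subset_counts[frozenset(labels)] += 1
        share = 1.0 / len(labels)       # assumption: uniform credit per label
        for w in tokens:
            for label in labels:
                word_counts[label][w] += share
    # Add-one smoothed word distribution for each label.
    word_probs = {}
    for label, counts in word_counts.items():
        total = sum(counts.values()) + len(vocab)
        word_probs[label] = {w: (counts[w] + 1.0) / total for w in vocab}
    # Stand-in label prior: relative frequency of each observed subset.
    subset_prior = {s: c / len(docs) for s, c in subset_counts.items()}
    return word_probs, subset_prior


def score(tokens, labels, word_probs, subset_prior):
    """log P(labels) + log P(tokens | labels) under the uniform label mixture."""
    log_p = math.log(subset_prior[labels])
    for w in tokens:
        mix = sum(word_probs[l].get(w, 0.0) for l in labels) / len(labels)
        if mix > 0.0:  # out-of-vocabulary words are skipped
            log_p += math.log(mix)
    return log_p


def predict(tokens, word_probs, subset_prior):
    """Return the highest-scoring label subset among those seen in training."""
    return max(subset_prior,
               key=lambda s: score(tokens, s, word_probs, subset_prior))


# Toy data, invented for this example only.
docs = [
    (["goal", "match", "team"], {"sports"}),
    (["vote", "election", "party"], {"politics"}),
    (["team", "vote", "stadium", "election"], {"sports", "politics"}),
]
word_probs, subset_prior = train(docs)
prediction = predict(["goal", "team", "match"], word_probs, subset_prior)
```

Restricting candidates to label subsets observed in training keeps the search trivial but cannot predict unseen combinations; the abstract states that the paper instead provides an efficient algorithm for the probability of an arbitrary label subset.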
