Journal: Pattern Recognition Letters

Finding significant keywords for document databases by two-phase Maximum Entropy Partitioning

Abstract

This paper investigates the selection of class-specific significant keywords for document databases. We define two types of significant keywords with respect to a document class, Elite and Unique Elite, which are derived in two phases. Elite keywords are those that have high term frequencies within the class. To obtain the top partition of distinctively high-frequency terms in each class, we employ Maximum Entropy Partitioning (MEP) in the first phase. Our presumption is that, at the point of maximum entropy, the term probabilities within the subset of significant keywords (and likewise within the subset of non-significant keywords) are relatively uniform with respect to each other. Unique Elite keywords are those that are Elite for a particular class and, at the same time, occur more frequently in that class than in the other classes. To measure this aspect, in the second phase we compute the entropy of each Elite keyword across all classes, sort the entropies in ascending order, and again employ MEP to shortlist those Elite keywords that occur uniquely in this class, as characterized by distinctively low entropy. Experimental comparisons with the state of the art on benchmark datasets, using an ensemble of bagged tree classifiers, establish the discriminatory power of the derived keywords. © 2019 Elsevier B.V. All rights reserved.
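The two phases described in the abstract can be sketched roughly as follows. The abstract does not state the exact MEP objective, so this sketch assumes a Kapur-style criterion: the cut point in a sorted sequence is the one that maximizes the sum of the Shannon entropies of the two renormalized segments. The function names (`mep_split`, `elite_keywords`, `unique_elite`), the tie-breaking by term name, and the treatment of zero-mass segments are all illustrative assumptions, not the authors' implementation.

```python
import math


def segment_entropy(segment):
    """Shannon entropy of a segment after renormalizing it to sum to 1.
    Assumption: a segment with zero total mass contributes entropy 0."""
    total = sum(segment)
    if total == 0:
        return 0.0
    return -sum((v / total) * math.log(v / total) for v in segment if v > 0)


def mep_split(values):
    """Return the cut index k maximizing H(values[:k]) + H(values[k:]).
    `values` must already be sorted by the caller (assumed Kapur-style rule)."""
    if len(values) < 2:
        return len(values)
    best_k, best_h = 1, -math.inf
    for k in range(1, len(values)):
        h = segment_entropy(values[:k]) + segment_entropy(values[k:])
        if h > best_h:
            best_k, best_h = k, h
    return best_k


def elite_keywords(term_counts):
    """Phase 1: sort a class's terms by descending frequency and keep
    the top partition found by the maximum-entropy cut."""
    terms = sorted(term_counts, key=lambda t: (-term_counts[t], t))
    total = sum(term_counts.values())
    probs = [term_counts[t] / total for t in terms]
    return terms[:mep_split(probs)]


def unique_elite(class_name, counts_by_class):
    """Phase 2: score each Elite keyword by the entropy of its distribution
    across classes, sort ascending, and keep the low-entropy partition."""
    scored = []
    for term in elite_keywords(counts_by_class[class_name]):
        across = [counts.get(term, 0) for counts in counts_by_class.values()]
        scored.append((segment_entropy(across), term))
    scored.sort()
    k = mep_split([h for h, _ in scored])
    return [term for _, term in scored[:k]]
```

A keyword concentrated in a single class has a cross-class entropy near zero, so it sorts to the front in the second phase, while a keyword spread evenly across classes has high entropy and falls into the discarded partition. Note that a Kapur-style cut tends to favor balanced partitions when the values are nearly uniform; the criterion used in the paper may behave differently.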
