Pattern Recognition Letters

Finding significant keywords for document databases by two-phase Maximum Entropy Partitioning

Abstract

This paper investigates the selection of class-specific significant keywords for document databases. We define two types of significant keywords with respect to a document class, Elite and Unique Elite, which are derived in two phases. Elite keywords are those that have high term frequencies within the class. To obtain the top partition of distinctively high-occurring terms in each class, we employ Maximum Entropy Partitioning (MEP) in the first phase. Our presumption is that, at the point of maximum entropy, the term probabilities within the subset of significant keywords (and, likewise, within the subset of non-significant keywords) are relatively uniform with respect to each other. Unique Elite keywords are those that are Elite for a particular class and, at the same time, occur with higher frequency only in that class compared with the other classes. To measure this, in the second phase we compute the entropy of each Elite keyword across all classes, sort the entropies in ascending order, and again employ MEP to shortlist the Elite keywords that occur uniquely in this class, characterized by distinctively low entropy. Experimental comparisons with the state of the art on benchmark datasets, using an ensemble of bagged tree classifiers, establish the discriminatory power of the derived keywords. (C) 2019 Elsevier B.V. All rights reserved.
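The abstract does not give the MEP formula, so the following is only a minimal sketch of the two phases under stated assumptions. It assumes MEP behaves like Kapur-style maximum-entropy thresholding: given values sorted in the desired order, choose the cut k that maximises the sum of the Shannon entropies of the head and the tail, each renormalised to a probability distribution, and keep the head. For phase 2 it further assumes that cross-class entropy is computed from within-class relative frequencies, and that MEP is applied to the complement log(C) − H (C = number of classes). This keeps the ascending-entropy order while giving the lowest-entropy keywords the largest weight, since a raw entropy of 0 would otherwise contribute nothing to the criterion. The names `mep_split`, `elite_keywords` and `unique_elite_keywords`, and the per-class `Counter` input layout, are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def _entropy(weights: Sequence[float]) -> float:
    """Shannon entropy (natural log) of non-negative weights, renormalised to sum to 1.
    A part whose weights sum to zero is given entropy 0."""
    total = sum(weights)
    if total <= 0:
        return 0.0
    h = 0.0
    for w in weights:
        if w > 0:  # 0 * log(0) is taken as 0
            p = w / total
            h -= p * math.log(p)
    return h


def mep_split(sorted_values: Sequence[float]) -> int:
    """Hypothetical MEP step: return k so that sorted_values[:k] is the top partition.

    Assumed criterion (Kapur-style): the cut maximising H(head) + H(tail),
    each side renormalised separately. Ties keep the smallest k.
    """
    n = len(sorted_values)
    if n < 2:
        return n
    best_k, best_h = 1, -1.0
    for k in range(1, n):
        h = _entropy(sorted_values[:k]) + _entropy(sorted_values[k:])
        if h > best_h:
            best_k, best_h = k, h
    return best_k


def elite_keywords(class_counts: Counter) -> List[str]:
    """Phase 1: MEP over one class's term frequencies sorted in descending order."""
    ranked = class_counts.most_common()  # [(term, freq), ...], highest first
    k = mep_split([freq for _, freq in ranked])
    return [term for term, _ in ranked[:k]]


def unique_elite_keywords(target: str, counts: Dict[str, Counter]) -> List[str]:
    """Phase 2: keep the Elite keywords of `target` with distinctively low cross-class entropy."""
    # Assumption: compare within-class relative frequencies, not raw counts.
    totals = {c: sum(cnt.values()) for c, cnt in counts.items()}
    h_max = math.log(len(counts))  # entropy of a term spread evenly over all classes
    scored = []
    for term in elite_keywords(counts[target]):
        probs = [counts[c][term] / totals[c] for c in counts]  # Counter gives 0 if absent
        scored.append((term, _entropy(probs)))
    scored.sort(key=lambda item: item[1])  # ascending entropy
    # Assumed complement transform: same order, but low entropy now carries high weight.
    k = mep_split([max(0.0, h_max - h) for _, h in scored])
    return [term for term, _ in scored[:k]]
```

For example, with a `sports` class whose term counts are 50, 45, 40, 5, 3, 2, 1 (for `match`, `goal`, `team`, `the`, `said`, `new`, `year`), phase 1 keeps `match`, `goal` and `team`. If a second class contains `team` at the same relative frequency but neither `match` nor `goal`, phase 2 keeps only `match` and `goal`. The scan above is quadratic in the number of terms; because the entropy of a renormalised part equals log S − (Σ w log w)/S, where S is the part's sum, prefix sums of w and w log w would make each cut O(1) for large vocabularies.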