Engineering Applications of Artificial Intelligence

A novel framework for termset selection and weighting in binary text classification

Abstract

This study presents a new framework for termset selection and weighting that relies on the joint occurrence statistics of pairs of terms. Each termset is evaluated by taking into account both the simultaneous and the individual occurrences of its terms. Based on the idea that the occurrence of one term but not the other may also convey valuable information for discrimination, conventional term selection schemes are adapted for termset selection. Similarly, the weight of a selected termset is computed as a function of the terms that occur in the document under consideration: a termset is assigned a nonzero weight if either or both of its terms appear in the document. This weight estimation scheme evaluates the individual occurrences of the terms and their co-occurrences separately in order to compute the document-specific weight of each termset. The proposed termset-based representation is concatenated with the bag-of-words representation to construct the document vectors. Experiments conducted on three widely used datasets have verified the effectiveness of the proposed framework.
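
The abstract does not give the weighting formula, so the following Python sketch only illustrates the property it states: a termset (here a pair of terms) receives a nonzero weight when either or both of its terms occur in a document, with the individual occurrences and the co-occurrence handled as separate cases, and the termset features are appended to a bag-of-words vector. All names (`occurrence_case`, `termset_weight`, `document_vector`), the example vocabulary, and the per-case weight values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set, Tuple

Termset = Tuple[str, str]


def occurrence_case(doc_terms: Set[str], termset: Termset) -> str:
    """Classify how the two terms of a termset occur in a document."""
    first, second = termset
    has_first, has_second = first in doc_terms, second in doc_terms
    if has_first and has_second:
        return "both"
    if has_first:
        return "first_only"
    if has_second:
        return "second_only"
    return "neither"


def termset_weight(doc_terms: Set[str], termset: Termset,
                   case_weights: Dict[str, float]) -> float:
    """Look up the weight for the observed case; 'neither' maps to 0.0."""
    return case_weights.get(occurrence_case(doc_terms, termset), 0.0)


def document_vector(doc_tokens: List[str], vocabulary: List[str],
                    termsets: List[Termset],
                    case_weights: Dict[Termset, Dict[str, float]]) -> List[float]:
    """Concatenate raw term counts (bag of words) with termset weights."""
    doc_terms = set(doc_tokens)
    bow = [float(doc_tokens.count(term)) for term in vocabulary]
    pair_features = [termset_weight(doc_terms, pair, case_weights[pair])
                     for pair in termsets]
    return bow + pair_features


# Hypothetical example data; the weight values are assumptions chosen
# by hand, not statistics estimated the way the paper describes.
vocabulary = ["cheap", "offer", "meeting"]
termsets = [("cheap", "offer")]
case_weights = {
    ("cheap", "offer"): {"both": 1.0, "first_only": 0.4, "second_only": 0.3},
}

vector = document_vector(["cheap", "meeting", "cheap"],
                         vocabulary, termsets, case_weights)
print(vector)
```

In this sketch the document contains "cheap" but not "offer", so the last component takes the `first_only` weight rather than zero. A conventional co-occurrence feature would be zero for the same document.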