Journal: Computing and informatics

EFFECT OF TERM WEIGHTING ON KEYWORD EXTRACTION IN HIERARCHICAL CATEGORY STRUCTURE



Abstract

While several studies have examined the effect of term weighting on classification accuracy, relatively few have examined how term weighting affects the quality of keywords extracted to characterize a document or a category (i.e., a document collection). Moreover, many tasks require a more complicated category structure, such as a hierarchical or network structure, rather than a flat one. This paper presents a qualitative and quantitative study of how term weighting affects keyword extraction in a hierarchical category structure, in comparison to a flat category structure. A hierarchical structure brings special characteristics to assigning a set of keywords or tags to represent a document or a document collection, since it supports statistics drawn from the hierarchy, including the category itself, its parent category, its child categories, and its sibling categories. An enhancement of term weighting is proposed, in the form of a series of modified TFIDFs, to improve keyword extraction. A text collection of public-hearing opinions is used to evaluate variant TFs and IDFs and to identify which types of information in the hierarchical category structure are useful. In our experiments, we found that the most effective IDF family, namely TF-IDFr, is identity, sibling, child, parent, in that order. TF-IDFr outperforms the vanilla version of TFIDF with a centroid-based classifier.
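The abstract does not give the TF-IDFr formula, so the following is only a minimal sketch of the general idea it describes: scoring a category's terms with a TF weight and an IDF computed from statistics elsewhere in the hierarchy (here, the category's siblings). The function name `sibling_tfidf_keywords`, the `top_k` parameter, and the specific TF and IDF definitions are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def sibling_tfidf_keywords(category_tokens, sibling_tokens, top_k=3):
    """Rank the terms of one category by TF times a sibling-level IDF.

    Assumption (not from the paper): TF is relative frequency within the
    category, and IDF is log(N / df), where N counts the category plus its
    siblings and df counts how many of those groups contain the term.
    """
    groups = [category_tokens] + list(sibling_tokens)
    n_groups = len(groups)

    # Group frequency: number of groups (category + siblings) containing each term.
    df = Counter()
    for tokens in groups:
        df.update(set(tokens))

    tf = Counter(category_tokens)
    total = len(category_tokens)

    scores = {
        term: (count / total) * math.log(n_groups / df[term])
        for term, count in tf.items()
    }

    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

A term that appears in every sibling receives a score of zero under this weighting, so it is ranked last as a keyword for the category. Parent-level or child-level statistics could be substituted for the sibling groups in the same way.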
