Source: Wuhan University Journal of Natural Sciences

Keyword Extraction Based on tf/idf for Chinese News Document




Keyword extraction is an important research topic in information retrieval. This paper gives a specification of keywords in Chinese news documents, based on an analysis of the linguistic characteristics of news documents, and then proposes a new keyword extraction method based on tf/idf with multiple strategies. The approach selects uni-, bi-, and tri-gram candidate keywords and then defines features according to their morphological characteristics and context information. Moreover, the paper proposes several strategies to repair incomplete words produced by word segmentation and to find unknown potential keywords in news documents. Experimental results show that the proposed method significantly outperforms the baseline method. We also applied it to retrospective event detection, where experimental results show that the accuracy and efficiency of news retrospective event detection can be significantly improved.
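To make the candidate-selection step concrete, the following is a minimal sketch of plain tf/idf scoring over uni-, bi-, and tri-gram candidates. It is not the paper's method: the abstract does not give the feature definitions or the repair strategies, so none of them are modeled here. The function names `ngrams` and `tfidf_keywords`, the `top_k` parameter, and the assumption that each document is already segmented into a list of tokens are all illustrative choices.

```python
import math
from collections import Counter


def ngrams(tokens, max_n=3):
    """Yield every uni-, bi- and tri-gram of a token list as a tuple."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def tfidf_keywords(docs, doc_index, top_k=5):
    """Rank the n-gram candidates of docs[doc_index] by plain tf-idf.

    docs is a list of token lists (assumed to be pre-segmented).
    This is a hypothetical helper, not the scoring used in the paper.
    """
    n_docs = len(docs)

    # Document frequency: number of documents containing each candidate.
    df = Counter()
    for tokens in docs:
        df.update(set(ngrams(tokens)))

    # Term frequency within the target document, normalized by the
    # total number of candidates extracted from that document.
    tf = Counter(ngrams(docs[doc_index]))
    total = sum(tf.values())

    scores = {
        gram: (count / total) * math.log(n_docs / df[gram])
        for gram, count in tf.items()
    }

    # Highest score first; ties are broken by the candidate itself so the
    # ordering is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    docs = [
        ["earthquake", "relief", "team", "arrives", "earthquake", "relief"],
        ["team", "wins", "match"],
        ["market", "rises", "team", "reports"],
    ]
    for gram, score in tfidf_keywords(docs, 0, top_k=3):
        print(" ".join(gram), round(score, 4))
```

A candidate that occurs in every document, such as `team` in the toy corpus, receives an idf of zero and drops to the bottom of the ranking, which is the behaviour a tf/idf baseline is expected to show before any additional features are applied.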




