Conference: International Conference on Electrical Electronics Engineering and Computer Science

Improving Index Term Extraction for Chinese Books with Professional Score

Abstract

This paper investigates the current state of index term extraction for Chinese books. To improve the performance of traditional keyphrase extraction methods on this task, we propose a novel feature, the professional score, which evaluates the importance of each candidate term. Wikipedia is used to determine whether a candidate is a meaningful keyword in the domain of the book. We then borrow the idea of the PageRank algorithm to compute each candidate's professional score, making full use of the category structure and citing relationships in Wikipedia. To evaluate how much the proposed feature improves index term extraction for Chinese books, we compare traditional TF-IDF against a method that combines TF-IDF with the professional score. The combined method achieves precision, recall and F-measure that are 54%, 35% and 46% higher, respectively, than those obtained with traditional TF-IDF.
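
The abstract does not give the exact formulas, so the following is only a minimal Python sketch of the general idea: a PageRank-style score computed over a small link graph, then combined with TF-IDF. The toy graph `wiki_links`, the function names, the smoothed IDF variant, and the multiplicative combination in `combined_scores` are all assumptions made for illustration, not the authors' method; in particular, the real professional score also uses Wikipedia's category structure, which this sketch ignores.

```python
import math

def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {node: [nodes it links to]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank evenly among the nodes it links to.
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank

def tf_idf(term, doc, corpus):
    """Smoothed TF-IDF of `term` in token list `doc`; `corpus` is a list of token lists."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)
    return tf * (math.log((1 + len(corpus)) / (1 + df)) + 1.0)

def combined_scores(candidates, doc, corpus, professional):
    """Assumed combination: TF-IDF times the max-normalised professional score.

    Candidates missing from the link graph get a score of zero.
    """
    top = max(professional.values())
    return {
        c: tf_idf(c, doc, corpus) * professional.get(c, 0.0) / top
        for c in candidates
    }

# Toy stand-in for Wikipedia links between domain terms (hypothetical data).
wiki_links = {
    "indexing": ["information retrieval", "keyword"],
    "keyword": ["information retrieval"],
    "information retrieval": ["indexing"],
}
professional = pagerank(wiki_links)

book = ["keyword", "indexing", "chapter", "keyword", "information retrieval"]
corpus = [book, ["chapter", "novel", "chapter"], ["keyword", "novel"]]
scores = combined_scores(set(book), book, corpus, professional)

for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.3f}")
```

In this sketch a frequent but non-domain word such as "chapter" is pushed to the bottom because it has no entry in the link graph, which mirrors the abstract's use of Wikipedia to decide whether a candidate is meaningful in the book's domain.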
