International Conference on Electrical Electronics Engineering and Computer Science

Improving Index Term Extraction for Chinese Books with Professional Score

Abstract

This paper investigates the current state of index term extraction for Chinese books. To improve the performance of traditional key phrase extraction methods when extracting index terms, we propose a novel feature, the professional score, which evaluates the importance of each candidate. Wikipedia is used to identify whether a candidate is a meaningful keyword in the domain of the book. We then borrow the idea of the PageRank algorithm to calculate the professional score of each candidate, making full use of the category structure and citing relationships in Wikipedia. To evaluate how much the proposed feature improves index term extraction for Chinese books, we compare traditional TF-IDF with a method that combines TF-IDF and the professional score. The precision, recall and F-measure obtained by the combined method are 54%, 35% and 46% higher, respectively, than those obtained by traditional TF-IDF.
