Journal: International Journal of Electrical and Computer Engineering

Improving keyword extraction in multilingual texts

Abstract

The accuracy of keyword extraction is a leading factor in information retrieval systems and marketing. In the real world, text is produced in a variety of languages, and the ability to extract keywords based on information from different languages improves the accuracy of keyword extraction. In this paper, the information available in all languages is used to improve a traditional keyword extraction algorithm for multilingual text. The proposed keyword extraction procedure is an unsupervised algorithm that selects a word as a keyword of a given text if, in addition to ranking highly in the text's own language, it also ranks highly under the keyword criteria in the other languages. To achieve this aim, the average TF-IDF of each candidate word was calculated across the same language and the other languages. The words with the higher average TF-IDF were then chosen as the extracted keywords. The results indicate that the accuracies on multilingual texts of the term frequency-inverse document frequency (TF-IDF) algorithm, the graph-based algorithm, and the proposed improved algorithm are 80%, 60.65%, and 91.3%, respectively.
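The averaging step described in the abstract might look roughly like the following Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how words are aligned across languages, so the `translations` dictionary, the smoothed IDF formula, the zero score for unaligned words, and the function names `tf_idf` and `multilingual_keywords` are all hypothetical choices made here.

```python
import math
from collections import Counter


def tf_idf(doc_tokens, corpus):
    """Return {word: TF-IDF score} for one tokenized document.

    corpus is a list of tokenized documents in the same language.
    The smoothed IDF below is an assumption; the abstract does not
    specify which TF-IDF variant was used.
    """
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for word, count in counts.items():
        # Document frequency: number of corpus documents containing the word.
        df = sum(1 for doc in corpus if word in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[word] = (count / total) * idf
    return scores


def multilingual_keywords(docs, corpora, translations, source_lang, top_k=3):
    """Rank candidate words by their TF-IDF averaged over all languages.

    docs:         {lang: tokens of the same text in that language}
    corpora:      {lang: list of tokenized documents in that language}
    translations: {source_word: {lang: aligned_word}}, a hypothetical
                  alignment resource not described in the abstract
    """
    per_lang = {lang: tf_idf(docs[lang], corpora[lang]) for lang in docs}
    averaged = {}
    for word, own_score in per_lang[source_lang].items():
        total = own_score
        for lang, lang_scores in per_lang.items():
            if lang == source_lang:
                continue
            aligned = translations.get(word, {}).get(lang)
            # Assumption: a word with no aligned counterpart scores 0 there.
            total += lang_scores.get(aligned, 0.0) if aligned else 0.0
        averaged[word] = total / len(per_lang)
    # Highest average first; ties broken alphabetically for determinism.
    ranked = sorted(averaged, key=lambda w: (-averaged[w], w))
    return ranked[:top_k]
```

In this sketch a word that scores well in only one language is pulled down by its low or missing scores elsewhere, which is one way to read the abstract's requirement that a keyword rank highly in the other languages as well.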
