Conference: IEEE Canadian Conference on Electrical and Computer Engineering

Keyword and Keyphrase Extraction using Newton's Law of Universal Gravitation

Abstract

In recent years, the amount of data collected from computational systems has surged. This data can be useful in many applications and fields, particularly in Big Data Analytics. However, in a large collection of data it is difficult to discover the important information. Automatic Document Summarization (ADS) systems are suited to the task of outlining useful data. An ADS system takes a text document as input and outputs a semantically relevant summary of it. This summary can be further separated and outlined as keywords or keyphrases. This paper proposes a novel unsupervised approach to automatic keyword and keyphrase generation using Newton's Law of Universal Gravitation. The approach aims to capture meaningful text fully by incorporating the physical structure of a document and the discovered relationships between highly related words. Our model uses a new weighting method that combines the character length of a word with its frequency within a document to simulate a mass. The model then computes the force of attraction between words and ranks the resulting word-pair forces as a means of keyword and keyphrase extraction. Experimental results on several text documents demonstrate that the proposed approach improves on state-of-the-art models.
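The abstract does not give the exact formulas, so the following Python sketch is only one possible reading of the idea, not the authors' implementation. It starts from Newton's law, F = G * m1 * m2 / r^2, and makes three assumptions that are not stated in the source: a word's mass is its character length multiplied by its frequency, the distance r between two words is the smallest gap between their token positions, and G is set to 1. The function name `rank_pairs` and its parameters are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations


def rank_pairs(text, stopwords=frozenset(), top_k=5):
    """Rank word pairs by a gravitation-style score (illustrative sketch only)."""
    tokens = re.findall(r"[a-z]+", text.lower())

    # Record every position of each non-stopword in the original token stream.
    positions = defaultdict(list)
    for index, word in enumerate(tokens):
        if word not in stopwords:
            positions[word].append(index)

    # ASSUMPTION: mass = character length * frequency in the document.
    mass = {word: len(word) * len(pos) for word, pos in positions.items()}

    scored = []
    for a, b in combinations(sorted(positions), 2):
        # ASSUMPTION: distance = smallest gap between any two occurrences.
        # Distinct words never share a position, so the distance is >= 1.
        distance = min(abs(i - j) for i in positions[a] for j in positions[b])
        # Newton's law with G = 1 (ASSUMPTION): F = m1 * m2 / r^2.
        scored.append(((a, b), mass[a] * mass[b] / distance ** 2))

    # Strongest attraction first; ties are broken alphabetically.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


if __name__ == "__main__":
    sample = "gravity pulls mass gravity pulls"
    for pair, force in rank_pairs(sample, top_k=3):
        print(pair, force)
```

Under these assumptions, long words that occur often and appear close together score highest, which matches the abstract's description of ranking word-pair forces. The top-ranked pairs could be read as candidate keyphrases and their members as candidate keywords. How the paper actually defines distance and combines length with frequency may differ.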