AUTOMATIC METHOD FOR EXTRACTING THE RELEVANT PHRASES FROM TEXTS.

Abstract

The present invention relates to an automatic method for extracting the relevant phrases from texts, without reading them, by analyzing the entropy characteristics of the information and then applying Pareto statistics. The entropy analysis separates randomness (disorder) from order (relevance), filtering out the words that belong to the grammatical structure of the file but do not contribute to the meaning of the text. Pareto statistics are subsequently applied to capture the extreme behaviour of the arrangement of the relevant words in the text, with the purpose of establishing categories of relevance that help the user classify large numbers of files; the most relevant words or phrases, which give the analyzed text its meaning, are shown in an organized table. The method can be applied to any language and does not require processing the analyzed texts. It avoids the need for experts in the subject areas and for the definition of lexicons or dictionaries, thus reducing costs and increasing the speed of text analysis.
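The abstract does not disclose the actual formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the patented method. It assumes that "entropy" can be approximated by how evenly a word is spread across consecutive chunks of the text (evenly spread words behave like grammatical filler, clustered words carry topic), that relevance can be scored as frequency times one minus that normalized entropy, and that the "categories of relevance" can be approximated by a simple cumulative-share (ABC-style) Pareto cut at 80% and 95%. The names `spread_entropy` and `relevance_table`, the chunk count, and the cut points are all hypothetical, and the sketch ranks single words rather than phrases.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; no stop-word list or dictionary is used."""
    return re.findall(r"\w+", text.lower())


def spread_entropy(tokens, n_parts=8):
    """Return (counts, entropy); entropy[w] in [0, 1] measures how evenly
    word w is spread over consecutive chunks of the text (assumed measure)."""
    size = max(1, math.ceil(len(tokens) / n_parts))
    chunks = [Counter(tokens[i:i + size]) for i in range(0, len(tokens), size)]
    counts = Counter(tokens)
    # Maximum possible entropy is log(number of chunks); used to normalize.
    max_h = math.log(len(chunks)) if len(chunks) > 1 else 1.0
    entropy = {}
    for word, n in counts.items():
        h = 0.0
        for chunk in chunks:
            k = chunk.get(word, 0)
            if k:
                p = k / n
                h -= p * math.log(p)
        entropy[word] = h / max_h
    return counts, entropy


def relevance_table(text, n_parts=8, cuts=(0.80, 0.95)):
    """Rank words by frequency * (1 - entropy) and label them A/B/C by
    cumulative share of the total score (assumed Pareto-style cut)."""
    counts, entropy = spread_entropy(tokenize(text), n_parts)
    scores = {w: counts[w] * max(0.0, 1.0 - entropy[w]) for w in counts}
    total = sum(scores.values())
    if total == 0:
        return []
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    table, cumulative = [], 0.0
    for word, score in ranked:
        cumulative += score / total
        if cumulative <= cuts[0]:
            category = "A"
        elif cumulative <= cuts[1]:
            category = "B"
        else:
            category = "C"
        table.append((word, score, category))
    return table


if __name__ == "__main__":
    sample = (
        "the entropy the entropy the entropy "
        "the cat the dog the bird "
        "the sun the moon the star "
        "the sea the lake the pond"
    )
    for word, score, category in relevance_table(sample, n_parts=4):
        print(f"{category}  {score:5.2f}  {word}")
```

In this toy input the word "the" appears equally often in every chunk, so its normalized entropy is close to 1 and its score close to 0, while "entropy" is concentrated in one chunk and ranks first. A real document would need far more text for the chunk statistics to be meaningful, and words that occur only once trivially get zero entropy under this measure.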
