
METHOD FOR EXTRACTING INFORMATION FROM UNSTRUCTURED TEXTS WRITTEN IN NATURAL LANGUAGE


Abstract

FIELD: computing.; SUBSTANCE: the invention relates to a method for extracting information from unstructured texts written in a natural language. In the method, a set of texts is tokenised into sentences, words and word sequences; rare words are deleted; the remaining words are reduced to their initial form, with typos corrected; from the words in initial form, a set of words belonging to certain parts of speech and used in describing the target information is selected; the presence of the target information is determined in word sequences that contain all words from the selected set, and those sequences are marked; the presence of the target information is then determined for all text documents that contain marked word sequences; the number of text sources, the word occurrence threshold, and the set of parts of speech are optimised to achieve a set quality of information extraction.; EFFECT: increased quality of information extraction from text data sources.; 3 cl, 4 dwg, 1 tbl
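
The steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not specify the tokeniser, the lemmatiser, the typo corrector, or the part-of-speech tagger, so the `LEMMAS` and `POS` lookup tables, the function names, and the parameters `min_count` and `allowed_pos` are all hypothetical stand-ins. The sketch also assumes that a "word sequence" is a sentence and leaves out the final optimisation step.

```python
import re
from collections import Counter

# Hypothetical lookup tables standing in for a real lemmatiser/typo
# corrector and a real part-of-speech tagger (not named in the abstract).
LEMMAS = {"prices": "price", "rose": "rise"}
POS = {"price": "NOUN", "rise": "VERB", "market": "NOUN", "calm": "ADJ"}


def split_sentences(text):
    """Split a text into sentences on ., ! and ?."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenise(sentence):
    """Split a sentence into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", sentence.lower())


def extract(documents, target_words, min_count=2, allowed_pos=("NOUN", "VERB")):
    """Return one boolean per document: True if target information is found."""
    # 1. Tokenise each document into sentences, and each sentence into words.
    docs = [[tokenise(s) for s in split_sentences(d)] for d in documents]

    # 2. Delete rare words: those occurring fewer than min_count times overall.
    counts = Counter(w for d in docs for s in d for w in s)
    docs = [[[w for w in s if counts[w] >= min_count] for s in d] for d in docs]

    # 3. Bring the remaining words to their initial form.
    docs = [[[LEMMAS.get(w, w) for w in s] for s in d] for d in docs]

    # 4. Keep only target words that belong to the allowed parts of speech.
    selected = {w for w in target_words if POS.get(w) in allowed_pos}

    # 5. Mark a sentence when it contains every selected word;
    #    a document is flagged when it contains at least one marked sentence.
    return [
        any(bool(selected) and selected.issubset(s) for s in d)
        for d in docs
    ]


sample_docs = [
    "Prices rose sharply. The market was calm.",
    "The market was calm. Prices rose again.",
    "The market was calm.",
]
print(extract(sample_docs, {"price", "rise"}))  # [True, True, False]
```

In this sketch, `min_count` plays the role of the word occurrence threshold and `allowed_pos` the set of parts of speech; together with the number of input documents, these are the quantities the abstract says are tuned until a set extraction quality is reached.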
