
OPTIMISATION OF FACT EXTRACTION USING MULTI-STAGE APPROACH


Abstract

FIELD: information technology.

SUBSTANCE: facts are extracted from electronic documents by recognising factual descriptions, using a fact-word table whose entries are matched against the words of the electronic documents. The words of those factual descriptions may be tagged with the appropriate part of speech. More detailed analysis is then performed on those factual descriptions rather than on the entire electronic document, and particularly on the text in the neighbourhood of the fact-word matches. The analysis may involve identifying the linguistic constituents of each phrase and determining its role as either subject or object. Exclusion rules, based in part on the linguistic constituents, may be applied to eliminate phrases that are unlikely to be part of facts. Scoring rules may be applied to the remaining phrases, and for each phrase whose score exceeds a threshold, the corresponding sentence part, whole sentence, paragraph, or other document portion may be presented as representing one or more facts.

EFFECT: more accurate search results.

20 claims, 6 drawings
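The staged flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the fact-word table, the toy part-of-speech lexicon, the four-word neighbourhood window, the single exclusion rule, the scoring weights, and the threshold are all hypothetical values chosen for illustration, and the subject/object assignment (text before the match is the subject, text after it is the object) is a simplifying assumption.

```python
import re

# Hypothetical fact-word table (assumption: real tables would be far larger).
FACT_WORDS = {"invented", "founded", "discovered", "acquired", "is"}

# Toy part-of-speech lexicon (assumption: a real system would use a tagger).
POS_LEXICON = {
    "he": "PRON", "she": "PRON", "it": "PRON", "they": "PRON",
    "the": "DET", "a": "DET", "an": "DET", "in": "ADP", "of": "ADP",
}

# Hypothetical scoring weights per tag; tags listed here count as content words.
CONTENT_WEIGHTS = {"PROPN": 1.0, "NUM": 1.0, "NOUN": 0.5}


def tag_word(word):
    """Assign a coarse part-of-speech tag using the toy lexicon and heuristics."""
    lower = word.lower()
    if lower in POS_LEXICON:
        return POS_LEXICON[lower]
    if lower in FACT_WORDS:
        return "VERB"
    if word.isdigit():
        return "NUM"
    return "PROPN" if word[0].isupper() else "NOUN"


def find_factual_descriptions(text):
    """Stage 1: keep only sentences that contain a fact-word match."""
    found = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[A-Za-z0-9]+", sentence)
        hits = [i for i, w in enumerate(words) if w.lower() in FACT_WORDS]
        if hits:
            found.append((sentence, words, hits[0]))
    return found


def neighbouring_phrases(words, hit, window=4):
    """Stage 2: tag only the words near the match and assign each side a role."""
    before = words[max(0, hit - window):hit]
    after = words[hit + 1:hit + 1 + window]
    return [
        ("subject", [(w, tag_word(w)) for w in before]),
        ("object", [(w, tag_word(w)) for w in after]),
    ]


def is_excluded(tagged):
    """Exclusion rule: drop phrases with no content word (e.g. a bare pronoun)."""
    return not any(t in CONTENT_WEIGHTS for _, t in tagged)


def score(tagged):
    """Scoring rule: sum the hypothetical weights of the phrase's tags."""
    return sum(CONTENT_WEIGHTS.get(t, 0.0) for _, t in tagged)


def extract_facts(text, threshold=1.0):
    """Return (sentence, roles) for sentences with a phrase scoring above threshold."""
    facts = []
    for sentence, words, hit in find_factual_descriptions(text):
        roles = [
            role
            for role, tagged in neighbouring_phrases(words, hit)
            if not is_excluded(tagged) and score(tagged) > threshold
        ]
        if roles:
            facts.append((sentence, roles))
    return facts


if __name__ == "__main__":
    doc = "Alexander Bell invented the telephone in 1876. He is happy. Nobody came."
    for sentence, roles in extract_facts(doc):
        print(sentence, roles)
```

The point of the staging is cost: the cheap table lookup in `find_factual_descriptions` discards most sentences, so tagging, exclusion, and scoring run only on the small neighbourhood around each match rather than on the whole document.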
