METHOD AND SYSTEM FOR BOOTSTRAPPING STATISTICAL PROCESSING INTO A RULE-BASED NATURAL LANGUAGE PARSER

Abstract

A method and system for bootstrapping statistical processing into a rule-based natural language parser is provided. In a preferred embodiment, a statistical bootstrapping software facility optimizes the operation of a robust natural language parser that uses a set of lexicon entries to determine the possible parts of speech of words in an input string, and a set of rules to combine those words into syntactic structures. The facility first operates the parser in a statistics compilation mode, in which, for each of many sample input strings, the parser attempts to apply all applicable rules and lexicon entries. While the parser operates in this mode, the facility compiles statistics indicating the likelihood of success of each rule and lexicon entry, based on the success of each when applied in the statistics compilation mode. After a sufficient body of likelihood-of-success statistics has been compiled, the facility operates the parser in an efficient parsing mode, in which it uses the compiled statistics to optimize the parser's operation. To parse an input string in the efficient parsing mode, the facility causes the parser to apply the applicable rules and lexicon entries in descending order of their likelihood of success, as indicated by the statistics compiled in the statistics compilation mode.
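The two modes described in the abstract can be illustrated with a minimal Python sketch. It assumes that each rule or lexicon entry has a hashable identifier (a rule name, or a (word, part-of-speech) pair) and that the parser reports, for each application during statistics compilation, whether it succeeded; the abstract does not say how success is judged, so that is left to the caller. The class `SuccessStats`, its methods, the add-one smoothing in `likelihood`, and the example rule names are all hypothetical and are not taken from the patent.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List


class SuccessStats:
    """Hypothetical tally of how often each rule or lexicon entry succeeds."""

    def __init__(self) -> None:
        # Maps a rule/lexicon-entry id to [successes, attempts].
        self._counts: Dict[Hashable, List[int]] = defaultdict(lambda: [0, 0])

    def record(self, item: Hashable, succeeded: bool) -> None:
        """Statistics compilation mode: log one application of `item`."""
        counts = self._counts[item]
        counts[1] += 1
        if succeeded:
            counts[0] += 1

    def likelihood(self, item: Hashable) -> float:
        """Estimated likelihood of success with add-one (Laplace) smoothing.

        The smoothing is an assumption, not from the patent: it gives an
        item that was never applied a neutral 0.5 instead of 0/0.
        """
        successes, attempts = self._counts.get(item, (0, 0))
        return (successes + 1) / (attempts + 2)

    def order(self, applicable: Iterable[Hashable]) -> List[Hashable]:
        """Efficient parsing mode: most-likely-to-succeed items first."""
        return sorted(applicable, key=self.likelihood, reverse=True)


# Usage: compile statistics from (hypothetical) sample parses, then order.
stats = SuccessStats()
for rule, ok in [("NP->Det N", True), ("NP->Det N", True),
                 ("VP->V NP", False), ("VP->V NP", True),
                 ("S->NP VP", False)]:
    stats.record(rule, ok)
print(stats.order(["S->NP VP", "VP->V NP", "NP->Det N"]))
# → ['NP->Det N', 'VP->V NP', 'S->NP VP']
```

Because Python's `sorted` is stable even with `reverse=True`, items with equal estimates keep the order in which the parser supplied them.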

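Bibliographic data

  • Publication number: WO9600436A1
  • Publication date: 1996-01-04
  • Original format: PDF
  • Applicant/assignee: MICROSOFT CORPORATION
  • Application number: WO1995US08245
  • Filing date: 1995-06-26
  • Classification: G10L5/00; G10L5/02; G10L5/04; G10L7/02; G10L7/08; G10L9/18
  • Country: WO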