Natural language parsing as statistical pattern recognition.

Abstract

Traditional natural language parsers are based on rewrite rule systems developed in an arduous, time-consuming manner by grammarians. A majority of the grammarian's efforts are devoted to the disambiguation process: first hypothesizing rules which dictate constituent categories and relationships among words in ambiguous sentences, and then seeking exceptions and corrections to these rules.

In this work, I propose an automatic method for acquiring a statistical parser from a set of parsed sentences which takes advantage of some initial linguistic input but avoids the pitfalls of the iterative and seemingly endless grammar development process. Based on distributionally-derived and linguistically-based features of language, this parser acquires a set of statistical decision trees which assign a probability distribution on the space of parse trees given the input sentence. These decision trees take advantage of a significant amount of contextual information, potentially including all of the lexical information in the sentence, to produce highly accurate statistical models of the disambiguation process. By basing the selection of disambiguation criteria on entropy reduction rather than human intuition, this parser development method is able to consider more sentences than a human grammarian can when making individual disambiguation rules.

In experiments comparing a parser acquired using this statistical framework with a grammarian's rule-based parser developed over a ten-year period, both using the same training material and test sentences, the decision tree parser significantly outperformed the grammar-based parser on the accuracy measure which the grammarian was trying to maximize, achieving an accuracy of 78% compared to the grammar-based parser's 69%.
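The abstract does not spell out how entropy reduction selects a disambiguation criterion, so the following is only a minimal sketch of the general idea, not the thesis's actual procedure. It assumes binary yes/no questions over a feature dictionary and uses a made-up toy data set of prepositional-phrase attachment decisions. The names `entropy_reduction`, `best_question`, `leaf_distribution`, `examples`, and `questions` are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_reduction(examples, question):
    """Entropy of the labels minus their expected entropy after the yes/no split."""
    labels = [label for _, label in examples]
    yes = [label for feats, label in examples if question(feats)]
    no = [label for feats, label in examples if not question(feats)]
    n = len(examples)
    conditional = (len(yes) / n) * entropy(yes) + (len(no) / n) * entropy(no)
    return entropy(labels) - conditional


def best_question(examples, questions):
    """Name of the candidate question with the largest entropy reduction."""
    return max(questions, key=lambda name: entropy_reduction(examples, questions[name]))


def leaf_distribution(labels):
    """Relative-frequency probability distribution stored at a tree leaf."""
    total = len(labels)
    return {label: count / total for label, count in Counter(labels).items()}


# Toy, invented data: (features, attachment decision). Assumption for illustration only.
examples = [
    ({"prep": "of", "verb": "ate"}, "noun"),
    ({"prep": "of", "verb": "saw"}, "noun"),
    ({"prep": "with", "verb": "ate"}, "verb"),
    ({"prep": "with", "verb": "saw"}, "verb"),
]

# Hypothetical candidate questions about the context.
questions = {
    "prep_is_of": lambda feats: feats["prep"] == "of",
    "verb_is_saw": lambda feats: feats["verb"] == "saw",
}

chosen = best_question(examples, questions)
print(chosen, entropy_reduction(examples, questions[chosen]))
```

In this toy set the preposition question separates the two attachment decisions perfectly while the verb question separates nothing, so the preposition question is selected. A full decision tree would apply the same selection recursively to each side of the split and store a distribution such as `leaf_distribution` at each leaf.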
