
Well-foundedness and reliability in statistical natural language parsing.



Abstract

Statistical techniques have revolutionized all areas of Natural Language Processing, and syntactic parsing is no exception. The availability of large syntactically annotated corpora (principally through the Penn Treebank project) has precipitated parsing's shift from the task of constructing interpretations to the task of constructing a labeled bracketing.

These corpus-based techniques are robust and scalable, two desiderata lacking in early, knowledge-based approaches to parsing. The early approaches are typified by parsers that could operate only in a narrow domain, but that produced semantically interpretable parses. In contrast, the corpus-based approaches produce underspecified labeled bracketings that are not sufficiently detailed for applications in Natural Language Understanding.

In this dissertation we describe a parser that uses hand-written, linguistically informed knowledge sources (grammar, lexicon, ontology) to enrich the labeled bracketing in the Penn Treebank. The enriched corpus is then used as the data source for statistical parsing in our well-founded framework. Furthermore, parsing in this framework supports a fully-lexicalized parsing model, and allows for the natural integration of word sense disambiguation with syntactic disambiguation. We show that jointly modeling word sense ambiguity and syntactic ambiguity results in improved syntactic disambiguation. We also describe our treatment of coordinated structures (a topic generally ignored in statistical parsing), and our novel method for using an ontology to settle on backed-off estimators via hypothesis testing.

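Bibliographic record

  • Author

    Seagull, Amon B.

  • Author affiliation

    The University of Rochester.

  • Degree-granting institution: The University of Rochester.
  • Subject: Computer Science.
  • Degree: Ph.D.
  • Year: 2000
  • Pages: 139 p.
  • Total pages: 139
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: Automation technology, computer technology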