Towards efficient statistical parsing using lexicalized grammatical information.

Abstract

Many natural language understanding systems require efficient and accurate parsing disambiguation to be effective. State-of-the-art parsers owe their high performance in large part to statistical modeling of lexical features. Although lexicalized tree adjoining grammar (TAG) is a lexicalized grammatical formalism for natural language, its use in statistical parsing has remained relatively unexplored. In this work, I aim to develop statistical models for TAG parsing that are both efficient and accurate. First, I explore the issue of linear-time TAG parsing disambiguation (supertagging). Previously, only local structural information was found to be effective for supertag disambiguation. I show that long-distance information as well as lexical information can also be useful for accurate supertagging. Furthermore, I develop frameworks that use these features to significantly increase the accuracy of supertagging. Second, in order to provide a robust resource for statistical processing models of TAG, I develop and evaluate procedures to extract TAGs from widely available treebanks. I then develop other procedures to organize these extracted TAGs as well as to link them to other TAGs. Third, I explore smoothing approaches for TAG, which are essential because of the inherent data sparseness problem for statistical processing models of TAG. One main approach uses the idea of distributional similarity in smoothing, while another approach uses the large-scale organization of TAG for smoothing. Both show promise for smoothing statistical processing models of TAG.

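Bibliographic record

  • Author

    Chen, John

  • Author affiliation

    University of Delaware

  • Degree-granting institution: University of Delaware
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2002
  • Pages: 275 p.
  • Total pages: 275
  • Original format: PDF
  • Language of text: English (eng)
  • Chinese Library Classification: Automation technology, computer technology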
