Pronominal anaphora resolution in Chinese.

Abstract

Resolving pronominal anaphors in English has been a focus of research in natural language processing for decades. Approaches ranging from linguistically oriented, rule-based methods to data-oriented, machine-learning methods have been applied to the problem of finding the antecedents of pronouns.

In contrast to this abundance of research on English, there is almost no work on the problem in Chinese. This thesis addresses that gap.

Both a rule-based and a machine-learning approach to anaphora resolution are presented. An important difference between the two languages is that Chinese, unlike English, is a pro-drop language and has null (zero) pronouns. The rule-based approach is applied to resolving these zero pronouns as well as overt third-person pronouns.

The rule-based method uses the Hobbs algorithm, in three versions. The first uses only syntactic structure to select an antecedent. The second uses limited number and gender agreement, while the third incorporates semantic constraints on the proposed antecedents.

The machine-learning method uses supervised maximum entropy models. Different models were trained on feature sets that parallel the information sources used by the different versions of the Hobbs algorithm.

Two data sets were used. The Penn Chinese Treebank (CTB) provided the test data for resolving both overt third-person pronouns and zero pronouns. The CTB parses were annotated for coreference following guidelines drawn up for this work. Data annotated for the 2004 Chinese ACE (Automatic Content Extraction) program were used to train and test the maximum entropy models for finding the antecedents of overt third-person pronouns.

The thesis presents and discusses the results of experiments with both methods at each level of linguistic information.
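
The three rule-based versions can be pictured as one candidate-proposal order plus progressively stricter filters. The sketch below is a minimal illustration under stated assumptions, not the thesis's implementation and not Hobbs's full procedure: it keeps the core idea (walk up from the pronoun, search the branches left of the path breadth-first, then earlier sentences most recent first) but omits Hobbs's NP/S-only stopping points and intervening-node conditions. The `Node` class, its `number`/`gender`/`sem` fields, the function names and the toy sentences are all hypothetical. A zero pronoun could be passed the same way, as an NP node with no overt word.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)  # eq=False: nodes compare by identity, not by field values
class Node:
    label: str                                  # CTB-style label: "IP", "VP", "NP", ...
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                  # surface form (None for a zero pronoun)
    number: Optional[str] = None                # hypothetical feature: "sg" / "pl"
    gender: Optional[str] = None                # hypothetical feature: "m" / "f" / "n"
    sem: Optional[str] = None                   # hypothetical semantic class


def path_to(root: Node, target: Node) -> List[Node]:
    """Nodes from root down to target, or [] if target is not in this tree."""
    if root is target:
        return [root]
    for child in root.children:
        sub = path_to(child, target)
        if sub:
            return [root] + sub
    return []


def bfs_nps(roots: List[Node]) -> Iterator[Node]:
    """Yield the NP nodes under `roots`, breadth-first and left to right."""
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        if node.label == "NP":
            yield node
        queue.extend(node.children)


def candidates(sentences: List[Node], pronoun: Node) -> Iterator[Node]:
    """Simplified Hobbs-style order; `sentences` in text order, pronoun's last."""
    path = path_to(sentences[-1], pronoun)
    # Climb from the pronoun toward the root; at each ancestor, search only the
    # branches to the LEFT of the path, breadth-first, left to right.
    for depth in range(len(path) - 2, -1, -1):
        ancestor, on_path = path[depth], path[depth + 1]
        idx = next(i for i, c in enumerate(ancestor.children) if c is on_path)
        yield from bfs_nps(ancestor.children[:idx])
    # Then search earlier sentences, most recent first, each breadth-first.
    for tree in reversed(sentences[:-1]):
        yield from bfs_nps([tree])


def compatible(cand: Node, pron: Node, version: int) -> bool:
    """Version 1: syntax only; 2: + number/gender agreement; 3: + semantic class."""
    if version >= 2:
        for attr in ("number", "gender"):
            a, b = getattr(cand, attr), getattr(pron, attr)
            if a and b and a != b:              # unknown values never block
                return False
    if version >= 3 and cand.sem and pron.sem and cand.sem != pron.sem:
        return False
    return True


def resolve(sentences: List[Node], pronoun: Node, version: int = 1) -> Optional[Node]:
    """First proposed NP that passes the chosen version's filters."""
    for cand in candidates(sentences, pronoun):
        if compatible(cand, pronoun, version):
            return cand
    return None


# Toy example: S1 "公司雇用张三" (the company hired Zhang San),
#              S2 "报告提到他" (the report mentioned him).
he = Node("NP", word="他", number="sg", gender="m", sem="person")
s1 = Node("IP", [Node("NP", word="公司", number="sg", sem="org"),
                 Node("VP", [Node("VV", word="雇用"),
                             Node("NP", word="张三", number="sg", gender="m", sem="person")])])
s2 = Node("IP", [Node("NP", word="报告", number="sg", gender="n", sem="thing"),
                 Node("VP", [Node("VV", word="提到"), he])])
for v in (1, 2, 3):
    print(v, resolve([s1, s2], he, version=v).word)   # 1 报告, 2 公司, 3 张三
```

In the toy run, the syntax-only version returns the nearest NP, 报告 ("report"); gender agreement rejects it and falls back to 公司 ("company"), whose gender is unknown; the semantic filter finally selects 张三. Treating unknown feature values as compatible is one design choice among several; a stricter variant would reject them.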
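
On the machine-learning side, a binary maximum entropy model over (pronoun, candidate) pairs is mathematically the same as logistic regression. The sketch below assumes that pairwise framing, hypothetical feature names grouped to mirror the three Hobbs versions, plain stochastic gradient ascent, and an L2 penalty (the usual Gaussian prior on maxent weights). The abstract does not specify the actual features, toolkit, or decision rule, so all of these, and the toy training pairs, are illustrative.

```python
import math
from typing import Dict, List, Tuple

Features = Dict[str, float]

# Hypothetical feature groups, each mirroring one version of the Hobbs algorithm.
FEATURE_SETS: Dict[int, List[str]] = {
    1: ["hobbs_first"],                                         # syntax only
    2: ["hobbs_first", "num_agree", "gen_agree"],               # + agreement
    3: ["hobbs_first", "num_agree", "gen_agree", "sem_agree"],  # + semantics
}


def prob(w: Dict[str, float], x: Features, names: List[str]) -> float:
    """P(candidate is the antecedent | features) under a binary maxent model."""
    z = w["bias"] + sum(w[n] * x.get(n, 0.0) for n in names)
    return 1.0 / (1.0 + math.exp(-z))


def train(data: List[Tuple[Features, int]], names: List[str],
          epochs: int = 200, lr: float = 0.5, l2: float = 0.01) -> Dict[str, float]:
    """Stochastic gradient ascent on the L2-penalised conditional log-likelihood."""
    w = {n: 0.0 for n in names}
    w["bias"] = 0.0
    for _ in range(epochs):
        for x, y in data:
            g = y - prob(w, x, names)       # d log-likelihood / d z for this pair
            w["bias"] += lr * g             # bias is left unpenalised
            for n in names:
                w[n] += lr * (g * x.get(n, 0.0) - l2 * w[n])
    return w


def pick(w: Dict[str, float], cands: List[Features], names: List[str]) -> int:
    """Index of the highest-probability candidate antecedent."""
    return max(range(len(cands)), key=lambda i: prob(w, cands[i], names))


# Toy, made-up (pronoun, candidate) pairs, labelled 1 for the true antecedent.
toy = ([({"hobbs_first": 1, "sem_agree": 1}, 1)] * 3
       + [({"hobbs_first": 1}, 0)]
       + [({"sem_agree": 1}, 1)]
       + [({}, 0)] * 3)
test_cands = [{"hobbs_first": 1}, {"sem_agree": 1}]
for v in (1, 3):
    w = train(toy, FEATURE_SETS[v])
    print(v, pick(w, test_cands, FEATURE_SETS[v]))    # 1 0, then 3 1
```

With only the syntactic feature, the model learns that the first Hobbs candidate is usually right and picks candidate 0; adding `sem_agree` lets it override that preference and pick candidate 1. Taking the single highest-scoring candidate, rather than thresholding each pair independently, always yields exactly one antecedent for a non-empty candidate list.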

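Bibliographic details

  • Author: Converse, Susan P.
  • Affiliation: University of Pennsylvania
  • Degree-granting institution: University of Pennsylvania
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2006
  • Pages: 140
  • Original format: PDF
  • Language: English
  • CLC classification: Automation technology, computer technology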