Venue: Workshop on Machine Translation and Parsing in Indian Languages

Automatic Annotation of Genitives in Hindi Treebank

Abstract

A noun with a genitive marker in an Indo-Aryan language can be a child of a noun, a verb, or a complex predicate, which makes its attachment an important parsing issue. In this paper, we examine genitive data in Hindi and aim to automatically determine the attachment and relational label of each genitive in a dependency framework. We implement two approaches: a rule-based approach and a statistical approach. The rule-based approach produces promising results but fails to handle certain constructions because of its greedy selection. The statistical approach overcomes this by using a single-candidate approach that considers all possible candidates for the head and chooses the most probable one. Both approaches are applied to controlled-environment and open-environment data. A controlled environment is one in which relational labels are already present in the input data for everything except the genitives, while an open environment is one in which the input is only POS-tagged and chunked. The rule-based and statistical systems achieve high attachment accuracies of 95% and 97%, respectively, and perform considerably well on labeling in the controlled environment but poorly in the open environment.
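The contrast between greedy selection and choosing the most probable head among all candidates can be illustrated with a minimal sketch. The abstract does not give the actual rules, features, or model, so everything here is an assumption made for illustration: the `Chunk` record, the tag names `NP` and `VG`, the "first noun chunk to the right" rule, and the hand-set scores in `toy_score` are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Chunk:
    index: int     # position of the chunk in the sentence
    category: str  # coarse chunk type; "NP" and "VG" are assumed tag names
    text: str      # placeholder surface form


def greedy_head(genitive: Chunk, chunks: List[Chunk]) -> Optional[Chunk]:
    """Hypothetical greedy rule: attach to the first noun chunk to the right."""
    for chunk in chunks:
        if chunk.index > genitive.index and chunk.category == "NP":
            return chunk
    return None


def best_head(
    genitive: Chunk,
    chunks: List[Chunk],
    score: Callable[[Chunk, Chunk], float],
) -> Optional[Chunk]:
    """Score every other chunk as a candidate head and keep the highest."""
    candidates = [c for c in chunks if c.index != genitive.index]
    return max(candidates, key=lambda c: score(genitive, c), default=None)


def toy_score(genitive: Chunk, candidate: Chunk) -> float:
    """Stand-in for a trained model; the numbers are invented for the demo."""
    return {"NP": 0.3, "VG": 0.7}.get(candidate.category, 0.0)


# Abstract three-chunk sentence: a genitive noun chunk, a noun chunk, a verb group.
gen = Chunk(0, "NP", "genitive-noun")
sentence = [gen, Chunk(1, "NP", "noun"), Chunk(2, "VG", "verb-group")]

greedy_choice = greedy_head(gen, sentence)
scored_choice = best_head(gen, sentence, toy_score)
print(greedy_choice.text, scored_choice.text)
```

With these invented scores, the greedy rule commits to the adjacent noun chunk, while the candidate-scoring selection picks the verb group. This only shows the shape of the failure mode the abstract attributes to greedy selection; it says nothing about how the paper's systems actually score candidates.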
