Conference: 9th International Conference on Language Resources and Evaluation

HamleDT 2.0: Thirty Dependency Treebanks Stanfordized


Abstract

We present HamleDT 2.0 (HArmonized Multi-LanguagE Dependency Treebank), a collection of 30 existing treebanks harmonized into a common annotation style, the Prague Dependencies, and further transformed into Stanford Dependencies, a treebank annotation style that has become popular in recent years. We use the newest basic Universal Stanford Dependencies, without added language-specific subtypes. We describe both annotation styles, including the adjustments that were necessary, and provide details about the conversion process. We also discuss the differences between the two styles, evaluate their advantages and disadvantages, and note how the differences affect the conversion. We regard the stanfordization as generally successful, although we admit several shortcomings, especially in the distinction between direct and indirect objects, that have to be addressed in the future. We release part of HamleDT 2.0 freely; we are not allowed to redistribute the whole dataset, but we do provide the conversion pipeline.
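The direct/indirect object shortcoming mentioned in the abstract can be illustrated with a toy relabeling step. The sketch below is not the HamleDT conversion pipeline: the `Token` class, the `SIMPLE_MAP` table, the `relabel` function, and the specific label pairs are all hypothetical, chosen only for illustration. It assumes the source style marks every object with a single label (`Obj`), so a purely label-based conversion has no information with which to choose between a direct and an indirect object in the target style.

```python
# Illustrative only: a toy relabeling step, NOT the HamleDT conversion pipeline.
# All names and the label mapping below are assumptions made for this sketch.
from dataclasses import dataclass


@dataclass
class Token:
    form: str
    head: int    # 1-based index of the head token; 0 marks the root
    deprel: str  # dependency label


# Hypothetical one-to-one part of a Prague-style -> Stanford-style label map.
SIMPLE_MAP = {"Pred": "root", "Sb": "nsubj", "Adv": "advmod"}


def relabel(tokens):
    """Return new tokens with target-style labels; tree structure is unchanged."""
    converted = []
    for tok in tokens:
        if tok.deprel == "Obj":
            # Assumed single object label in the source style: the label alone
            # cannot tell a direct object from an indirect one, so this sketch
            # falls back to "dobj" for every object.
            new_label = "dobj"
        else:
            # Unknown labels fall back to the generic "dep" relation.
            new_label = SIMPLE_MAP.get(tok.deprel, "dep")
        converted.append(Token(tok.form, tok.head, new_label))
    return converted


# "She gave him books": both "him" and "books" carry the same source label.
sentence = [
    Token("She", 2, "Sb"),
    Token("gave", 0, "Pred"),
    Token("him", 2, "Obj"),
    Token("books", 2, "Obj"),
]
labels = [tok.deprel for tok in relabel(sentence)]
print(labels)
```

In this toy sentence both objects receive the same target label, even though "him" would be an indirect object in the target style. Resolving such cases needs information beyond the source label, such as morphological case or word order, which is outside the scope of this sketch.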
