Journal: ACM Transactions on Asian Language Information Processing

Integrating Multiple Dependency Corpora for Inducing Wide-Coverage Japanese CCG Resources



Abstract

This article proposes a novel method for inducing wide-coverage Combinatory Categorial Grammar (CCG) resources for Japanese. For some languages, including English, the availability of large annotated corpora and the development of data-driven induction of lexicalized grammars have enabled deep parsing, i.e., parsing based on lexicalized grammars. Deep parsing for Japanese, however, has not been widely studied. This is mainly because most Japanese syntactic resources are represented as chunk-based dependency structures, whereas previous methods for inducing grammars depend on tree corpora. To translate the syntactic information encoded in chunk-based dependencies into phrase structures as accurately as possible, we propose integrating annotations from multiple dependency-based corpora. Our method first integrates dependency structures with predicate-argument information and converts them into phrase structure trees. The trees are then transformed into CCG derivations in a manner similar to previously proposed methods. We evaluate the quality of the conversion empirically, in terms of the coverage of the obtained CCG lexicon and the accuracy of parsing with the induced grammar. Although the transformation process used in this study is specialized for Japanese, the framework of our method should be applicable to other languages for which dependency-based analysis has been regarded as more appropriate than phrase structure-based analysis because of their morphosyntactic features.
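To make the two conversion steps more concrete, here is a toy Python sketch on a single clause. It is not the article's algorithm. The `Chunk` class, the `role` labels (a stand-in for predicate-argument annotation), and the functions `to_tree` and `predicate_lexicon` are hypothetical. Real Japanese CCG grammars also use a much richer category inventory. The sketch relies only on the fact that Japanese chunks are head-final, so every dependent precedes its head. Attaching the nearest dependent first gives a right-branching binary tree, and a verb category plus argument and adjunct categories can then be read off that tree.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical toy representations; not the article's data format.
Tree = Union[str, Tuple["Tree", "Tree"]]
Cat = Union[str, Tuple["Cat", str, "Cat"]]  # atomic category or (result, slash, argument)


@dataclass
class Chunk:
    text: str                   # surface string of the chunk (bunsetsu)
    head: Optional[int]         # index of the chunk it depends on; None for the root
    role: Optional[str] = None  # hypothetical label: case marker ("ga", "o", ...) or "adj"


def dependents(chunks: List[Chunk], h: int) -> List[int]:
    # Left-to-right indices of the chunks that depend on chunk h.
    return [i for i, c in enumerate(chunks) if c.head == h]


def to_tree(chunks: List[Chunk], h: int) -> Tree:
    """Binarize the subtree headed by chunk h.

    Japanese is head-final, so every dependent precedes its head. Attaching the
    nearest dependent first yields a right-branching binary tree.
    """
    tree: Tree = chunks[h].text
    for d in reversed(dependents(chunks, h)):
        tree = (to_tree(chunks, d), tree)
    return tree


def show(cat: Cat) -> str:
    # Render a category with explicit parentheses, e.g. (S\NP[ga]).
    if isinstance(cat, str):
        return cat
    res, slash, arg = cat
    return f"({show(res)}{slash}{show(arg)})"


def predicate_lexicon(chunks: List[Chunk], h: int) -> Dict[str, str]:
    """Toy category assignment for one clause whose dependents are leaf chunks.

    Each argument becomes NP[case]. The predicate takes its arguments from the
    left, with the nearest argument outermost so that it is consumed first.
    An adjunct gets X/X, where X is the category it modifies where it attaches.
    """
    deps = dependents(chunks, h)
    cat: Cat = "S"
    for d in deps:  # left to right, so the farthest argument ends up innermost
        if chunks[d].role != "adj":
            cat = (cat, "\\", f"NP[{chunks[d].role}]")
    lexicon = {chunks[h].text: show(cat)}
    current = cat
    for d in reversed(deps):  # replay the derivation from the nearest dependent outward
        if chunks[d].role == "adj":
            # Forward application of X/X to X leaves the category unchanged.
            lexicon[chunks[d].text] = show((current, "/", current))
        else:
            # Backward application consumes the outermost argument.
            lexicon[chunks[d].text] = f"NP[{chunks[d].role}]"
            current = current[0]
    if current != "S":
        raise ValueError("derivation did not reach S")
    return lexicon


if __name__ == "__main__":
    # 太郎が 昨日 本を 読んだ: "Taro read a book yesterday"
    chunks = [
        Chunk("太郎が", 3, "ga"),
        Chunk("昨日", 3, "adj"),
        Chunk("本を", 3, "o"),
        Chunk("読んだ", None),
    ]
    print(to_tree(chunks, 3))
    print(predicate_lexicon(chunks, 3))
```

In this example the verb 読んだ receives ((S\NP[ga])\NP[o]) and the adverb 昨日 receives ((S\NP[ga])/(S\NP[ga])). Backward application to 本を, forward application of 昨日, and backward application to 太郎が then yield S. Binding the nearest dependent first keeps the binarized tree and the order of argument consumption in agreement.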
