IEEE Transactions on Audio, Speech, and Language Processing

Bitext Dependency Parsing With Auto-Generated Bilingual Treebank

Abstract

This paper proposes a method for improving the accuracy of dependency parsing on bilingual texts (bitexts) by training on an auto-generated bilingual treebank built with the help of statistical machine translation (SMT) systems. Previous bitext parsing methods rely on human-annotated bilingual treebanks, which are costly and troublesome to obtain. In the proposed method, we instead train the parsing models on an auto-generated bilingual treebank. First, an SMT system translates a monolingual treebank into the target language; then, a monolingual target-language parser parses the translated sentences. Because the auto-translated sentences and auto-parsed trees in this treebank are far from perfect, the bilingual constraints derived from them are not sufficiently reliable. To overcome this problem, we propose a method that verifies the reliability of the constraints using large amounts of unannotated target-language monolingual data and bilingual data. Finally, we design a set of effective bilingual features for the parsing models based on the verified constraints. We conduct experiments on standard test data. The results show that our bitext parser significantly outperforms monolingual parsers. Moreover, our method still yields improvements when we use a larger monolingual treebank containing over 50 000 sentences. We also test the method with different SMT systems, and the results show that it is highly robust to translation noise. In particular, with the help of SMT, the proposed method can be used in a purely monolingual setting; that is, unlike previous methods, it does not require a human translation of the test set.
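The pipeline described above (translate a treebank, parse the translations, verify target-side constraints, turn them into parser features) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Tree` representation, the frequency-threshold check in `verify`, the feature strings returned by `bilingual_feature`, and the toy translator and parser are all hypothetical stand-ins. The sketch also uses only target-language monolingual counts for verification, whereas the abstract states that bilingual unannotated data is used as well.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (head word, dependent word)


@dataclass
class Tree:
    words: List[str]
    heads: List[int]  # heads[i] is the 1-based head of words[i]; 0 marks the root


def arcs(tree: Tree) -> List[Pair]:
    """Return (head word, dependent word) pairs, skipping the root attachment."""
    return [(tree.words[h - 1], w) for w, h in zip(tree.words, tree.heads) if h > 0]


def build_bilingual_treebank(
    source_trees: Iterable[Tree],
    translate: Callable[[List[str]], List[str]],  # hypothetical SMT wrapper
    parse_target: Callable[[List[str]], Tree],    # hypothetical target-side parser
) -> List[Tuple[Tree, Tree]]:
    """Pair each gold source tree with an auto-translated, auto-parsed target tree."""
    return [(src, parse_target(translate(src.words))) for src in source_trees]


def count_arcs(target_trees: Iterable[Tree]) -> Counter:
    """Count target-side arcs in a large auto-parsed monolingual corpus."""
    counts: Counter = Counter()
    for tree in target_trees:
        counts.update(arcs(tree))
    return counts


def verify(tgt: Tree, counts: Counter, min_count: int) -> Set[Pair]:
    """Keep target arcs seen at least min_count times (an assumed reliability test)."""
    return {pair for pair in arcs(tgt) if counts[pair] >= min_count}


def bilingual_feature(h: int, d: int, align: Dict[int, int], tgt: Tree, verified: Set[Pair]) -> str:
    """Feature for a candidate source arc h -> d (1-based), via a source-to-target alignment."""
    th, td = align.get(h), align.get(d)
    if th is None or td is None:
        return "NO_ALIGN"
    if tgt.heads[td - 1] != th:
        return "NO_TGT_ARC"
    pair = (tgt.words[th - 1], tgt.words[td - 1])
    return "TGT_ARC_VERIFIED" if pair in verified else "TGT_ARC_UNVERIFIED"


if __name__ == "__main__":
    # Toy stand-ins for an SMT system and a target-language parser (hypothetical).
    lexicon = {"我": "I", "吃": "eat", "苹果": "apples"}
    translate = lambda ws: [lexicon.get(w, w) for w in ws]
    # Toy parser: word 2 is the root and every other word depends on it (needs >= 2 words).
    parse_target = lambda ws: Tree(ws, [2, 0] + [2] * (len(ws) - 2))

    src = Tree(["我", "吃", "苹果"], [2, 0, 2])
    _, tgt = build_bilingual_treebank([src], translate, parse_target)[0]
    counts = count_arcs([Tree(["I", "eat", "apples"], [2, 0, 2])] * 3)
    verified = verify(tgt, counts, min_count=2)
    align = {1: 1, 2: 2, 3: 3}  # monotone word alignment, assumed
    print(bilingual_feature(2, 3, align, tgt, verified))  # TGT_ARC_VERIFIED
    print(bilingual_feature(3, 1, align, tgt, verified))  # NO_TGT_ARC
```

The verification step is deliberately simple: a target arc counts as reliable only if the same head/dependent word pair recurs often enough in auto-parsed target text, so one-off errors from the SMT system or the target parser are less likely to become features.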