Enlarging Paraphrase Collections through Generalization and Instantiation

Abstract

This paper presents a paraphrase acquisition method that uncovers and exploits generalities underlying paraphrases: paraphrase patterns are first induced and then used to collect novel instances. Unlike existing methods, ours uses both bilingual parallel and monolingual corpora. While the former are regarded as a source of high-quality seed paraphrases, the latter are searched for paraphrases that match patterns learned from the seed paraphrases. We show how one can use monolingual corpora, which are far more numerous and larger than bilingual corpora, to obtain paraphrases that rival in quality those derived directly from bilingual corpora. In our experiments, the number of paraphrase pairs obtained in this way from monolingual corpora was a large multiple of the number of seed paraphrases. Human evaluation through a paraphrase substitution test demonstrated that the newly acquired paraphrase pairs are of reasonable quality. Remaining noise can be further reduced by filtering seed paraphrases.
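The two steps named in the title, generalization and instantiation, can be illustrated with a toy sketch. This is not the authors' procedure: the abstract does not specify how patterns are induced or matched, so the single-word slot `X`, the function names `generalize`, `slot_fillers`, and `instantiate`, the example data, and the rule that a new pair is kept only when both sides of a pattern occur in the monolingual text with the same filler are all assumptions made for illustration.

```python
import re

SLOT = "X"  # hypothetical placeholder marking the variable part of a pattern


def generalize(seed_pairs):
    """Generalization: replace one word shared by both sides of a seed pair with a slot."""
    patterns = set()
    for left, right in seed_pairs:
        lw, rw = left.split(), right.split()
        for word in set(lw) & set(rw):
            # Only use words that occur exactly once on each side (toy restriction).
            if lw.count(word) != 1 or rw.count(word) != 1:
                continue
            lp = " ".join(SLOT if w == word else w for w in lw)
            rp = " ".join(SLOT if w == word else w for w in rw)
            if lp != rp and lp != SLOT and rp != SLOT:
                patterns.add((lp, rp))
    return patterns


def slot_fillers(pattern, sentences):
    """Collect every word that fills the slot of `pattern` somewhere in `sentences`."""
    parts = [r"(\w+)" if t == SLOT else re.escape(t) for t in pattern.split()]
    regex = re.compile(r"\b" + r"\s+".join(parts) + r"\b")
    found = set()
    for sentence in sentences:
        found.update(regex.findall(sentence))
    return found


def fill(pattern, word):
    """Substitute `word` for the slot token in `pattern`."""
    return " ".join(word if t == SLOT else t for t in pattern.split())


def instantiate(patterns, sentences, seed_pairs):
    """Instantiation: keep fillers seen with BOTH sides of a pattern (assumed rule)."""
    seeds = set(seed_pairs)
    new_pairs = set()
    for lp, rp in patterns:
        shared = slot_fillers(lp, sentences) & slot_fillers(rp, sentences)
        for word in shared:
            pair = (fill(lp, word), fill(rp, word))
            if pair not in seeds:
                new_pairs.add(pair)
    return new_pairs


# Made-up seed pair (standing in for bilingual-derived seeds) and monolingual text.
seed_pairs = [("control of emissions", "emissions control")]
monolingual = [
    "the control of traffic is hard",
    "traffic control is hard",
    "control of noise matters",
]

patterns = generalize(seed_pairs)
new_pairs = instantiate(patterns, monolingual, seed_pairs)
print(sorted(patterns))
print(sorted(new_pairs))
```

In this sketch, "noise" is rejected because it is seen with only one side of the pattern. Requiring evidence for both sides is one simple way to limit noise; the filtering of seed paraphrases mentioned in the abstract would act earlier, before patterns are induced.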