Conference: Annual Meeting of the Association for Computational Linguistics

PARANMT-50M: Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations

Abstract

We describe PARANMT-50M, a dataset of more than 50 million English-English sentential paraphrase pairs. We generated the pairs automatically by using neural machine translation to translate the non-English side of a large parallel corpus, following Wieting et al. (2017). Our hope is that PARANMT-50M can be a valuable resource for paraphrase generation and can provide a rich source of semantic knowledge to improve downstream natural language understanding tasks. To show its utility, we use PARANMT-50M to train paraphrastic sentence embeddings that outperform all supervised systems on every SemEval semantic textual similarity competition, in addition to showing how it can be used for paraphrase generation.
