Building a free, general-domain paraphrase database for Japanese

Abstract

Previous work has used parallel corpora and alignment techniques from phrase-based statistical machine translation to extract and generate paraphrases. For Japanese, paraphrases in a number of paraphrase categories or domains have been extracted by this method. However, most of these resources focus on a particular phenomenon in Japanese, and there are still no freely available Japanese paraphrase resources that cover all varieties of phrases from several domains. In addition, because Japanese and English differ in grammar and word order, we perform syntax-based preprocessing to reduce this mismatch and extract paraphrases similar in quality to those extracted using more similar language pairs. The data used to create the Japanese paraphrases is either in the public domain or available under a Creative Commons license, and spans a variety of genres for wide coverage.
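One common way to turn aligned parallel data into paraphrases is bilingual pivoting: two Japanese phrases that translate to the same English phrase are treated as paraphrase candidates, scored by summing over the shared English pivots. The abstract does not state the exact scoring used, so the sketch below is only an illustration under that assumption. The function name `pivot_paraphrases` and the toy phrase-table probabilities are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def pivot_paraphrases(ja_to_en, en_to_ja):
    """Score Japanese paraphrase candidates by pivoting through English.

    ja_to_en[ja][en] is an assumed translation probability p(en | ja);
    en_to_ja[en][ja] is an assumed translation probability p(ja | en).
    The score for (ja1, ja2) is the sum over pivots of p(en | ja1) * p(ja2 | en).
    """
    scores = defaultdict(lambda: defaultdict(float))
    for ja1, pivots in ja_to_en.items():
        for en, p_en_given_ja1 in pivots.items():
            for ja2, p_ja2_given_en in en_to_ja.get(en, {}).items():
                if ja2 != ja1:  # a phrase is not a paraphrase of itself
                    scores[ja1][ja2] += p_en_given_ja1 * p_ja2_given_en
    return {src: dict(tgts) for src, tgts in scores.items()}


# Toy phrase tables with made-up probabilities (illustration only).
ja_to_en = {
    "購入する": {"buy": 0.7, "purchase": 0.3},
    "買う": {"buy": 0.9, "get": 0.1},
}
en_to_ja = {
    "buy": {"買う": 0.6, "購入する": 0.4},
    "purchase": {"購入する": 0.8, "買う": 0.2},
    "get": {"得る": 1.0},
}

paraphrases = pivot_paraphrases(ja_to_en, en_to_ja)
for source, candidates in paraphrases.items():
    for target, score in sorted(candidates.items(), key=lambda kv: -kv[1]):
        print(f"{source} -> {target}: {score:.2f}")
```

In a real pipeline the two tables would come from a phrase table learned over word-aligned Japanese-English sentence pairs, and the syntax-based preprocessing mentioned in the abstract would be applied before alignment; neither step is shown here.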
