
An Automatic Filter for Non-Parallel Texts



Abstract

Numerous cross-lingual applications, including state-of-the-art machine translation systems, require parallel texts aligned at the sentence level. However, collections of such texts are often polluted by pairs of texts that are comparable but not parallel. Bitext maps can help to discriminate between parallel and comparable texts. Bitext mapping algorithms use a larger set of document features than competing approaches to this task, resulting in higher accuracy. In addition, good bitext mapping algorithms are not limited to documents with structural mark-up such as web pages. The task of filtering non-parallel text pairs represents a new application of bitext mapping algorithms.
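As a rough illustration of how a bitext map might be used to separate parallel from comparable text pairs, the sketch below scores a map by how closely its points follow the main diagonal and how much of the text they span. This is not the method described in the paper: the function names, the two features, and the thresholds `max_deviation` and `min_coverage` are all hypothetical, and the map itself is assumed to come from some bitext mapping algorithm as a list of (position in text A, position in text B) pairs.

```python
from typing import List, Tuple

# A bitext map point: (character offset in text A, character offset in text B).
# The map is assumed to be produced elsewhere by a bitext mapping algorithm.
Point = Tuple[int, int]


def diagonal_deviation(points: List[Point], len_a: int, len_b: int) -> float:
    """Mean absolute distance of length-normalized points from the main diagonal."""
    if not points:
        return 1.0  # no correspondence found: treat as maximally off-diagonal
    return sum(abs(x / len_a - y / len_b) for x, y in points) / len(points)


def coverage(points: List[Point], len_a: int) -> float:
    """Fraction of text A lying between the first and last mapped positions."""
    if len(points) < 2:
        return 0.0
    xs = [x for x, _ in points]
    return (max(xs) - min(xs)) / len_a


def looks_parallel(
    points: List[Point],
    len_a: int,
    len_b: int,
    max_deviation: float = 0.05,  # hypothetical threshold, not from the paper
    min_coverage: float = 0.8,    # hypothetical threshold, not from the paper
) -> bool:
    """Heuristic: parallel texts yield a map that hugs the diagonal and spans the text."""
    return (
        diagonal_deviation(points, len_a, len_b) <= max_deviation
        and coverage(points, len_a) >= min_coverage
    )


# Illustrative, made-up maps for two text pairs.
parallel_map = [(0, 0), (100, 110), (500, 540), (1000, 1100)]
comparable_map = [(100, 900), (400, 50), (450, 700)]

print(looks_parallel(parallel_map, 1000, 1100))
print(looks_parallel(comparable_map, 1000, 1000))
```

In practice the thresholds would need tuning on labeled text pairs, and a real filter would likely combine more features of the map than these two.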
