An Automatic Filter for Non-Parallel Texts

Abstract

Numerous cross-lingual applications, including state-of-the-art machine translation systems, require parallel texts aligned at the sentence level. However, collections of such texts are often polluted by pairs of texts that are comparable but not parallel. Bitext maps can help to discriminate between parallel and comparable texts. Bitext mapping algorithms use a larger set of document features than competing approaches to this task, resulting in higher accuracy. In addition, good bitext mapping algorithms are not limited to documents with structural mark-up such as web pages. The task of filtering non-parallel text pairs represents a new application of bitext mapping algorithms.
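
To make the idea of using a bitext map as a filter concrete, here is a minimal sketch. It is not the algorithm from the paper: it assumes a deliberately crude stand-in for a bitext map, namely the positions of tokens that appear verbatim in both texts (numbers, names, punctuation), and it scores a pair by how many of those points lie near the main diagonal. The function names `candidate_points`, `diagonal_score`, and `looks_parallel`, as well as the `tolerance` and `threshold` values, are hypothetical choices made for illustration.

```python
from typing import List, Tuple


def candidate_points(src_tokens: List[str], tgt_tokens: List[str]) -> List[Tuple[int, int]]:
    """Return (i, j) pairs where the same token occurs at source position i
    and target position j. This is an assumed, simplistic proxy for the
    correspondence points of a real bitext map."""
    positions = {}
    for j, tok in enumerate(tgt_tokens):
        positions.setdefault(tok, []).append(j)
    return [(i, j) for i, tok in enumerate(src_tokens) for j in positions.get(tok, [])]


def diagonal_score(src_tokens: List[str], tgt_tokens: List[str], tolerance: float = 0.1) -> float:
    """Fraction of source tokens that have at least one identical target token
    at roughly the same relative position (within `tolerance`)."""
    if not src_tokens or not tgt_tokens:
        return 0.0
    n, m = len(src_tokens), len(tgt_tokens)
    matched = set()
    for i, j in candidate_points(src_tokens, tgt_tokens):
        # Parallel texts tend to keep corresponding material near the diagonal.
        if abs(i / n - j / m) <= tolerance:
            matched.add(i)
    return len(matched) / n


def looks_parallel(src_tokens: List[str], tgt_tokens: List[str],
                   threshold: float = 0.3, tolerance: float = 0.1) -> bool:
    """Classify a text pair as parallel when enough points lie near the diagonal.
    The threshold is an arbitrary illustrative value, not a tuned one."""
    return diagonal_score(src_tokens, tgt_tokens, tolerance) >= threshold


if __name__ == "__main__":
    src = "In 1999 Paris hosted 42 delegates .".split()
    parallel = "En 1999 Paris a accueilli 42 délégués .".split()
    comparable = ". 42 délégués Paris a accueilli En 1999".split()
    print(looks_parallel(src, parallel))
    print(looks_parallel(src, comparable))
```

A real bitext mapping algorithm would draw on a richer set of document features than exact token matches, which is the advantage the abstract claims over competing approaches. The sketch only shows the filtering step: a pair whose correspondence points follow the diagonal is kept, and a pair that shares vocabulary but not order is rejected.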