Published in: Mathematics in Computer Science

Improved Alignment-Based Algorithm for Multilingual Text Compression

Abstract

Multilingual text compression exploits the availability of the same text in several languages: the second and subsequent versions are compressed by reference to the first. This relies on bilingual text alignment, a mapping from words and phrases in one text to their semantic equivalents in its translation. A new multilingual text compression scheme is proposed that improves on a direct generalization of bilingual algorithms. The idea is to store the necessary markup data within the source-language text. Once the number of target-language texts is large enough, the compression loss caused by this overhead is smaller than the savings obtained on the compressed target-language texts. Experimental results are presented for a six-language parallel corpus extracted from the European Union's EUR-Lex website. These results show how the advantage of the new algorithm depends on the number of languages.
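The trade-off described in the abstract can be illustrated with a small break-even calculation. This is a minimal sketch, not the authors' cost model: it assumes that the markup stored in the source text costs a fixed number of extra bits (`source_overhead_bits`, a hypothetical parameter) and that each target-language text saves some number of bits relative to the direct generalization of bilingual algorithms (`savings_per_target_bits`, also hypothetical). The scheme pays off once the summed savings exceed the one-time overhead.

```python
import math
from typing import Sequence


def new_scheme_wins(source_overhead_bits: float,
                    savings_per_target_bits: Sequence[float]) -> bool:
    """Return True if the total savings over all target-language texts
    exceed the extra cost of storing markup in the source text.

    Both inputs are hypothetical quantities for illustration only;
    they are not parameters defined in the paper.
    """
    return sum(savings_per_target_bits) > source_overhead_bits


def break_even_targets(source_overhead_bits: float,
                       savings_per_target_bits: float) -> int:
    """Smallest number n of target texts with n * savings > overhead,
    under the simplifying assumption that every target saves the same amount."""
    if savings_per_target_bits <= 0:
        raise ValueError("savings per target text must be positive")
    # The strict inequality n * s > m gives n = floor(m / s) + 1;
    # clamp at 0 so a non-positive overhead needs no target texts.
    return max(0, math.floor(source_overhead_bits / savings_per_target_bits) + 1)
```

For example, with made-up figures of 1000 bits of source overhead and 300 bits saved per target text, `break_even_targets(1000, 300)` returns 4: three target languages save only 900 bits, while four save 1200.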