
On the Costs of Multilingualism in Database Systems


Abstract

Database engines are well-designed for storing and processing text data based on Latin scripts. But in today's global village, databases should ideally support multilingual text data equally efficiently. While current database systems do support management of multilingual data, we are not aware of any prior studies that compare and quantify their performance in this regard. In this paper, we first compare the multilingual functionality provided by a suite of popular database systems. We find that while the systems support most SQL-defined multilingual functionality, some needed features are not yet implemented. We then profile their performance in handling text data in ISO:8859, the standard database character set, and in Unicode, the multilingual character set. Our experimental results indicate significant performance degradation while handling multilingual data in these database systems. Worse, we find that the query optimizer's accuracy differs between standard and multilingual data types. As a first step towards alleviating the above problems, we propose Cuniform, a compressed format that is trivially convertible to Unicode. Our initial experimental results with Cuniform indicate that it largely eliminates the performance degradation for multilingual scripts with small repertoires. Further, the Cuniform format can elegantly support extensions to SQL for multi-lexical text processing.
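
To make the contrast between the two character sets concrete, the following minimal Python sketch compares how many bytes the same strings occupy in ISO 8859-1 and in two Unicode encodings. It is only an illustration: the function name `encoded_sizes` and the sample strings are assumptions introduced here, it is not the benchmark used in the paper, and it does not implement Cuniform, whose layout the abstract does not describe. Storage width is also only one plausible contributor to the reported slowdown, not a cause the abstract states.

```python
# Illustrative sketch only: byte sizes of the same text in different encodings.
# The helper name and sample strings are hypothetical, not taken from the paper.

def encoded_sizes(text):
    """Return the encoded length in bytes for a few common encodings.

    A value of None means the text cannot be represented in that encoding.
    """
    sizes = {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" avoids the 2-byte byte-order mark that "utf-16" prepends.
        "utf-16": len(text.encode("utf-16-le")),
    }
    try:
        sizes["iso-8859-1"] = len(text.encode("iso-8859-1"))
    except UnicodeEncodeError:
        # ISO 8859-1 covers only Western European characters.
        sizes["iso-8859-1"] = None
    return sizes


latin_word = "database"
# Five Tamil code points, written as escapes to keep the source ASCII-only.
tamil_word = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"

print(encoded_sizes(latin_word))
print(encoded_sizes(tamil_word))
```

For the Latin sample, the UTF-16 form is twice the size of the ISO 8859-1 form. The Tamil sample cannot be stored in ISO 8859-1 at all, and takes 10 bytes in UTF-16 against 15 in UTF-8.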
