Conference: International Conference on Extending Database Technology

LexEQUAL: Supporting Multiscript Matching in Database Systems


Abstract

To effectively support today's global economy, database systems need to store and manipulate text data in multiple languages simultaneously. Current database systems do support the storage and management of multilingual data, but are not capable of querying or matching text data across different scripts. As a first step towards addressing this lacuna, we propose here a new query operator called LexEQUAL, which supports multiscript matching of proper names. The operator is implemented by first transforming matches in multiscript text space into matches in the equivalent phoneme space, and then using standard approximate matching techniques to compare these phoneme strings. The algorithm incorporates tunable parameters that impact the phonetic match quality and thereby determine the match performance in the multiscript space. We evaluate the performance of the LexEQUAL operator on a real multiscript names dataset and demonstrate that it is possible to simultaneously achieve good recall and precision by appropriate parameter settings. We also show that the operator run-time can be made extremely efficient by utilizing a combination of q-gram and database indexing techniques. Thus, we show that the LexEQUAL operator can complement the standard lexicographic operators, representing a first step towards achieving complete multilingual functionality in database systems.
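The two-stage idea in the abstract (map names to phoneme strings, then compare them approximately, with a q-gram filter to skip hopeless pairs) can be illustrated with a minimal Python sketch. It is not the paper's implementation. The `TOY_PHONEMES` table, the function names, the default threshold of 0.25, and the choice of Levenshtein distance with a count-based q-gram filter are all assumptions made for illustration; a real system would use proper text-to-phoneme converters for each script.

```python
from collections import Counter

# Hypothetical, hand-written phoneme strings standing in for the output of
# real text-to-phoneme converters (an assumption, not data from the paper).
TOY_PHONEMES = {
    "Nehru": "nehru",
    "नेहरू": "nehruu",
    "Nero": "nirou",
}


def edit_distance(s, t):
    """Classic Levenshtein distance computed row by row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def qgrams(s, q=2):
    """Multiset of q-grams of s, padded with q-1 boundary markers per side."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def passes_count_filter(s, t, k, q=2):
    """Necessary condition for edit_distance(s, t) <= k.

    Strings within k edits share at least max(|s|, |t|) + q - 1 - k*q
    padded q-grams, so pairs below that bound can be skipped cheaply.
    """
    shared = sum((qgrams(s, q) & qgrams(t, q)).values())
    return shared >= max(len(s), len(t)) + q - 1 - k * q


def lexequal_sketch(name_a, name_b, threshold=0.25, q=2):
    """True if the two names match in phoneme space under the threshold."""
    pa, pb = TOY_PHONEMES[name_a], TOY_PHONEMES[name_b]
    # Allowed edits as a fraction of the shorter phoneme string (assumption).
    k = int(threshold * min(len(pa), len(pb)))
    if not passes_count_filter(pa, pb, k, q):
        return False  # cheap rejection, no edit distance computed
    return edit_distance(pa, pb) <= k


if __name__ == "__main__":
    print(lexequal_sketch("Nehru", "नेहरू"))
    print(lexequal_sketch("Nehru", "Nero"))
```

Raising `threshold` tolerates more phonetic variation and so favors recall over precision, which is the kind of tunable trade-off the abstract describes. In a database setting the q-grams would typically be stored in an indexed auxiliary table so that the filter runs as a join rather than a pairwise scan.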
