LexEQUAL: Supporting Multiscript Matching in Database Systems

Abstract

To effectively support today's global economy, database systems need to store and manipulate text data in multiple languages simultaneously. Current database systems do support the storage and management of multilingual data, but are not capable of querying or matching text data across different scripts. As a first step towards addressing this lacuna, we propose here a new query operator called LexEQUAL, which supports multiscript matching of proper names. The operator is implemented by first transforming matches in multiscript text space into matches in the equivalent phoneme space, and then using standard approximate matching techniques to compare these phoneme strings. The algorithm incorporates tunable parameters that impact the phonetic match quality and thereby determine the match performance in the multiscript space. We evaluate the performance of the LexEQUAL operator on a real multiscript names dataset and demonstrate that it is possible to simultaneously achieve good recall and precision by appropriate parameter settings. We also show that the operator run-time can be made extremely efficient by utilizing a combination of q-gram and database indexing techniques. Thus, we show that the LexEQUAL operator can complement the standard lexicographic operators, representing a first step towards achieving complete multilingual functionality in database systems.
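The two-stage idea in the abstract (compare names in phoneme space with approximate matching, then prune candidates cheaply with q-grams) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's implementation: the function names, the threshold normalization by the longer string, and the phoneme strings, which are made-up ASCII stand-ins for the output of a text-to-phoneme converter. The q-gram check is the standard count filter from the approximate string join literature.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def max_edits(p1: str, p2: str, threshold: float) -> int:
    """Largest whole number of edits tolerated for this pair.

    Assumption: the tunable threshold is a fraction of the longer string.
    """
    return int(threshold * max(len(p1), len(p2)))


def phonemes_match(p1: str, p2: str, threshold: float = 0.25) -> bool:
    """Approximate equality of two phoneme strings."""
    return edit_distance(p1, p2) <= max_edits(p1, p2, threshold)


def qgrams(s: str, q: int = 2) -> Counter:
    """Bag of q-grams of s, padded so the string ends contribute too."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def passes_count_filter(p1: str, p2: str, k: int, q: int = 2) -> bool:
    """Necessary condition for edit_distance(p1, p2) <= k.

    Strings within k edits share at least max(|p1|, |p2|) - 1 - (k - 1) * q
    padded q-grams, so pairs below that bound can be skipped.
    """
    shared = sum((qgrams(p1, q) & qgrams(p2, q)).values())
    return shared >= max(len(p1), len(p2)) - 1 - (k - 1) * q


def lex_equal_sketch(p1: str, p2: str, threshold: float = 0.25) -> bool:
    """Cheap q-gram filter first, exact edit distance only for survivors."""
    k = max_edits(p1, p2, threshold)
    return passes_count_filter(p1, p2, k) and phonemes_match(p1, p2, threshold)


# Made-up ASCII stand-ins for phoneme strings (illustrative only).
print(lex_equal_sketch("neru", "nehru"))
print(lex_equal_sketch("neru", "niro"))
```

Raising `threshold` tolerates more phonetic variation (higher recall, lower precision), and lowering it does the reverse. In a database setting the q-grams would typically be stored in an indexed auxiliary table so the filter can run as an ordinary join, which is one plausible reading of the combination of q-gram and indexing techniques mentioned above.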