International Conference on Data Management Technologies and Applications

Performance Evaluation of Phonetic Matching Algorithms on English Words and Street Names - Comparison and Correlation


Abstract

Researchers face major problems when searching for data in large, imprecise databases, because entries are often not spelled correctly or not spelled the way the searcher expects. As a result, they cannot find the word they are looking for. Over the years, relying on the pronunciation of words has come to be regarded as an effective way to address this problem. The technique of retrieving words based on how they sound is known as "phonetic matching". Soundex was the first such algorithm to be proposed, and other algorithms such as Metaphone, Caverphone, Double Metaphone (DMetaphone), and Phonex have also been used for information retrieval in different environments. This paper analyzes and evaluates several phonetic matching algorithms on datasets comprising North Carolina street names and English dictionary words. The analysis shows that no single technique is best in general: Metaphone performs best on English dictionary words, while NYSIIS performs better on the street-name datasets. Although Soundex corrects misspelled words with higher accuracy than the other algorithms, it has lower precision because it returns more noise (irrelevant matches) in the evaluated setting. Based on the experimental results, the paper offers suggestions that can help make databases more reliable and achieve higher data quality.
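To make the accuracy-versus-precision trade-off concrete, here is a minimal Python sketch of the classic American Soundex code plus a toy evaluation. It is not the paper's implementation: the helper names `soundex`, `build_index`, and `evaluate` are hypothetical, and the metric definitions are assumptions for illustration only (a misspelling counts as "corrected" if its intended word appears among the dictionary words sharing its code; precision is the number of correct candidates divided by all candidates retrieved).

```python
from collections import defaultdict

# American Soundex digit classes; vowels (A, E, I, O, U, Y) and H, W get no digit.
_CODES = {
    ch: digit
    for letters, digit in (
        ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
        ("L", "4"), ("MN", "5"), ("R", "6"),
    )
    for ch in letters
}


def soundex(word):
    """Return the 4-character American Soundex code of `word` (e.g. 'R163')."""
    letters = [c for c in word.upper() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    result = letters[0]                 # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")   # so a same-coded second letter is dropped
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H/W do not separate identical codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code                     # a vowel resets prev, so codes repeat across vowels
    return result.ljust(4, "0")         # pad short codes with zeros


def build_index(words):
    """Group dictionary words by their Soundex code (hypothetical helper)."""
    index = defaultdict(set)
    for w in words:
        index[soundex(w)].add(w)
    return index


def evaluate(index, pairs):
    """pairs: (misspelled, intended). Returns (accuracy, precision) under the
    assumed definitions described above, not necessarily the paper's."""
    hits = 0
    retrieved = 0
    for misspelled, intended in pairs:
        candidates = index.get(soundex(misspelled), set())
        retrieved += len(candidates)
        if intended in candidates:
            hits += 1
    accuracy = hits / len(pairs) if pairs else 0.0
    precision = hits / retrieved if retrieved else 0.0
    return accuracy, precision


if __name__ == "__main__":
    dictionary = ["Robert", "Rupert", "Rubin", "Ashcraft", "Pfister"]
    pairs = [("Robbert", "Robert"), ("Ashcroft", "Ashcraft")]
    acc, prec = evaluate(build_index(dictionary), pairs)
    print(f"accuracy={acc:.2f} precision={prec:.2f}")  # accuracy=1.00 precision=0.67
```

In this toy run both misspellings are recovered, but "Robbert" also retrieves "Rupert", which shares the code R163. That extra, irrelevant candidate is the kind of noise that keeps Soundex's precision below its accuracy.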