Venue: International Conference on Science in Information Technology

Dictionary Distribution Based on Number of Characters for Damerau-Levenshtein Distance Spell Checker Optimization


Abstract

Damerau-Levenshtein Distance is an algorithm that can solve word correction problems. It transforms one word into another using a specified set of edit operations. In word correction with Damerau-Levenshtein Distance, the available edit operations are substitution, insertion, deletion, and transposition. However, the algorithm has a weakness: long processing time. To display word suggestions for a misspelled string, the system must compute the distance between that string and every word in the dictionary. Processing time grows with dictionary size; the Indonesian Dictionary, for example, has more than 30,000 basic words. This study therefore distributes the dictionary based on the number of characters in each word to shorten the processing time. Using the distributed dictionary reduces the processing time of the Damerau-Levenshtein Distance algorithm by 29.04 seconds.
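
The abstract does not spell out how the dictionary is partitioned, so the following is only a minimal sketch of one plausible reading: words are grouped by length, and a misspelled word is compared only against groups whose length differs from it by at most the allowed distance. This filter is safe because each edit operation changes a word's length by at most one character. The function names (`osa_distance`, `bucket_by_length`, `suggest`), the `max_distance` parameter, and the sample word list are hypothetical, and the distance shown is the restricted (optimal string alignment) variant of Damerau-Levenshtein, which may differ from the variant used in the paper.

```python
from collections import defaultdict


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def bucket_by_length(words):
    """Assumed partitioning scheme: group dictionary words by character count."""
    buckets = defaultdict(list)
    for w in words:
        buckets[len(w)].append(w)
    return buckets


def suggest(word, buckets, max_distance=1):
    """Compare only against buckets whose length is within max_distance."""
    candidates = []
    low, high = len(word) - max_distance, len(word) + max_distance
    for length in range(low, high + 1):
        for entry in buckets.get(length, []):
            dist = osa_distance(word, entry)
            if dist <= max_distance:
                candidates.append((dist, entry))
    # closest suggestions first, ties broken alphabetically
    return [entry for _, entry in sorted(candidates)]


# Hypothetical sample dictionary of Indonesian words.
sample = ["makan", "malam", "makna", "minum", "buku", "bukan"]
buckets = bucket_by_length(sample)
print(suggest("mkaan", buckets))
```

With `max_distance=1`, only the buckets for lengths 4, 5, and 6 are scanned for a five-letter input, so words of any other length are never passed to the distance function. How much time this saves depends on how the dictionary's word lengths are distributed.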
