2011 International Conference on Advanced Computer Science and Information Systems

Application of document spelling checker for Bahasa Indonesia

Abstract

A document spelling checker for Bahasa Indonesia is urgently needed, yet no such application is currently available, and existing research on Indonesian spelling checkers has not produced a complete document spelling checker. In this research, we compare several methods used in Indonesian spelling checkers, particularly for word error detection, and analyze which methods are best suited to building an Indonesian document spelling checker application. The main idea is to use a complete word list as the reference. The Indonesian document spelling checker consists of five main components: document preprocessing, word error detection, word error correction, word candidate ranking, and user feedback. Document preprocessing converts the document into a list of unique words, which the spelling checker then analyzes. For word error detection, binary search and hashing are used to speed up lexicon lookup. For word error correction, the forward reverse technique and a similarity measure score are employed. For candidate ranking, a hidden Markov model (HMM) is used to select the best correction candidate. Using a 13,000-word lexicon and 10 test documents, the system achieved 93.7% accuracy. The remaining errors are caused by words missing from the lexicon and by the special reduplicated (repeated-word) form.
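The preprocessing, detection, and correction steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the tiny `LEXICON` is a hypothetical stand-in for the 13,000-word list, the similarity score is a normalized Levenshtein distance chosen for illustration (the paper's forward reverse technique is not reproduced), and candidates are ranked by that score alone rather than by an HMM. Both lookup strategies named in the abstract are shown: binary search over a sorted list and hashing via a Python `set`.

```python
import bisect
import re

# Hypothetical toy lexicon standing in for the 13,000-word reference list.
LEXICON = sorted(["anak", "buku", "di", "makan", "membaca", "rumah", "saya"])
LEXICON_SET = set(LEXICON)  # hash-based membership test


def preprocess(text):
    """Lowercase the text and return its sorted unique words.

    Hyphenated tokens such as reduplicated forms ("anak-anak") are kept whole.
    """
    return sorted(set(re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())))


def in_lexicon_binary(word):
    """Detect whether a word exists using binary search on the sorted lexicon."""
    i = bisect.bisect_left(LEXICON, word)
    return i < len(LEXICON) and LEXICON[i] == word


def in_lexicon_hash(word):
    """Detect whether a word exists using a hash set (average O(1) lookup)."""
    return word in LEXICON_SET


def edit_distance(a, b):
    """Levenshtein distance computed with two rolling DP rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Illustrative similarity score in [0, 1]: 1 - distance / longer length."""
    longest = max(len(a), len(b)) or 1
    return 1 - edit_distance(a, b) / longest


def suggest(word, k=3):
    """Return the k lexicon words most similar to `word` (ties broken alphabetically)."""
    scored = sorted(((similarity(word, w), w) for w in LEXICON),
                    key=lambda pair: (-pair[0], pair[1]))
    return [w for _, w in scored[:k]]


def check(text):
    """Map each unknown unique word in `text` to its ranked correction candidates."""
    return {w: suggest(w) for w in preprocess(text) if not in_lexicon_hash(w)}


if __name__ == "__main__":
    print(check("Saya membaca bku di rumah"))
```

For example, `check("Saya membaca bku di rumah")` flags only `bku` and lists `buku` as its top candidate. A reduplicated form such as `anak-anak` is also flagged, because the tokenizer keeps it as a single hyphenated token absent from the lexicon, which mirrors one of the error sources reported above.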