Conference: International Conference on Advanced Language Processing and Web Information Technology

Automatic Spelling Correction Rule Extraction and Application for Spoken-style Korean Text

Abstract

Spoken-style text is now prevalent because a great deal of information, such as Short Message Service (SMS) messages, is written in a spoken style. However, spoken-style text contains more spelling errors than traditional written-style text. In this paper, we propose a rule-based spelling correction system that automatically extracts spelling correction rules from a correction corpus and applies the extracted rules to spelling errors in input sentences. To maintain both high precision and high recall, we devise a candidate-elimination algorithm that determines an appropriate context size for each spelling correction rule based on rule accuracy. Experimental results show that the proposed system extracts 42,537 spelling correction rules and, by applying them to the spelling errors in the test corpus, raises message-level precision from 31.08% to 79.04%.
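The abstract does not give the algorithm's details, so the following is only a minimal sketch of how context-size selection by rule accuracy could work. It assumes token-level rules, token-aligned (erroneous, corrected) sentence pairs, and hypothetical values for `MAX_K` (largest context tried) and `THRESHOLD` (minimum rule accuracy); all function names and the toy ASCII corpus are illustrative stand-ins, not the authors' implementation or data. Each error first yields a context-free candidate rule; if its accuracy over the corpus falls below the threshold, that candidate is eliminated and a rule with one more token of context on each side is tried.

```python
"""Toy sketch of rule extraction with candidate elimination over context size.

Assumptions (not from the paper): token-level rules, token-aligned
(erroneous, corrected) pairs, and hypothetical MAX_K / THRESHOLD values.
"""
from typing import List, Set, Tuple

# (left context, source token, right context, target token)
Rule = Tuple[Tuple[str, ...], str, Tuple[str, ...], str]

MAX_K = 2        # hypothetical: largest context tried, in tokens per side
THRESHOLD = 0.9  # hypothetical: minimum accuracy for a rule to survive


def pad(tokens: List[str]) -> List[str]:
    # Boundary markers let a rule use the sentence start or end as context.
    return ["<s>"] * MAX_K + tokens + ["</s>"] * MAX_K


def matches(tokens: List[str], i: int, rule: Rule) -> bool:
    left, source, right, _ = rule
    return (tokens[i] == source
            and tuple(tokens[i - len(left):i]) == left
            and tuple(tokens[i + 1:i + 1 + len(right)]) == right)


def accuracy(rule: Rule, corpus: List[Tuple[str, str]]) -> float:
    # Share of the rule's firings in the corpus whose output equals the gold token.
    fired = correct = 0
    for wrong, right in corpus:
        w, r = pad(wrong.split()), pad(right.split())
        for i in range(MAX_K, len(w) - MAX_K):
            if matches(w, i, rule):
                fired += 1
                correct += int(r[i] == rule[3])
    return correct / fired if fired else 0.0


def extract_rules(corpus: List[Tuple[str, str]]) -> Set[Rule]:
    for wrong, right in corpus:
        if len(wrong.split()) != len(right.split()):
            raise ValueError("sketch assumes token-aligned pairs")
    rules: Set[Rule] = set()
    for wrong, right in corpus:
        w, r = pad(wrong.split()), pad(right.split())
        for i in range(MAX_K, len(w) - MAX_K):
            if w[i] == r[i]:
                continue
            # Try the smallest context first; eliminate candidates whose
            # accuracy is below THRESHOLD and widen the context instead.
            for k in range(MAX_K + 1):
                rule = (tuple(w[i - k:i]), w[i], tuple(w[i + 1:i + 1 + k]), r[i])
                if accuracy(rule, corpus) >= THRESHOLD:
                    rules.add(rule)
                    break
    return rules


def apply_rules(sentence: str, rules: Set[Rule]) -> str:
    tokens = pad(sentence.split())
    out = list(tokens)
    # Prefer longer-context (more specific) rules; tie-break for determinism.
    ordered = sorted(rules, key=lambda r: (-(len(r[0]) + len(r[2])), r))
    for i in range(MAX_K, len(tokens) - MAX_K):
        for rule in ordered:
            # Match on the input tokens, not `out`, so fixes do not chain.
            if matches(tokens, i, rule):
                out[i] = rule[3]
                break
    return " ".join(out[MAX_K:len(out) - MAX_K])


# Toy ASCII stand-in for a correction corpus; the last pair is already correct.
CORPUS = [
    ("teh cat is ok", "the cat is ok"),
    ("i like ur dog", "i like your dog"),
    ("is ur dog ok", "is your dog ok"),
    ("the city of ur is old", "the city of ur is old"),
]

if __name__ == "__main__":
    rules = extract_rules(CORPUS)
    print(len(rules))                             # 3
    print(apply_rules("teh dog is ok", rules))    # the dog is ok
    print(apply_rules("so ur dog barks", rules))  # so ur dog barks
```

On this toy corpus, the context-free candidate `ur → your` fires three times but is right only twice (accuracy about 0.67), because the already-correct sentence supplies negative evidence. It is therefore eliminated in favor of two rules that each carry one token of context, while `teh → the` survives with no context at all. The usage lines also show the recall cost of wider context: "so ur dog barks" matches no surviving rule and is left unchanged.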