Chinese Spelling Error Detection and Correction Based on Language Model, Pronunciation, and Shape

Abstract

Spelling check is an important preprocessing task when dealing with user-generated texts such as tweets and product comments. Compared with Western languages such as English, Chinese spelling check is more complex because written Chinese has no word delimiters and a misspelled character can only be identified at the word level. Our system works as follows. First, we use character-level n-gram language models to detect potentially misspelled characters, namely those whose probability falls below a predefined threshold. Second, for each potentially incorrect character, we generate a candidate set based on pronunciation and shape similarity. Third, we filter out any candidate correction that cannot form a legal word with its neighbors according to a word dictionary. Finally, we select the candidate with the highest language model probability. If that probability exceeds a predefined threshold, we replace the original character; otherwise, we consider the original character correct and take no action. Our preliminary experiments show that this simple method achieves relatively high precision but low recall.
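
The four steps in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy corpus, the confusion set, the word dictionary, both threshold values, the add-k smoothed bigram model, and every function name below are assumptions chosen only to make the pipeline concrete.

```python
from collections import Counter

# Toy resources; all of these are illustrative assumptions, not the paper's data.
CORPUS = ["我们今天去学校", "今天天气很好", "我们去学校上课"]
CONFUSION = {"笑": ["校", "效"]}  # characters similar in pronunciation or shape
WORDS = {"我们", "今天", "学校", "天气", "上课", "效果"}
DETECT_T = 0.1   # hypothetical detection threshold
REPLACE_T = 0.3  # hypothetical replacement threshold


def train_bigram(corpus):
    """Count character bigrams, with <s> marking the sentence start."""
    history, bigrams, vocab = Counter(), Counter(), set()
    for sent in corpus:
        chars = ["<s>"] + list(sent)
        vocab.update(sent)
        for prev, cur in zip(chars, chars[1:]):
            history[prev] += 1
            bigrams[(prev, cur)] += 1
    return history, bigrams, vocab


def prob(model, prev, cur, k=0.1):
    """Add-k smoothed P(cur | prev); one extra slot covers unseen characters."""
    history, bigrams, vocab = model
    return (bigrams[(prev, cur)] + k) / (history[prev] + k * (len(vocab) + 1))


def detect(model, sent):
    """Step 1: flag positions whose bigram probability is below DETECT_T."""
    chars = ["<s>"] + list(sent)
    return [i for i in range(len(sent))
            if prob(model, chars[i], chars[i + 1]) < DETECT_T]


def forms_word(chars, i, max_len=4):
    """Step 3: True if some dictionary word covers position i."""
    for n in range(2, max_len + 1):
        for start in range(max(0, i - n + 1), min(i, len(chars) - n) + 1):
            if "".join(chars[start:start + n]) in WORDS:
                return True
    return False


def local_score(model, chars, i):
    """LM score of the character at i: product of the bigrams touching it."""
    prev = chars[i - 1] if i > 0 else "<s>"
    score = prob(model, prev, chars[i])
    if i + 1 < len(chars):
        score *= prob(model, chars[i], chars[i + 1])
    return score


def correct(model, sent):
    """Run detection, candidate generation, filtering, and selection."""
    chars = list(sent)
    for i in detect(model, sent):
        best, best_score = None, 0.0
        for cand in CONFUSION.get(chars[i], []):   # step 2: candidate set
            trial = chars[:i] + [cand] + chars[i + 1:]
            if not forms_word(trial, i):           # step 3: dictionary filter
                continue
            score = local_score(model, trial, i)   # step 4: LM ranking
            if score > best_score:
                best, best_score = cand, score
        # Replace only if the best candidate clears the threshold.
        if best is not None and best_score > REPLACE_T:
            chars[i] = best
    return "".join(chars)


model = train_bigram(CORPUS)
print(correct(model, "我们今天去学笑"))
```

With these toy resources, the final character 笑 is flagged, the candidate 效 is discarded because 学效 is not a dictionary word, and 校 is accepted, so the call prints 我们今天去学校. A flagged character with no surviving candidate is left unchanged, which is one plausible reason a pipeline of this shape favors precision over recall.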

Bibliographic details

Source: CIPS-SIGHAN Joint Conference on Chinese Language Processing, 2014, pp. 220-223 (4 pages)
Conference location: Wuhan (CN)
Authors: Junjie Yu; Zhenghua Li
Affiliation: Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, China

Original format: PDF

