Chinese Spelling Errors Detection Based on CSLM

Abstract

Spelling errors are common in electronic documents and sometimes have serious consequences. Methods based on the n-gram language model are the most commonly used way to address this problem. A CSLM (continuous space language model), which represents a word as a vector, differs from these traditional models. In this paper, we experiment with a specific CSLM, namely the CBOW (Continuous Bag-of-Words) model, to detect spelling errors. Since spelling errors in Chinese are usually regarded as wrong characters rather than wrong words, we train character vectors on a large Chinese corpus and then judge whether a Chinese character is correct by its probability of occurrence in a given context. Experimental results show that the CSLM-based method outperforms the n-gram language model.
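The detection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the scoring formula, window size, or decision threshold, so the full-softmax scoring, `window=2`, and `threshold=0.1` are assumptions, and the names `cbow_char_probability`, `detect_errors`, `IN_VECS`, and `OUT_VECS` are hypothetical. The vectors are hand-set toy values standing in for character vectors that would normally be trained on a large corpus.

```python
import math

# Toy 2-dimensional character vectors (hand-set, NOT trained; assumption for illustration).
IN_VECS = {"天": [1.0, 0.0], "气": [1.0, 0.0], "汽": [0.0, 1.0], "车": [0.0, 1.0]}
OUT_VECS = {"天": [2.0, 0.0], "气": [2.0, 0.0], "汽": [0.0, 2.0], "车": [0.0, 2.0]}


def cbow_char_probability(context, target, in_vecs, out_vecs):
    """CBOW-style P(target | context): softmax over dot products between
    the averaged context vector and every character's output vector."""
    dim = len(next(iter(in_vecs.values())))
    # Hidden layer: average of the context characters' input vectors.
    hidden = [sum(in_vecs[c][i] for c in context) / len(context) for i in range(dim)]
    scores = {ch: sum(h * v for h, v in zip(hidden, vec)) for ch, vec in out_vecs.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    norm = sum(math.exp(s - top) for s in scores.values())
    return math.exp(scores[target] - top) / norm


def detect_errors(sentence, in_vecs, out_vecs, window=2, threshold=0.1):
    """Return the indices of characters whose probability given the
    surrounding characters falls below the (assumed) threshold."""
    flagged = []
    for i, ch in enumerate(sentence):
        context = list(sentence[max(0, i - window):i] + sentence[i + 1:i + 1 + window])
        if not context:
            continue  # a single-character sentence has no context to score against
        if cbow_char_probability(context, ch, in_vecs, out_vecs) < threshold:
            flagged.append(i)
    return flagged


print(detect_errors("天气天汽", IN_VECS, OUT_VECS))  # prints [3]
```

With these toy vectors, only the final character 汽 is flagged, because its neighbours point toward the 天/气 direction of the vector space. In a real setting the vectors would come from a trained CBOW model, and the threshold would need to be tuned on held-out data.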

Bibliographic record

Source: 2015 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology | 2015 | pp. 173-176 | 4 pages
Conference location: Singapore (SG)
Authors: Zhaoyi Guo; Xingyuan Chen; Peng Jin; Si-Yuan Jing
Original format: PDF
Language: English
Keywords: Character vectors; Continuous space language model; N-gram language model; Spelling errors detection

Similar literature

1. Chinese Spelling Error Detection Using a Fusion Lattice LSTM [J]. Wang Hao, Wang Bin, Duan Jianyong. ACM Transactions on Asian and Low-Resource Language Information Processing. 2021, No. 2
2. Sentence Level N-Gram Context Feature in Real-Word Spelling Error Detection and Correction: Unsupervised Corpus Based Approach [J]. Tsegay Mullu Kassa. Journal of Information Engineering and Applications. 2020, No. 4
3. Chinese Spelling Errors Detection Based on CSLM [C]. Zhaoyi Guo, Xingyuan Chen, Peng Jin. IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology. 2015
4. A study of spelling errors in word processing: Detection and correction [D]. Diaz-Figueroa, Maria I. 2007
5. Neural Bases of Unconscious Error Detection in a Chinese Anagram Solution Task: Evidence from ERP Study [O]. Hua-zhan Yin, Dan Li, Junyi Yang
6. Chinese Spelling Error Detection and Correction Based on Language Model, Pronunciation, and Shape [O]. Junjie Yu, Zhenghua Li. 2015
7. SPEEDCOP: Automatic Spelling Error Detection and Correction for Large Data Bases [R]. Pollock, J. J. 1981
