Conference: CCF International Conference on Natural Language Processing and Chinese Computing

Automatic Proofreading in Chinese: Detect and Correct Spelling Errors in Character-Level with Deep Neural Networks

Abstract

The rapid growth in the volume of text makes manual proofreading increasingly costly. Automatic proofreading, by comparison, offers clear savings in time and human resources and is attracting growing research interest. In this paper, we propose two attention-based deep neural network models combined with confusion sets to detect and correct possible Chinese spelling errors at the character level. Our approaches first model the context of Chinese character embeddings using Long Short-Term Memory (LSTM) networks, then score the probabilities of the candidates drawn from a character's confusion set through an attention mechanism, choosing the highest-scoring candidate as the prediction. We also define a new methodology for obtaining (preceding text, following text, candidates, target) quads and provide a supervised dataset for training and testing (our data has been released to the public at https://github.com/ccit-proofread). Performance evaluation indicates that our models achieve state-of-the-art performance and outperform a set of baselines.
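The quad structure and the candidate-selection step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not describe how the quads are actually extracted or how the attention scores are computed. The names `Quad`, `build_quads`, and `pick_candidate` are hypothetical, the context vector is a hand-written stand-in for an LSTM encoding, and plain dot-product scoring followed by a softmax is assumed in place of the paper's attention mechanism.

```python
import math
from typing import Dict, List, NamedTuple


class Quad(NamedTuple):
    """Hypothetical container for one (preceding, following, candidates, target) sample."""
    preceding: str
    following: str
    candidates: List[str]
    target: str


def build_quads(sentence: str, confusion_sets: Dict[str, List[str]]) -> List[Quad]:
    """Emit one quad for every character that has a confusion set (assumed procedure)."""
    quads = []
    for i, ch in enumerate(sentence):
        if ch in confusion_sets:
            # Candidates: the correct character followed by its confusable characters.
            candidates = [ch] + [c for c in confusion_sets[ch] if c != ch]
            quads.append(Quad(sentence[:i], sentence[i + 1:], candidates, ch))
    return quads


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_candidate(context_vec: List[float],
                   candidate_vecs: Dict[str, List[float]]) -> str:
    """Score candidates by dot product with the context vector and return the best one.

    Assumption: dot-product scoring stands in for the paper's attention mechanism.
    """
    names = list(candidate_vecs)
    scores = [sum(a * b for a, b in zip(context_vec, candidate_vecs[n])) for n in names]
    probs = softmax(scores)
    best = max(range(len(names)), key=probs.__getitem__)
    return names[best]


# Toy data: "在" and "再" are treated as a confusable pair for illustration only.
quads = build_quads("我在家", {"在": ["再"]})
choice = pick_candidate([1.0, 0.0], {"在": [0.9, 0.1], "再": [0.2, 0.8]})
```

In a real model the context vector would come from LSTM encodings of the preceding and following text, and the candidate vectors would be learned character embeddings; the toy vectors here only demonstrate the score-then-argmax step.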
