International Conference on Communication and Signal Processing

Introduction and Correction of Bengali-Hindi Noise in Large Word Vocabulary using RNN

Abstract

Word correction, or spell checking, is essential for fundamental applications such as text editors and social-media chat platforms. Much work has been reported for English, Chinese, and various European languages, for which standard datasets are available. For Indic languages, however, there is a lack of sizeable standard datasets, which is one reason the field has seen little contribution for these languages. Because the ways a word can be misspelled in an Indic script differ from those in English owing to diacritics (matra), synthetic noise must be introduced in a script-specific way. In this work, we focus on creating two large standard noisy datasets for Hindi and Bengali using probabilistic rules on the alphabets. Each dataset has a vocabulary of 150,000 words. We tested the datasets with two recurrent neural network (RNN) based models, which achieve 80% accuracy on both datasets.
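The abstract does not spell out the probabilistic rules, so the following is only a minimal Python sketch of rule-based noise injection for Hindi under stated assumptions. The confusion tables (short/long vowel-sign swaps, dropped matras, and commonly confused consonants such as स/श/ष, न/ण, ब/व), the probabilities, and the function names `add_noise` and `make_pairs` are all hypothetical illustrations, not the paper's actual method. The output is a list of (noisy, clean) pairs of the kind a character-level RNN corrector could be trained on.

```python
import random

# Hypothetical Devanagari (Hindi) confusion tables for illustration only;
# the paper's actual rules and probabilities are not given in the abstract.
MATRA_SWAPS = {
    "\u093f": "\u0940",  # short i sign -> long ii sign
    "\u0940": "\u093f",  # long ii sign -> short i sign
    "\u0941": "\u0942",  # short u sign -> long uu sign
    "\u0942": "\u0941",  # long uu sign -> short u sign
    "\u0947": "\u0948",  # e sign -> ai sign
    "\u0948": "\u0947",  # ai sign -> e sign
    "\u094b": "\u094c",  # o sign -> au sign
    "\u094c": "\u094b",  # au sign -> o sign
}
# Every vowel sign that may be dropped, including aa (U+093E), which has no swap.
MATRAS = set(MATRA_SWAPS) | {"\u093e"}
CONSONANT_SWAPS = {
    "\u0938": ["\u0936", "\u0937"],  # sa  -> sha / ssa
    "\u0936": ["\u0938", "\u0937"],  # sha -> sa / ssa
    "\u0937": ["\u0938", "\u0936"],  # ssa -> sa / sha
    "\u0928": ["\u0923"],            # na  -> nna
    "\u0923": ["\u0928"],            # nna -> na
    "\u092c": ["\u0935"],            # ba  -> va
    "\u0935": ["\u092c"],            # va  -> ba
}


def add_noise(word, rng, p_matra_swap=0.3, p_matra_drop=0.1, p_consonant=0.1):
    """Return a noisy copy of `word`; each code point is perturbed independently.

    Assumes p_matra_drop + p_matra_swap <= 1.
    """
    out = []
    for ch in word:
        r = rng.random()
        if ch in MATRAS:
            if r < p_matra_drop:
                continue  # omit the vowel sign entirely
            if ch in MATRA_SWAPS and r < p_matra_drop + p_matra_swap:
                out.append(MATRA_SWAPS[ch])  # short/long (or e/ai, o/au) confusion
                continue
        elif ch in CONSONANT_SWAPS and r < p_consonant:
            out.append(rng.choice(CONSONANT_SWAPS[ch]))  # similar-sounding consonant
            continue
        out.append(ch)  # keep the character unchanged
    return "".join(out)


def make_pairs(vocab, variants=3, seed=0):
    """Build (noisy, clean) pairs, up to `variants` distinct noisy forms per word."""
    rng = random.Random(seed)
    pairs = []
    for word in vocab:
        seen = set()
        for _ in range(variants * 5):  # bounded attempts; some words cannot be noised
            noisy = add_noise(word, rng)
            if noisy != word and noisy not in seen:
                seen.add(noisy)
                pairs.append((noisy, word))
            if len(seen) == variants:
                break
    return pairs


if __name__ == "__main__":
    # "kitab" (book) and "sipahi" (soldier), written with escapes for clarity.
    vocab = ["\u0915\u093f\u0924\u093e\u092c", "\u0938\u093f\u092a\u093e\u0939\u0940"]
    for noisy, clean in make_pairs(vocab):
        print(noisy, "->", clean)
```

Working on Unicode code points rather than user-perceived characters is a simplification; a fuller treatment might operate on grapheme clusters or handle conjuncts formed with the virama (U+094D). A Bengali variant would follow the same pattern with tables drawn from the Bengali Unicode block (U+0980 to U+09FF).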
