National Information Technology Conference

A Sinhala and Tamil Extension to Generic Environment for Context-aware Correction

Abstract

Several lines of research exist on spell checkers for European and Indian languages. However, low-resource languages such as Tamil and Sinhala have seen limited research in this problem space, possibly because of their highly inflectional and morphologically rich nature. There is no fully functional context-aware spell-checking system for these languages, especially as open source. In this paper, a Generic Environment for Context-aware Correction is extended to two resource-scarce languages: Sinhala and Tamil. Experimental results show that our system detects spelling errors well and provides the most suitable suggestions for correcting misspelled words, with a minimum accuracy of 85% for Tamil and 70% for Sinhala. This is the first context-aware spell corrector for the Sinhala language. Compared to prior context-aware spell correctors for Tamil, it offers 1) a modularized architecture and 2) increased coverage and accuracy. Moreover, this study produced a Tamil and Sinhala spell correction benchmark dataset. Both the dataset and the tools are available for public use.
