Language and Technology Conference

Spanish Diacritic Error Detection and Restoration - A Survey

Abstract

In this paper we address the problem of diacritic error detection and restoration: the task of identifying and correcting missing accents in text. In particular, we evaluate the performance of a simple part-of-speech tagger-based technique, comparing it to other established methods for error detection and restoration: unigram frequency, decision lists, discriminative classifiers, a machine-translation-based method, and grapheme-based approaches. In languages such as Spanish (the focus here), diacritics play a key role in disambiguation. Our results show that a straightforward modification to an n-gram tagger achieves good performance in diacritic error identification without resorting to any specialized machinery. Our method should be applicable to any language in which diacritics are distributed comparably and play a similar disambiguating role.
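As a point of reference for the simplest of the methods listed above, the unigram frequency baseline can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names (`strip_diacritics`, `train_unigram`, `restore`) and the toy corpus are assumptions introduced here, and the choice to strip only the acute accent and diaeresis while keeping ñ is likewise an assumption about how the task might be set up for Spanish.

```python
import unicodedata
from collections import Counter, defaultdict

# Combining marks treated as "diacritics" in this sketch (an assumption):
# U+0301 acute accent and U+0308 diaeresis. The tilde of ñ is kept.
STRIPPED_MARKS = {"\u0301", "\u0308"}


def strip_diacritics(word):
    """Return the word with acute accents and diaereses removed."""
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if ch not in STRIPPED_MARKS)
    return unicodedata.normalize("NFC", kept)


def train_unigram(tokens):
    """Map each bare form to its most frequent accented variant in the corpus."""
    counts = defaultdict(Counter)
    for tok in tokens:
        counts[strip_diacritics(tok)][tok] += 1
    return {bare: forms.most_common(1)[0][0] for bare, forms in counts.items()}


def restore(tokens, model):
    """Replace each token with the most frequent variant; unseen tokens pass through."""
    return [model.get(strip_diacritics(tok), tok) for tok in tokens]


# Hypothetical toy corpus, lowercased and whitespace-tokenized.
corpus = "él dijo que el libro es más caro que el otro".split()
model = train_unigram(corpus)
restored = restore("el libro es mas caro".split(), model)
```

Because such a baseline always picks the same variant for a given bare form, it cannot separate pairs like "el" and "él" by context, which is the kind of disambiguation that context-sensitive methods such as a tagger are meant to handle.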