International Workshop on Computational Processing of the Portuguese Language

Evaluating Phonetic Spellers for User-Generated Content in Brazilian Portuguese

Abstract

Recently, spell checking (or spelling correction) has regained attention due to the need to normalize user-generated content (UGC) on the web. UGC presents new challenges to spellers, as its register is much more informal and far more variable than traditional spelling correction systems can handle. This paper proposes two new approaches to spelling correction of UGC in Brazilian Portuguese (BP), both of which take phonetic errors into account. The first approach is based on three phonetic modules running in a pipeline. The second is based on machine learning with soft decision making, and considers context-sensitive misspellings. We compared our methods with others on a human-annotated UGC corpus of product reviews. The machine learning approach surpassed all other methods, with a 78.0% correction rate, a very low false positive rate (0.7%), and a false negative rate of 21.9%.
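The abstract reports three rates without defining them. As a rough illustration only, the sketch below shows one common way such rates could be computed over an annotated corpus. The definitions are assumptions, not taken from the paper: correction rate and false negative rate are measured over misspelled tokens, and false positive rate over correctly spelled tokens. Under these assumed definitions the first two sum to 1, which is at least consistent with the reported 78.0% and 21.9% up to rounding. The `Token` class, the `evaluate` function, and the sample tokens are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Token:
    original: str  # token as written by the user
    gold: str      # human-annotated correct form
    output: str    # form produced by the speller


def evaluate(tokens):
    """Compute rates under ASSUMED definitions (not from the paper)."""
    # Tokens whose written form differs from the annotation are misspellings.
    misspelled = [t for t in tokens if t.original != t.gold]
    well_formed = [t for t in tokens if t.original == t.gold]

    # Misspellings the speller turned into the annotated form.
    fixed = sum(t.output == t.gold for t in misspelled)
    # Correct words the speller wrongly changed.
    broken = sum(t.output != t.gold for t in well_formed)

    n_bad, n_ok = len(misspelled), len(well_formed)
    return {
        "correction_rate": fixed / n_bad if n_bad else 0.0,
        "false_negative_rate": (n_bad - fixed) / n_bad if n_bad else 0.0,
        "false_positive_rate": broken / n_ok if n_ok else 0.0,
    }


# Hypothetical sample: two misspellings (one fixed), two correct words.
sample = [
    Token("voce", "você", "você"),
    Token("naum", "não", "naum"),
    Token("produto", "produto", "produto"),
    Token("chegou", "chegou", "chegou"),
]
rates = evaluate(sample)
```

On this toy sample one of two misspellings is corrected and no correct word is altered, so the correction and false negative rates are both 0.5 and the false positive rate is 0.0.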
