Conference: International Conference on Asian Language Processing

Statistical Machine Translation Approach for Lexical Normalization on Indonesian Text

Abstract

Lexical normalization is an important preprocessing step for noisy data, such as social media posts, before the data are used for further analysis. We examine the potential of Statistical Machine Translation (SMT) for normalizing Indonesian text, using translation units at both the phrase and character levels. We also used an external corpus to generate additional language model data and pre-normalization rules to enhance the SMT system. The results show that the SMT systems at both the phrase and character levels outperform various baselines in Word Error Rate (WER) and Bilingual Evaluation Understudy (BLEU) score. This research also demonstrates that using an external language model and applying pre-normalization rules can further improve the effectiveness of SMT systems in normalizing Indonesian text.
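The abstract gives no implementation details, so the following is only a minimal sketch under stated assumptions, not the authors' code. It assumes a common character-level SMT setup in which each word is split into space-separated characters and word boundaries are replaced by a placeholder symbol (here a hypothetical `_`), and it uses the standard word-level Levenshtein definition of WER, (S + D + I) / N, where N is the number of reference words. The function names `to_char_units`, `from_char_units`, and `word_error_rate` are illustrative.

```python
def to_char_units(sentence: str, boundary: str = "_") -> str:
    """Turn a sentence into space-separated characters for a character-level
    SMT system. Word boundaries become `boundary` (an assumed placeholder that
    must not occur in the text itself)."""
    words = sentence.split()  # also collapses repeated whitespace
    return " ".join(boundary if ch == " " else ch for ch in " ".join(words))


def from_char_units(units: str, boundary: str = "_") -> str:
    """Invert to_char_units: join characters and restore spaces."""
    return "".join(" " if u == boundary else u for u in units.split())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    noisy = "aku gak tau"          # informal Indonesian input
    reference = "aku tidak tahu"   # normalized reference
    print(to_char_units(noisy))
    print(word_error_rate(reference, noisy))
    print(word_error_rate(reference, "aku tidak tahu"))
```

Under these assumptions, the unnormalized input "aku gak tau" has a WER of 2/3 against the reference "aku tidak tahu" (two substitutions over three reference words), while a perfect normalization scores 0. Translating at the character level lets the system learn spelling-level rewrites (for example "tau" to "tahu") that a phrase table built on whole words may never have seen.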