
An In-depth Analysis of the Effect of Text Normalization in Social Media

Abstract

Recent years have seen increased interest in text normalization in social media, as the informal writing styles found in Twitter and other social media data often cause problems for NLP applications. Unfortunately, most current approaches narrowly regard the normalization task as a "one size fits all" task of replacing non-standard words with their standard counterparts. In this work we build a taxonomy of normalization edits and present a study of normalization to examine its effect on three different downstream applications (dependency parsing, named entity recognition, and text-to-speech synthesis). The results suggest that how the normalization task should be viewed is highly dependent on the targeted application. The results also show that normalization must be thought of as more than word replacement in order to produce results comparable to those seen on clean text.
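The contrast between word replacement and broader normalization edits can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `LEXICON` table, the function names, and the two non-replacement edits (dropping platform-specific tokens, adding final punctuation) are hypothetical and are not taken from the paper's taxonomy or experiments.

```python
# Hypothetical lookup table of non-standard forms; not from the paper.
LEXICON = {"u": "you", "r": "are", "c": "see", "2moro": "tomorrow", "gr8": "great"}


def replace_words(tokens):
    """Word-replacement-only normalization: swap known non-standard tokens."""
    return [LEXICON.get(t.lower(), t) for t in tokens]


def remove_platform_tokens(tokens):
    """A non-replacement edit (illustrative): drop hashtags and @-mentions."""
    return [t for t in tokens if not t.startswith(("#", "@"))]


def add_final_period(tokens):
    """A non-replacement edit (illustrative): add sentence-final punctuation."""
    if tokens and tokens[-1] not in {".", "!", "?"}:
        return tokens + ["."]
    return tokens


def normalize(tokens, edits):
    """Apply a chosen sequence of edit functions, in order."""
    for edit in edits:
        tokens = edit(tokens)
    return tokens


tweet = "@sam u r gr8 c u 2moro #friday".split()

# Replacement only: the mention and hashtag survive, and no period is added.
replacement_only = normalize(tweet, [replace_words])

# A broader edit set; which edits help would depend on the target application.
broader = normalize(tweet, [replace_words, remove_platform_tokens, add_final_period])
```

Because the edit list is passed in as a parameter, a different set of edits could be chosen per downstream application, which mirrors the abstract's point that the appropriate view of normalization depends on the target task.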
