Multimodular Text Normalization of Dutch User-Generated Content

Abstract

As social media constitutes a valuable source for data analysis for a wide range of applications, the need for handling such data arises. However, the nonstandard language used on social media poses problems for natural language processing (NLP) tools, as these are typically trained on standard language material. We propose a text normalization approach to tackle this problem. More specifically, we investigate the usefulness of a multimodular approach to account for the diversity of normalization issues encountered in user-generated content (UGC). We consider three different types of UGC written in Dutch (SNS, SMS, and tweets) and provide a detailed analysis of the performance of the different modules and the overall system. We also apply an extrinsic evaluation by evaluating the performance of a part-of-speech tagger, lemmatizer, and named-entity recognizer before and after normalization.

Bibliographic record

  • Source
    ACM transactions on intelligent systems | 2016, Issue 4 | 61.1-61.22 | 22 pages
  • Author affiliations

    Univ Ghent, Dept Translat Interpreting & Commun, Groot Brittannielaan 45, B-9000 Ghent, Belgium|Univ Stuttgart, Inst Nat Language Proc, Pfaffenwaldring 5B, D-70569 Stuttgart, Germany;

    Univ Antwerp, Computat Linguist & Psycholinguist Res Ctr, Prinsstr 13, B-2000 Antwerp, Belgium;

    Univ Ghent, Dept Translat Interpreting & Commun, Groot Brittannielaan 45, B-9000 Ghent, Belgium;

    Univ Ghent, Dept Translat Interpreting & Commun, Groot Brittannielaan 45, B-9000 Ghent, Belgium;

    Univ Ghent, Dept Translat Interpreting & Commun, Groot Brittannielaan 45, B-9000 Ghent, Belgium;

    Univ Antwerp, Computat Linguist & Psycholinguist Res Ctr, Prinsstr 13, B-2000 Antwerp, Belgium;

    Univ Ghent, Dept Translat Interpreting & Commun, Groot Brittannielaan 45, B-9000 Ghent, Belgium;

  • Original format: PDF
  • Text language: English
  • Keywords

    Social media; text normalization; user-generated content;
