Journal: Expert Systems with Applications

Multilingual evaluation of pre-processing for BERT-based sentiment analysis of tweets



Abstract

Social media offer a large amount of information that can be exploited in many fields of research. However, while Natural Language Processing methods achieve good results on well-formed datasets of written text with clear syntax, social media sources contain text written in informal language, with unstructured syntax and peculiar symbols; text processing therefore requires dedicated approaches in this setting. This paper addresses the task of sentiment analysis of tweets. In particular, it analyzes the pre-processing of tweets, with the aim of avoiding the noise introduced by web constructs such as URLs and mentions and by other text fragments, and of exploiting the information hidden in symbols such as emoticons, emojis, and hashtags. In more detail, a number of experiments, performed with a state-of-the-art classification model (BERT), are designed to evaluate many currently available operations for pre-processing tweets in terms of the statistical significance of their influence on sentiment analysis performance. Moreover, available data in two languages, English and Italian, are considered in order to also evaluate the dependence on language. The results make it possible to identify the most convenient strategy for pre-processing tweets, and thus to improve the state of the art in both languages for the considered sentiment analysis task.
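The abstract names the kinds of tweet elements that pre-processing targets (URLs, mentions, hashtags, emoticons) but does not list the exact operations the authors evaluated. The following Python snippet is therefore only an illustrative sketch of what such operations might look like before text is passed to a BERT tokenizer. The placeholder tokens `url` and `user`, the small emoticon lexicon, the camel-case hashtag splitting, and the function names are all assumptions made for this example and are not taken from the paper. Emoji handling is omitted.

```python
import re

# Patterns for common web constructs found in tweets.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")

# Tiny hypothetical emoticon lexicon; a real one would be far larger.
EMOTICONS = {":)": "happy", ":-)": "happy", ":(": "sad", ":-(": "sad", ":D": "laughing"}


def split_hashtag(tag: str) -> str:
    """Split a camel-case hashtag body into words (assumed heuristic)."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", tag)


def preprocess(tweet: str, url_token: str = "url", user_token: str = "user") -> str:
    """Apply a fixed sequence of illustrative pre-processing operations."""
    text = URL_RE.sub(url_token, tweet)        # replace URLs with a placeholder
    text = MENTION_RE.sub(user_token, text)    # replace @mentions with a placeholder
    text = HASHTAG_RE.sub(lambda m: split_hashtag(m.group(1)), text)  # unpack hashtags
    for emoticon, word in EMOTICONS.items():   # translate emoticons into words
        text = text.replace(emoticon, word)
    return re.sub(r"\s+", " ", text).strip()   # normalize whitespace


if __name__ == "__main__":
    print(preprocess("@anna I love #BestDayEver :) https://t.co/abc"))
```

Each operation is a separate step, so it could be switched on or off individually to compare its effect on classification performance, which is the kind of comparison the abstract describes.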
