Journal: Information Retrieval

A machine learning approach to sentiment analysis in multilingual Web texts

Abstract

Sentiment analysis, also called opinion mining, is a form of information extraction from text that is of growing research and commercial interest. In this paper we present our machine learning experiments on sentiment analysis in blog, review and forum texts found on the World Wide Web and written in English, Dutch and French. We train on a set of example sentences or statements that are manually annotated as positive, negative or neutral with regard to a certain entity. We are interested in the feelings that people express about certain consumer products. We train and evaluate several classification models that can be configured in a cascaded pipeline. We have to deal with several problems, namely the noisy character of the input texts, the attribution of the sentiment to a particular entity, and the small size of the training set. We identify positive, negative and neutral feelings towards the entity under consideration with approximately 83% accuracy for English texts, using unigram features augmented with linguistic features. The accuracy for the Dutch and French texts is approximately 70% and 68%, respectively, owing to the larger variety of linguistic expressions, which more often diverge from standard language and thus demand more training patterns. In addition, our experiments give us insight into the portability of the learned models across domains and languages. A substantial part of the article investigates the role of active learning techniques in reducing the number of examples that must be manually annotated.
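The abstract names unigram features, a cascaded pipeline of classifiers, and active learning, but gives no details of how they are combined. The Python sketch below shows one plausible arrangement under stated assumptions; it is not the authors' implementation. Assumptions: multinomial Naive Bayes stands in for the classifier (the abstract only says "several classification models"), the cascade is split into a neutral-versus-polar gate followed by a positive-versus-negative stage, the linguistic features are omitted, and active learning uses margin-based uncertainty sampling. The names `NaiveBayes`, `Cascade`, `most_uncertain` and `TRAIN`, and the toy sentences, are hypothetical.

```python
import math
import re
from collections import Counter


def unigrams(text):
    """Lowercased word unigrams (Unicode-aware, so accented letters are kept)."""
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over unigram counts with add-one smoothing (assumed classifier)."""

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.n_docs = len(labels)
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.labels}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(unigrams(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.labels}
        return self

    def log_scores(self, text):
        """Unnormalized log P(class) + sum of log P(word | class)."""
        v = len(self.vocab)
        scores = {}
        for c in self.labels:
            score = math.log(self.doc_counts[c] / self.n_docs)
            for w in unigrams(text):
                if w in self.vocab:  # words unseen in training are skipped
                    score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            scores[c] = score
        return scores

    def proba(self, text):
        """Posterior class probabilities via a numerically stable softmax."""
        scores = self.log_scores(text)
        top = max(scores.values())
        exps = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(exps.values())
        return {c: e / z for c, e in exps.items()}

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


class Cascade:
    """Hypothetical two-stage cascade: neutral vs. polar, then positive vs. negative."""

    def fit(self, texts, labels):
        gate_labels = ["neutral" if y == "neutral" else "polar" for y in labels]
        self.gate = NaiveBayes().fit(texts, gate_labels)
        polar = [(t, y) for t, y in zip(texts, labels) if y != "neutral"]
        self.polarity = NaiveBayes().fit([t for t, _ in polar], [y for _, y in polar])
        return self

    def predict(self, text):
        if self.gate.predict(text) == "neutral":
            return "neutral"
        return self.polarity.predict(text)


def most_uncertain(model, pool):
    """Uncertainty sampling: the pool item with the smallest top-2 probability margin."""
    def margin(text):
        probs = sorted(model.proba(text).values(), reverse=True)
        return probs[0] - probs[1]
    return min(pool, key=margin)


# Hypothetical toy training sentences about a phone (not data from the paper).
TRAIN = [
    ("I love this phone, great screen", "positive"),
    ("great battery and love the camera", "positive"),
    ("terrible battery, I hate it", "negative"),
    ("awful screen, hate the camera", "negative"),
    ("the phone was released in May", "neutral"),
    ("the camera has 12 megapixels", "neutral"),
]

if __name__ == "__main__":
    model = Cascade().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
    for sentence in ["I love the great camera", "I hate the awful battery", "the phone was released"]:
        print(sentence, "->", model.predict(sentence))
    pool = ["the screen is great", "battery life in May", "I hate waiting"]
    print("annotate next:", most_uncertain(model.gate, pool))
```

Splitting the decision into a gate and a polarity stage lets each classifier be trained and tuned on its own subset of the data. In an active learning loop of this kind, the sentence returned by `most_uncertain` would be annotated by hand, added to `TRAIN`, and the cascade refit, so that annotation effort goes to the examples the current model is least sure about.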