Journal: Software, practice & experience

A Spark-based big data analysis framework for real-time sentiment prediction on streaming data


Abstract

Many data sources produce large volumes of data, and extracting valuable information at this scale requires new distributed processing approaches. Real-time sentiment analysis is one of the most demanding research areas and requires powerful Big Data analytics tools such as Spark. Prior literature surveys have shown that, although there are many conventional sentiment analysis studies, only a few works realize sentiment analysis in real time. One major factor that affects the quality of real-time sentiment analysis is confidence in the generated data. More precisely, a valuable research question is whether the account owner who generates the sentiment is genuine. Since data generated by fake personalities may decrease the accuracy of the outcome, an intelligent service that can identify the source of the data is one of the key points in the analysis. In this context, we include a fake account detection service in the proposed framework. Both the sentiment analysis and fake account detection systems are trained and tested using the Naive Bayes model from Apache Spark's machine learning library. The developed system consists of four integrated software components: (i) a machine learning and streaming service for sentiment prediction, (ii) a Twitter streaming service to retrieve tweets, (iii) a Twitter fake account detection service to assess the owner of the retrieved tweet, and (iv) a real-time reporting and dashboard component to visualize the results of sentiment analysis. The sentiment classification performance of the system in offline and real-time modes is 86.77% and 80.93%, respectively.
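
To make the described data flow concrete, the following is a minimal sketch of how two Naive Bayes classifiers could be combined over a stream of tweets: one labels the account owner, the other labels the sentiment. It is not the authors' implementation and does not use Apache Spark. It is a plain-Python multinomial Naive Bayes with add-one smoothing, used here as a stand-in. The class `MultinomialNB`, the helpers `tokenize`, `profile_tokens`, and `process_stream`, the profile features (`has_photo`, `followers`), the follower threshold of 50, and all training examples are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Toy multinomial Naive Bayes with add-one (Laplace) smoothing.

    A stand-in for a library model; NOT Spark's API.
    """

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, tokens):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.classes:
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.class_counts[label] / total)
            for t in tokens:
                if t in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def tokenize(text):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def profile_tokens(profile):
    """Hypothetical account features, encoded as categorical tokens."""
    return [
        "has_photo" if profile["has_photo"] else "no_photo",
        "many_followers" if profile["followers"] >= 50 else "few_followers",
    ]


# Hypothetical toy training sets (offline training step).
sentiment_model = MultinomialNB().fit(
    [tokenize(t) for t in [
        "great phone love it",
        "awful service hate it",
        "love this great day",
        "hate this awful delay",
    ]],
    ["positive", "negative", "positive", "negative"],
)

account_model = MultinomialNB().fit(
    [profile_tokens(p) for p in [
        {"has_photo": True, "followers": 320},
        {"has_photo": True, "followers": 75},
        {"has_photo": False, "followers": 3},
        {"has_photo": False, "followers": 12},
    ]],
    ["genuine", "genuine", "fake", "fake"],
)


def process_stream(tweets):
    """Label each incoming tweet with an account label and a sentiment label."""
    for tweet in tweets:
        account = account_model.predict(profile_tokens(tweet["profile"]))
        sentiment = sentiment_model.predict(tokenize(tweet["text"]))
        yield tweet["text"], account, sentiment
```

Because `process_stream` is a generator, it handles tweets one at a time as they arrive, which loosely mirrors the streaming mode. A dashboard component could consume its output and decide how to treat tweets from accounts labeled `fake`. The abstract does not say whether such tweets are dropped or only flagged, so the sketch reports both labels.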
