Published in: IEEE International Conference on Machine Learning and Applications

A Novel Approach to Big Data Veracity Using Crowdsourcing Techniques and Bayesian Predictors


Abstract

Data is now generated at a tremendous pace, and adequate measures must be in place to verify the nature of big data. Analysis performed on 'dirty' data may lead to erroneous insights and thereby to poor decisions. The aspect of big data that deals with its correctness is known as big data veracity. Trust in the acquired data goes a long way toward acting on the decisions of an automated decision-making system, and veracity helps to validate that data. In this paper, we present our solution to the big data veracity problem using crowdsourcing techniques. Our solution involves sentiment analysis, which deals with identifying the sentiment expressed in a piece of text. As a proof of concept, we developed an app that asks users to tag tweets according to the sentiment the tweets evoke in them. Each tweet is thus ratified by hundreds of our participants, and the sentiment associated with the tweet is tagged. The tagged sentiment was then evaluated against the sentiment recorded in a verified data set. This analysis was plotted on a ROC curve and also evaluated against the verified data using a Bayesian predictor trained with a trinomial function. An accuracy of 81% was obtained as displayed by the ROC curve, and 89% through the Bayesian predictor. In addition, a maximum a posteriori (MAP) analysis of the Bayesian predictor yields neutral sentiment as the most probable hypothesis. With this, we have proven that crowdsourcing of sentiment analysis is a viable solution to the problem of big data veracity and therefore an aid in making better decisions.
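The abstract does not give implementation details, so the following is only a minimal sketch of how crowd tags might be aggregated and scored, not the authors' method. It assumes a three-class label set (negative, neutral, positive), a simple majority vote per tweet, and a naive Bayes style MAP estimate in which each participant's tag is treated as independent given the true sentiment. The function names, the prior, and the likelihood table are all hypothetical and chosen for illustration.

```python
import math
from collections import Counter

# Assumed three-class label set; the paper's exact labels are not given in the abstract.
SENTIMENTS = ("negative", "neutral", "positive")


def majority_vote(tags):
    """Return the most frequent crowd tag for one tweet.

    Ties are broken by the order of SENTIMENTS (an arbitrary choice).
    """
    counts = Counter(tags)
    return max(SENTIMENTS, key=lambda s: counts[s])


def accuracy(predicted, verified):
    """Fraction of tweets whose aggregated tag matches the verified label."""
    matches = sum(p == v for p, v in zip(predicted, verified))
    return matches / len(verified)


def map_sentiment(tags, prior, likelihood):
    """Return the MAP hypothesis: argmax_h P(h) * prod_i P(tag_i | h).

    prior[h] is P(true sentiment = h); likelihood[h][t] is the assumed
    probability that a participant tags t when the true sentiment is h.
    Log-probabilities are summed to avoid underflow with many tags.
    """
    scores = {
        h: math.log(prior[h]) + sum(math.log(likelihood[h][t]) for t in tags)
        for h in SENTIMENTS
    }
    return max(SENTIMENTS, key=lambda h: scores[h])


# Hypothetical parameters: uniform prior, participants correct 80% of the time.
UNIFORM_PRIOR = {s: 1 / 3 for s in SENTIMENTS}
NOISY_TAGGER = {
    h: {t: (0.8 if t == h else 0.1) for t in SENTIMENTS} for h in SENTIMENTS
}

if __name__ == "__main__":
    crowd_tags = ["positive", "positive", "neutral"]  # made-up tags for one tweet
    print(majority_vote(crowd_tags))
    print(map_sentiment(crowd_tags, UNIFORM_PRIOR, NOISY_TAGGER))
```

With a uniform prior and a symmetric likelihood table, the MAP estimate agrees with the majority vote; the two diverge only when the prior or the per-class error rates are uneven, which is where a Bayesian predictor could add value over raw voting.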
