Journal: Big Data, IEEE Transactions on

On Scalable and Robust Truth Discovery in Big Data Social Media Sensing Applications

Abstract

Identifying trustworthy information in the presence of noisy data contributed by numerous unvetted sources from online social media (e.g., Twitter, Facebook, and Instagram) has been a crucial task in the era of big data. This task, referred to as truth discovery, aims to identify the reliability of the sources and the truthfulness of the claims they make without knowing either a priori. In this work, we identify three important challenges that have not been well addressed in the current truth discovery literature. The first is “misinformation spread”, where a significant number of sources contribute to false claims, making the identification of truthful claims difficult. For example, on Twitter, rumors, scams, and influence bots are common cases of sources colluding, either intentionally or unintentionally, to spread misinformation and obscure the truth. The second challenge is “data sparsity”, or the “long-tail phenomenon”, where a majority of sources contribute only a small number of claims, providing insufficient evidence to determine those sources' trustworthiness. For example, in the Twitter datasets that we collected during real-world events, more than 90 percent of sources contributed to only a single claim. Third, many current solutions do not scale to large social sensing events because of the centralized nature of their truth discovery algorithms. In this paper, we develop a Scalable and Robust Truth Discovery (SRTD) scheme to address these three challenges. In particular, the SRTD scheme jointly quantifies both the reliability of sources and the credibility of claims using a principled approach. We further develop a distributed framework to implement the proposed truth discovery scheme using Work Queue in an HTCondor system. The evaluation results on three real-world datasets show that the SRTD scheme significantly outperforms state-of-the-art truth discovery methods in terms of both effectiveness and efficiency.
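
To make the idea of jointly estimating source reliability and claim credibility concrete, here is a minimal Python sketch of a generic iterative truth discovery baseline. It is not the SRTD scheme, whose details the abstract does not give. The function name `discover_truth`, the `(source, claim, stance)` report format, the initial reliability of 0.5, and the update rules are all assumptions made for illustration.

```python
from collections import defaultdict


def discover_truth(reports, iterations=20):
    """Generic iterative truth discovery (illustrative baseline, not SRTD).

    reports: list of (source, claim, stance) tuples, where stance is
    +1 if the source asserts the claim and -1 if it denies it.
    Returns (reliability, credibility): reliability in [0, 1] per source,
    credibility in [-1, 1] per claim.
    """
    sources = {s for s, _, _ in reports}
    claims = {c for _, c, _ in reports}
    # Assumption: every source starts out equally (and moderately) reliable.
    reliability = {s: 0.5 for s in sources}
    credibility = {c: 0.0 for c in claims}

    for _ in range(iterations):
        # Step 1: claim credibility = reliability-weighted mean of stances.
        num = defaultdict(float)
        den = defaultdict(float)
        for s, c, stance in reports:
            num[c] += reliability[s] * stance
            den[c] += reliability[s]
        credibility = {c: (num[c] / den[c]) if den[c] > 0 else 0.0
                       for c in claims}

        # Step 2: source reliability = average agreement between the
        # source's stances and the current credibility estimates.
        agree = defaultdict(float)
        count = defaultdict(int)
        for s, c, stance in reports:
            agree[s] += (1 + stance * credibility[c]) / 2
            count[s] += 1
        reliability = {s: agree[s] / count[s] for s in sources}

    return reliability, credibility


if __name__ == "__main__":
    # Hypothetical toy data: sources "a" and "b" agree, "c" contradicts them.
    toy = [("a", "c1", +1), ("b", "c1", +1), ("c", "c1", -1),
           ("a", "c2", +1), ("b", "c2", +1), ("c", "c2", -1)]
    rel, cred = discover_truth(toy)
    print(rel, cred)
```

A baseline of this kind follows the reliability-weighted majority, so it would be misled when many sources repeat the same false claim, and it has little evidence with which to score a source that reports only once. Those are the misinformation-spread and data-sparsity challenges the abstract describes.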
