On Scalable and Robust Truth Discovery in Big Data Social Media Sensing Applications

Daniel Zhang; Dong Wang; Nathan Vance; Yang Zhang; Steven Mike


Abstract

Identifying trustworthy information in noisy data contributed by numerous unvetted sources on online social media (e.g., Twitter, Facebook, and Instagram) is a crucial task in the era of big data. This task, referred to as truth discovery, aims to identify the reliability of the sources and the truthfulness of the claims they make without knowing either a priori. In this work, we identify three important challenges that have not been well addressed in the current truth discovery literature. The first is “misinformation spread”, where a significant number of sources contribute to false claims, making truthful claims difficult to identify. For example, on Twitter, rumors, scams, and influence bots are common examples of sources colluding, either intentionally or unintentionally, to spread misinformation and obscure the truth. The second challenge is “data sparsity”, or the “long-tail phenomenon”, where a majority of sources contribute only a small number of claims, providing insufficient evidence to determine those sources' trustworthiness. For example, in the Twitter datasets that we collected during real-world events, more than 90 percent of sources contributed to only a single claim. Third, many current solutions do not scale to large social sensing events because their truth discovery algorithms are centralized. In this paper, we develop a Scalable and Robust Truth Discovery (SRTD) scheme to address these three challenges. In particular, the SRTD scheme jointly quantifies both the reliability of sources and the credibility of claims using a principled approach. We further develop a distributed framework that implements the proposed truth discovery scheme using Work Queue in an HTCondor system. Evaluation results on three real-world datasets show that the SRTD scheme significantly outperforms state-of-the-art truth discovery methods in terms of both effectiveness and efficiency.
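To make the idea of jointly estimating source reliability and claim credibility concrete, the following is a minimal Python sketch of a generic iterative truth discovery loop. It is not the SRTD scheme, whose details the abstract does not give. The function name `discover_truth`, the `(source, claim, vote)` report format, the neutral 0.5 starting reliability, and the two update rules are all assumptions chosen for illustration.

```python
from collections import defaultdict


def discover_truth(reports, iterations=20):
    """Generic iterative truth discovery (illustrative, NOT the SRTD scheme).

    reports: list of (source, claim, vote) tuples, where vote is +1 if the
    source asserts the claim and -1 if it denies it (assumed format).
    Returns (reliability, credibility): reliability in [0, 1] per source,
    credibility in [-1, 1] per claim (positive means "likely true").
    """
    sources = {s for s, _, _ in reports}
    claims = {c for _, c, _ in reports}
    reliability = {s: 0.5 for s in sources}  # assumed neutral starting point
    credibility = {c: 0.0 for c in claims}

    for _ in range(iterations):
        # Step 1: claim credibility = reliability-weighted average of votes.
        num, den = defaultdict(float), defaultdict(float)
        for s, c, v in reports:
            num[c] += reliability[s] * v
            den[c] += reliability[s]
        for c in claims:
            credibility[c] = num[c] / den[c] if den[c] > 0 else 0.0

        # Step 2: source reliability = average agreement between the
        # source's votes and the current credibility estimates.
        agree, count = defaultdict(float), defaultdict(int)
        for s, c, v in reports:
            agree[s] += (1 + v * credibility[c]) / 2
            count[s] += 1
        for s in sources:
            reliability[s] = agree[s] / count[s]

    return reliability, credibility


# Toy data (made up): A, B, C mostly agree; D contradicts them;
# E is a "long-tail" source that reports on a single claim only.
reports = [
    ("A", "c1", +1), ("A", "c2", +1), ("A", "c3", -1),
    ("B", "c1", +1), ("B", "c2", +1), ("B", "c3", -1),
    ("C", "c1", +1), ("C", "c3", -1),
    ("D", "c1", -1), ("D", "c3", +1),
    ("E", "c3", +1),
]
reliability, credibility = discover_truth(reports)
for claim in sorted(credibility):
    print(claim, round(credibility[claim], 3))
```

In this toy run the majority sources reinforce each other, so `c1` ends with positive credibility and `c3` with negative credibility. The same behavior shows why the challenges named in the abstract matter: if colluding sources outnumber honest ones, a naive loop like this one converges to the false claim, and a source such as `E` with a single report offers almost no evidence about its own reliability.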


Bibliographic details

Source
IEEE Transactions on Big Data, 2019, No. 2, pp. 195-208 (14 pages)
Authors
Daniel Zhang; Dong Wang; Nathan Vance; Yang Zhang; Steven Mike
Author affiliations

Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN;

Department of Computer Science and Engineering University of Notre Dame Not;

Original format: PDF
Language: English
Keywords
Sensors; Big Data; Robustness; Twitter; Task analysis;

