The increasing adoption of online social networks such as Twitter has led to a deluge of available information. This creates a need for methods that quickly identify and extract useful, credible information from large amounts of noisy data. We first show the challenges of defining credibility for information in social media. We then develop supervised machine learning methods to extract credible information, and we define reasonable and meaningful ground truth measures of credibility. To accomplish this, we deconstruct credibility and individually study the specific constructs that signal it. We then conduct a crowdsourced survey to collect ground truth credibility assessments, and find that surveys yield measurements that are often noisy and hard to work with. On Twitter, retweets are a form of endorsement by users and serve as a noisy in-network measure of credibility. We show that combining the survey and retweet measures yields ground truth measures on which both sets of users agree about the credibility of a message. Models trained on these labeling schemes identify more useful messages and achieve higher accuracy than models trained to predict the individual noisy ground truth values.

A related task is identifying which pieces of information published on the social network are true. One approach treats humans on the social network as sensors of unknown reliability who sense the state of the world and report their observations as claims by publishing messages. Fact-finding algorithms use an unsupervised, estimation-theoretic approach to jointly estimate the truthfulness of claims and the reliability of the human sensors that make them, given some prior beliefs. However, because the information available in Twitter streaming data is sparse, these algorithms have very little information with which to update the prior beliefs for claims corroborated by very few sources.
We find that fusion methods built on simple heuristics, which combine the credibility predictions with the fact finder's output, improve performance over the estimates reached by the fact finder alone.
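
As a rough illustration of the idea, the sketch below pairs a generic iterative fact finder with one possible fusion heuristic. It is not the method evaluated in this work: the update rule is a simple averaging and summing scheme chosen for brevity, and the function names (`invert`, `fact_finder`, `fuse`), the `min_sources` threshold, and the blending `weight` are all hypothetical choices made for the example.

```python
from collections import defaultdict


def invert(claims_by_source):
    """Map each claim to the set of sources that assert it."""
    sources_by_claim = defaultdict(set)
    for source, claims in claims_by_source.items():
        for claim in claims:
            sources_by_claim[claim].add(source)
    return dict(sources_by_claim)


def fact_finder(claims_by_source, prior=0.5, iterations=20):
    """Jointly estimate claim belief and source trust (illustrative only).

    Every claim starts at the same prior belief. Each round sets a source's
    trust to the mean belief of its claims, then sets a claim's belief to
    the summed trust of its sources, rescaled so the top claim scores 1.0.
    """
    # Ignore sources that make no claims to avoid dividing by zero.
    claims_by_source = {s: cs for s, cs in claims_by_source.items() if cs}
    sources_by_claim = invert(claims_by_source)
    belief = {c: prior for c in sources_by_claim}
    trust = {s: 0.0 for s in claims_by_source}
    for _ in range(iterations):
        trust = {s: sum(belief[c] for c in cs) / len(cs)
                 for s, cs in claims_by_source.items()}
        raw = {c: sum(trust[s] for s in ss)
               for c, ss in sources_by_claim.items()}
        top = max(raw.values(), default=0.0)
        if top == 0.0:
            break
        belief = {c: v / top for c, v in raw.items()}
    return belief, trust


def fuse(belief, sources_by_claim, credibility, min_sources=2, weight=0.5):
    """Blend in a credibility score for sparsely corroborated claims.

    Assumed heuristic: when fewer than `min_sources` sources back a claim
    and a credibility prediction exists for it, average the two scores;
    otherwise keep the fact finder's estimate unchanged.
    """
    fused = {}
    for claim, b in belief.items():
        sparse = len(sources_by_claim[claim]) < min_sources
        if sparse and claim in credibility:
            fused[claim] = weight * credibility[claim] + (1 - weight) * b
        else:
            fused[claim] = b
    return fused
```

Here `credibility` stands in for the output of a supervised credibility model, given as a mapping from claim to a score in [0, 1]. Restricting the blend to sparsely corroborated claims reflects the observation above that the fact finder has little evidence for exactly those claims.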