Multi-sourced Information Trustworthiness Analysis: Applications and Theory

Abstract

In the era of Big Data, data entries can come from a variety of sources, even when they describe the same objects or events. Some sources typically provide accurate information, but others may contain noisy or even erroneous information for reasons such as recording errors, device malfunction, background noise, or intent to manipulate the data. It is therefore inevitable that information from multiple sources conflicts. To discover useful knowledge, which is usually deeply buried in such complicated multi-sourced data, we have to conduct information trustworthiness analysis on all available data sources. In this thesis, we propose a series of approaches to multi-sourced information trustworthiness analysis, including reliability-aware information integration and inconsistency detection, which efficiently and effectively discover trustworthy and untrustworthy information, respectively.

In reliability-aware information integration, it is critical to identify reliable sources, those that more often provide accurate information, so that we can give their information more weight and better discover the truths (i.e., trustworthy information). Unfortunately, there is no oracle telling us a priori which information source is more reliable. To correctly identify the truths, in Part I of this thesis we develop novel information integration methods that incorporate the estimation of source reliability. We explore the power of source reliability estimation at both the data level and the model level. The objective is to jointly estimate which source is reliable and which piece of information is correct, where the information could be the raw data in data-level information integration or the model parameters in model-level information integration. In this part, we prove desirable properties of the proposed approaches via theoretical analysis and demonstrate their impact on real applications, such as indoor floorplan construction and crowdsourced question answering.

On the other hand, when unexpected disagreement is encountered across diverse information sources, i.e., data entities receive inconsistent information across multiple data sources, this may raise a red flag and require in-depth investigation. Part II of this thesis conducts inconsistency detection among multiple information sources to detect anomalies. We develop a series of tensor decomposition based algorithms for detecting inconsistent information in an unsupervised learning setting. By representing dynamic multi-sourced data as tensors, we propose different tensor decomposition based approaches, including an online method with theoretical guarantees for large-scale applications, to capture the common patterns across sources. An anomaly indicator is proposed that identifies inconsistencies by comparing source inputs with the common patterns. The proposed frameworks have been applied to a wide variety of applications, from cybersecurity to hotel reviews to computer networks.

To sum up, in this thesis we conduct novel multi-sourced information trustworthiness analysis to discover trustworthy information or to detect untrustworthy information. For trustworthy information discovery, the proposed reliability-aware information integration framework gives us a tool to identify reliable sources and discover the true information of data entities from conflicting multi-sourced data. For untrustworthy information detection, the developed inconsistency detection approaches let us detect malicious data entities that receive inconsistent information across the available data sources. The frameworks we developed have been effectively applied in many areas, including hotel review analysis, cybersecurity, and computer networks, and have the potential to be applied to many other areas, such as healthcare, mobile sensing, and crowdsourcing. With advances in technology and devices, both the amount of data and the number of sources are still exploding, so there are great opportunities as well as numerous research challenges in inferring useful knowledge from multiple sources of massive data collections.
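
The joint estimation of source reliability and truths described for Part I can be illustrated with a minimal sketch. This is not the thesis's algorithm: it assumes continuous-valued claims, a reliability-weighted average for the truth step, and a negative-log weight update, and the names `truth_discovery` and `claims` are hypothetical.

```python
import math


def truth_discovery(claims, iters=20, eps=1e-9):
    """Alternate between estimating truths and source weights.

    claims: dict mapping source -> dict mapping entity -> float value.
    Returns (truths, weights). Illustrative sketch only.
    """
    sources = list(claims)
    entities = sorted({e for c in claims.values() for e in c})
    weights = {s: 1.0 for s in sources}  # start with equal trust
    truths = {}
    for _ in range(iters):
        # Truth step: reliability-weighted average of the claimed values.
        for e in entities:
            voters = [s for s in sources if e in claims[s]]
            num = sum(weights[s] * claims[s][e] for s in voters)
            den = sum(weights[s] for s in voters)
            truths[e] = num / den
        # Weight step: a source far from the current truths loses weight.
        errors = {
            s: sum((v - truths[e]) ** 2 for e, v in claims[s].items()) + eps
            for s in sources
        }
        total = sum(errors.values())
        weights = {s: max(-math.log(errors[s] / total), eps) for s in sources}
    return truths, weights


# Hypothetical data: sources "a" and "b" roughly agree, "c" is far off.
claims = {
    "a": {"x": 10.0, "y": 20.0},
    "b": {"x": 10.2, "y": 19.8},
    "c": {"x": 15.0, "y": 30.0},
}
truths, weights = truth_discovery(claims)
```

In this toy input the estimated truths move toward the values reported by the two agreeing sources, and the disagreeing source ends up with the smallest weight, without any prior label saying which source is reliable.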
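
The idea in Part II of comparing source inputs against a common pattern can be sketched as follows. The thesis does not specify its decomposition here, so this example swaps in a plain rank-1 CP model fitted by alternating least squares; the tensor layout `X[source][entity][feature]`, the function names, and the assumption of positive, nonzero data are all illustrative choices, not the author's method.

```python
def _sq(v):
    """Squared Euclidean norm of a list of numbers."""
    return sum(x * x for x in v)


def rank1_cp(X, iters=50):
    """Fit X[i][j][k] ~ a[i] * b[j] * c[k] by alternating least squares.

    Assumes a dense 3-way list with positive entries (hypothetical layout:
    source x entity x feature), so no factor collapses to zero.
    """
    I, J, K = len(X), len(X[0]), len(X[0][0])
    a, b, c = [1.0] * I, [1.0] * J, [1.0] * K
    for _ in range(iters):
        d = _sq(b) * _sq(c)
        a = [sum(X[i][j][k] * b[j] * c[k]
                 for j in range(J) for k in range(K)) / d for i in range(I)]
        d = _sq(a) * _sq(c)
        b = [sum(X[i][j][k] * a[i] * c[k]
                 for i in range(I) for k in range(K)) / d for j in range(J)]
        d = _sq(a) * _sq(b)
        c = [sum(X[i][j][k] * a[i] * b[j]
                 for i in range(I) for j in range(J)) / d for k in range(K)]
    return a, b, c


def inconsistency_scores(X, a, b, c):
    """Per-entity squared residual between the inputs and the common pattern."""
    I, J, K = len(X), len(X[0]), len(X[0][0])
    return [sum((X[i][j][k] - a[i] * b[j] * c[k]) ** 2
                for i in range(I) for k in range(K)) for j in range(J)]


# Hypothetical data: three sources agree on four entities, except that
# source 0 reports something different for entity 2.
base = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
X = [[row[:] for row in base] for _ in range(3)]
X[0][2] = [9.0, 1.0]
a, b, c = rank1_cp(X)
scores = inconsistency_scores(X, a, b, c)
```

The entity with the largest score is the one whose reports deviate most from the shared low-rank pattern. A higher-rank or online decomposition, as the abstract mentions, would replace `rank1_cp` while keeping the same residual-based indicator.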

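Bibliographic record

  • Author

    Xiao, Houping

  • Author affiliation

    State University of New York at Buffalo

  • Degree-granting institution: State University of New York at Buffalo
  • Subject: Computer science
  • Degree: Ph.D.
  • Year: 2018
  • Pages: 246 p.
  • Total pages: 246
  • Original format: PDF
  • Text language: English (eng)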
