Journal: Knowledge-Based Systems

True mean value discovery over multiple data sources with unknown reliability degrees


Abstract

In the era of big data, we seek to obtain observations of target objects from a wider range of data sources. As the number of data sources increases, we expect that more trustworthy statistical parameters, such as the population mean, can be estimated from the multi-source observations. However, the reliability of data sources rarely attracts attention, because hypothesis testing seems to be an effective tool for determining whether a given estimate is acceptable. In practice, the noisy observations from different unreliable data sources may have different statistical characteristic parameters, and these parameters are unknown. As a result, the condition that observations be identically distributed, which hypothesis testing requires, no longer holds. A poor estimate of the population mean may therefore be accepted when hypothesis testing is performed over the multi-source observations. To address this issue, we propose a true mean value discovery algorithm that uses multi-source observations to determine whether an estimated population mean should be rejected. The proposed algorithm can also estimate the reliability degree of each data source. By removing incorrect observations provided by unreliable sources, we can obtain more reliable estimates of true population means. Experiments on three real-world tasks demonstrate that the proposed method outperforms state-of-the-art approaches. © 2021 Elsevier B.V. All rights reserved.
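
The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm, whose details are not given here. It swaps in a generic iterative truth-discovery heuristic: alternate between estimating the mean and weighting each source by the inverse of its mean squared deviation from that estimate. The function names, the bias and noise values, and the weighting rule are all assumptions made for illustration.

```python
import random
import statistics


def simulate_sources(true_mean, specs, n, seed=0):
    """Draw n Gaussian observations per source.

    specs is a list of (bias, std) pairs, one per hypothetical source.
    These values are illustrative assumptions, not data from the paper.
    """
    rng = random.Random(seed)
    return [
        [rng.gauss(true_mean + bias, std) for _ in range(n)]
        for bias, std in specs
    ]


def reliability_weighted_mean(sources, iters=20, eps=1e-9):
    """Generic iterative heuristic (an assumption, not the paper's method).

    Returns (estimate, weights), where weights sum to 1 and a larger
    weight means the source agrees more closely with the estimate.
    """
    source_means = [statistics.fmean(obs) for obs in sources]
    # Start from the unweighted average of the per-source means.
    estimate = statistics.fmean(source_means)
    weights = [1.0 / len(sources)] * len(sources)
    for _ in range(iters):
        # Mean squared deviation of each source from the current estimate.
        losses = [
            statistics.fmean([(x - estimate) ** 2 for x in obs])
            for obs in sources
        ]
        raw = [1.0 / (loss + eps) for loss in losses]
        total = sum(raw)
        weights = [w / total for w in raw]
        # Re-estimate the mean as a reliability-weighted average.
        estimate = sum(w * m for w, m in zip(weights, source_means))
    return estimate, weights


# Three unbiased sources and one biased, noisy source (all hypothetical).
specs = [(0.0, 1.0), (0.0, 1.0), (0.0, 1.5), (5.0, 3.0)]
sources = simulate_sources(10.0, specs, n=200)
pooled = statistics.fmean([x for obs in sources for x in obs])
estimate, weights = reliability_weighted_mean(sources)
print("pooled mean:", round(pooled, 3))
print("weighted estimate:", round(estimate, 3))
print("source weights:", [round(w, 3) for w in weights])
```

In this toy setup the pooled mean is pulled toward the biased source, while the weighted estimate should stay closer to the simulated true mean and assign the biased source the smallest weight. Whether the paper's method behaves similarly cannot be confirmed from the abstract alone.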
