True mean value discovery over multiple data sources with unknown reliability degrees

Ye Songtao; Wang Junjie; Fan Hongjie; Zhang Zhiqiang

Abstract

In the era of big data, observations of target objects are collected from an ever wider range of data sources. As the number of data sources increases, we expect that more trustworthy statistical parameters, for example the population mean, can be estimated from the multi-source observations. However, the reliability of data sources rarely attracts attention, because hypothesis testing seems to be an effective tool for determining whether a given estimate is acceptable. In practice, the noisy observations from different unreliable data sources may have different statistical characteristic parameters, and these parameters are unknown. As a result, the condition required by hypothesis testing, that observations be identically distributed, no longer holds. Therefore, a poor estimate of the population mean may be accepted when hypothesis testing is performed over the multi-source observations. To address this issue, we propose a true mean value discovery algorithm that uses multi-source observations to determine whether an estimated population mean should be rejected. Additionally, the reliability degree of each data source can be estimated using the proposed algorithm. By removing incorrect observations provided by unreliable sources, we can obtain more reliable estimates of true population means. Experiments on three real-world tasks demonstrate that the proposed method outperforms state-of-the-art approaches. (C) 2021 Elsevier B.V. All rights reserved.
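
The abstract does not give the details of the proposed algorithm, so the following is only a rough illustration of the general idea it describes: sources with unknown reliability are weighted by how closely their observations agree with the current estimate, and the mean is re-estimated from those weights. The function name `weighted_mean_discovery`, the inverse-squared-error weighting, and the sample data are all assumptions made for this sketch; they are not taken from the paper.

```python
from statistics import fmean


def weighted_mean_discovery(observations, iterations=20, eps=1e-9):
    """Illustrative sketch only, not the paper's algorithm.

    observations: dict mapping a source name to a list of numeric observations
    of the same quantity. Returns (estimate, weights), with weights summing to 1.
    """
    # Start from the pooled mean, which treats every source as equally reliable.
    pooled = [x for xs in observations.values() for x in xs]
    estimate = fmean(pooled)
    weights = {}
    for _ in range(iterations):
        # Mean squared deviation of each source from the current estimate.
        errors = {
            s: fmean([(x - estimate) ** 2 for x in xs]) + eps
            for s, xs in observations.items()
        }
        # Assumed reliability proxy: inverse error, normalised to sum to 1.
        inverse = {s: 1.0 / e for s, e in errors.items()}
        total = sum(inverse.values())
        weights = {s: v / total for s, v in inverse.items()}
        # Re-estimate the mean as a weighted average of the per-source means.
        estimate = sum(weights[s] * fmean(xs) for s, xs in observations.items())
    return estimate, weights


# Hypothetical data: sources "a" and "b" are precise, "c" is noisy and biased.
sources = {
    "a": [10.1, 9.9, 10.0, 10.2],
    "b": [9.8, 10.0, 10.1, 9.9],
    "c": [14.0, 6.5, 15.2, 13.9],
}
pooled_mean = fmean([x for xs in sources.values() for x in xs])
estimate, weights = weighted_mean_discovery(sources)
print(pooled_mean, estimate, weights)
```

In this toy setting the pooled mean is pulled toward the unreliable source, while the weighted estimate stays close to the two sources that agree with each other. A source whose weight falls below some chosen threshold could then be dropped before re-estimating, which loosely mirrors the abstract's step of removing observations from unreliable sources.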

Bibliographic details

Source
Knowledge-Based Systems | 2021, Issue 8 | 107036.1-107036.12 | 12 pages
Authors
Ye Songtao; Wang Junjie; Fan Hongjie; Zhang Zhiqiang
Author affiliations

Xiangtan Univ Sch Cyberspace Secur Xiangtan 411105 Peoples R China|Xiangtan Univ Sch Comp Sci Xiangtan 411105 Peoples R China;

Xiangtan Univ Sch Cyberspace Secur Xiangtan 411105 Peoples R China|Xiangtan Univ Sch Comp Sci Xiangtan 411105 Peoples R China;

China Univ Polit Sci & Law Dept Sci & Technol Teaching Beijing 102249 Peoples R China;

Natl Meteorol Informat Ctr Beijing 100081 Peoples R China;

Original format: PDF
Language: English
Keywords
Multiple data sources; Source reliabilities; True mean value discovery; Confidence interval
