Bayesian Classifier Modeling for Dirty Data

Abstract

Bayesian classifiers have proven effective in many practical applications. To train a Bayesian classifier, important parameters such as the prior and class-conditional probabilities need to be learned from datasets. In practice, datasets are prone to dirty (missing, erroneous, or duplicated) values, which severely affect model accuracy if no data cleaning is performed. However, cleaning the whole dataset is prohibitively laborious and thus infeasible even for medium-sized datasets. To this end, we propose to induce Bayes models by cleaning only small samples of the dataset. We derive confidence intervals as a function of the sample size after data cleaning; in this way, the posterior probability is guaranteed to fall within the estimated confidence interval with constant probability. We then design two strategies to compare posterior probability intervals when they overlap. An extension to the semi-naive Bayes method is also addressed. Experimental results suggest that cleaning only a small number of samples can train satisfactory Bayesian models, offering a significant improvement in cost over cleaning all of the data and significant improvements in precision, recall, and F-measure over cleaning none of the data.
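
The abstract does not reproduce the derivation, but a standard Hoeffding-style bound illustrates how such intervals shrink with the cleaned-sample size (an assumed instantiation for illustration; the paper's actual bounds may differ). For a probability $p$ estimated by the empirical frequency $\hat{p}$ over $n$ cleaned samples,

$$\Pr\bigl(|\hat{p} - p| \ge \varepsilon\bigr) \le 2e^{-2n\varepsilon^{2}}, \qquad \text{so with probability at least } 1-\delta:\quad p \in [\hat{p}-\varepsilon,\ \hat{p}+\varepsilon], \quad \varepsilon = \sqrt{\frac{\ln(2/\delta)}{2n}}.$$

The interval half-width thus decays as $O(1/\sqrt{n})$, which is what makes cleaning only a small sample sufficient for bounded posterior estimates.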
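To make the pipeline concrete, here is a minimal Python sketch of a naive Bayes classifier trained on a small cleaned sample, in which every estimated probability carries a Hoeffding interval and prediction compares posterior intervals, falling back to interval midpoints when they overlap. Everything here (the class IntervalNaiveBayes, the interval-propagation rule, the midpoint fallback) is an illustrative assumption, not the authors' implementation or their two comparison strategies.

import math
from collections import Counter, defaultdict

def hoeffding_eps(n, delta=0.05):
    # Half-width of a (1 - delta) Hoeffding interval for a frequency
    # estimated from n samples; shrinks as O(1 / sqrt(n)).
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

class IntervalNaiveBayes:
    """Naive Bayes trained on a small cleaned sample; each estimated
    probability carries a Hoeffding interval, and prediction compares
    posterior intervals. A sketch, not the paper's implementation."""

    def fit(self, X, y, delta=0.05):
        n = len(y)
        self.eps = hoeffding_eps(n, delta)
        self.classes = sorted(set(y))
        self.class_n = Counter(y)
        self.prior = {c: cnt / n for c, cnt in self.class_n.items()}
        # cond[c][j] counts values of feature j among class-c samples.
        self.cond = {c: defaultdict(Counter) for c in self.classes}
        for xi, yi in zip(X, y):
            for j, v in enumerate(xi):
                self.cond[yi][j][v] += 1
        return self

    def _cond_prob(self, c, j, v):
        # Add-one smoothing over the values observed for this feature.
        counts = self.cond[c][j]
        return (counts[v] + 1) / (self.class_n[c] + len(counts) + 1)

    def posterior_interval(self, x, c):
        # Bounds on the unnormalised posterior obtained by shifting every
        # estimated factor by +/- eps and clipping to (0, 1].
        lo = hi = 1.0
        factors = [self.prior[c]] + [self._cond_prob(c, j, v)
                                     for j, v in enumerate(x)]
        for p in factors:
            lo *= max(p - self.eps, 1e-12)
            hi *= min(p + self.eps, 1.0)
        return lo, hi

    def predict(self, x):
        # If one class's interval dominates (no overlap), pick it;
        # otherwise fall back to comparing interval midpoints.
        ivals = {c: self.posterior_interval(x, c) for c in self.classes}
        best = max(ivals, key=lambda c: ivals[c][0])
        if all(ivals[best][0] >= ivals[c][1]
               for c in self.classes if c != best):
            return best  # intervals separated: confident decision
        return max(ivals, key=lambda c: sum(ivals[c]) / 2)

For example, with only four cleaned samples the intervals are very wide (eps is roughly 0.68 at delta = 0.05), so prediction falls back to midpoints; as the cleaned sample grows, the intervals tighten and separate, and the confident branch takes over:

X = [("sunny", "hot"), ("rainy", "cool"), ("sunny", "cool"), ("rainy", "hot")]
y = ["no", "yes", "yes", "no"]
print(IntervalNaiveBayes().fit(X, y).predict(("sunny", "cool")))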
