Elastic algorithms for guaranteeing quality monotonicity in big data mining

Abstract

When mining large data volumes in big data applications, users are typically willing to use algorithms that produce acceptable approximate results within given resource and time constraints. Two key challenges arise when designing such algorithms. The first is reasoning about the tradeoff between the quality of the data mining output (e.g., prediction accuracy for classification tasks) and the available resource and time budgets. The second is organizing the algorithm's computation so that result quality is guaranteed to improve as more budget is used. Little work has addressed these two challenges together in a generic way. In this paper, we propose a novel framework for developing elastic big data mining algorithms. An information-theoretic approach based on Shannon's entropy is introduced to reason about how result quality is affected by the allocated budget. This reasoning then guides the development of algorithms that adapt to the available time budget while guaranteeing better-quality results as more budget is used. We demonstrate the framework by developing two example algorithms: elastic k-Nearest Neighbour (kNN) classification and elastic collaborative filtering (CF) recommendation. The core of both elastic algorithms is to apply a naïve kNN classification or CF algorithm over R-tree data structures that successively approximate the entire dataset. Experimental evaluation was performed on real datasets using prediction accuracy as the quality metric. The results show that, in practice, the elastic mining algorithms do produce results whose observable quality, i.e., prediction accuracy, increases consistently.
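The idea of running a naïve kNN classifier over successively finer approximations of a dataset can be illustrated with a small sketch. This is not the authors' algorithm: a uniform grid of shrinking cell size stands in for the levels of an R-tree, the budget is simplified to a number of refinement levels rather than a time limit, and nothing here proves quality monotonicity. The names `summarize`, `knn_predict`, `elastic_knn`, `cell_sizes`, and `budget` are all hypothetical. The only piece taken directly from information theory is the Shannon entropy of the vote distribution, used here as a rough uncertainty signal.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of `labels`."""
    counts = Counter(labels)
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def summarize(points, labels, cell):
    """Approximate the dataset with one (centroid, majority label) per grid cell.

    Assumption: a uniform grid of side `cell` replaces one R-tree level.
    """
    buckets = defaultdict(list)
    for p, y in zip(points, labels):
        key = tuple(math.floor(x / cell) for x in p)
        buckets[key].append((p, y))
    summary = []
    for members in buckets.values():
        dim = len(members[0][0])
        centroid = tuple(
            sum(p[i] for p, _ in members) / len(members) for i in range(dim)
        )
        majority = Counter(y for _, y in members).most_common(1)[0][0]
        summary.append((centroid, majority))
    return summary


def knn_predict(summary, query, k):
    """Naive kNN vote over the summary; also returns the entropy of the votes."""
    nearest = sorted(summary, key=lambda s: math.dist(s[0], query))[:k]
    votes = [y for _, y in nearest]
    return Counter(votes).most_common(1)[0][0], shannon_entropy(votes)


def elastic_knn(points, labels, query, k, cell_sizes, budget):
    """Classify `query` at the first `budget` levels, ordered coarse to fine.

    Returns a list of (cell size, summary size, prediction, vote entropy).
    """
    results = []
    for cell in cell_sizes[:budget]:
        summary = summarize(points, labels, cell)
        prediction, entropy = knn_predict(summary, query, k)
        results.append((cell, len(summary), prediction, entropy))
    return results


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.8), (0.8, 0.9), (0.15, 0.2)]
    lbl = ["a", "a", "b", "b", "a"]
    for row in elastic_knn(pts, lbl, (0.2, 0.2), k=1, cell_sizes=[1.0, 0.5], budget=2):
        print(row)
```

A larger `budget` lets the loop reach finer levels, so each later answer is computed from a summary that is closer to the full dataset. Whether accuracy actually improves at each level depends on the data and on how the approximation is built, which is the question the paper's framework addresses.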

Bibliographic record

Source
2013 IEEE International Conference on Big Data | 2013 | pp. 45-50 | 6 pages
Conference location: Santa Clara, CA (US)
Authors
Han Rui; Nie Lei; Ghanem Moustafa M.; Guo Yike;
Author affiliation

Imperial College London, London, UK
Original format: PDF
Text language: English
Keywords
R-tree; elastic data mining algorithms; entropy; quality monotonicity;
