Journal of Parallel and Distributed Computing

Distributed load balancing frequent colossal closed itemset mining algorithm for high dimensional dataset

Abstract

Extracting colossal closed itemsets from high dimensional biological datasets has attracted considerable attention in recent years. A massive set of short and average-sized mined itemsets does not contain complete and valuable information for decision making, yet traditional itemset mining algorithms spend an enormous amount of time mining exactly such itemsets. Growing research interest in bioinformatics and the abundance of data across a variety of domains have led to the generation of high dimensional datasets, which are characterized by a large number of features and a comparatively small number of rows. Colossal closed itemsets are significant for numerous applications, including bioinformatics, and are influential in decision making. Extracting a large amount of information and knowledge from a high dimensional dataset is a nontrivial task. The existing colossal closed itemset mining algorithms for high dimensional datasets are sequential and computationally expensive. Distributed and parallel computing is a good strategy for overcoming the inefficiency of the existing sequential algorithms. The Balanced Distributed Parallel Frequent Colossal Closed Itemset Mining (BDPFCCIM) algorithm is designed for high dimensional datasets. It incorporates an efficient method for checking the closedness of a rowset and an efficient pruning strategy for cutting down the row enumeration search space. The proposed BDPFCCIM algorithm is the first distributed load balancing algorithm to mine frequent colossal closed itemsets from high dimensional biological datasets. The experimental results demonstrate the efficient performance of the proposed BDPFCCIM algorithm in comparison with state-of-the-art algorithms.
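To make the terms in the abstract concrete, the following is a minimal Python sketch of row enumeration with a rowset closedness check. It is not the BDPFCCIM algorithm: it is a brute-force, single-machine illustration with no pruning, no distribution, and no load balancing. The function name `closed_colossal_itemsets` and the parameters `min_support` and `min_length` are assumptions introduced here for illustration only.

```python
from itertools import combinations


def closed_colossal_itemsets(rows, min_support, min_length):
    """Brute-force row enumeration (illustrative only, exponential in the row count).

    rows        -- list of iterables of items, one per dataset row
    min_support -- minimum number of rows that must contain the itemset
    min_length  -- minimum number of items for an itemset to count as "colossal"
                   (a hypothetical threshold for this sketch)
    Returns a dict mapping each closed itemset (frozenset) to its support.
    """
    rows = [frozenset(r) for r in rows]
    n = len(rows)
    found = {}
    # Enumerate candidate rowsets instead of itemsets: with few rows and
    # many features, the rowset search space is the smaller of the two.
    for size in range(max(1, min_support), n + 1):
        for rowset in combinations(range(n), size):
            # Items shared by every row in the candidate rowset.
            itemset = frozenset.intersection(*(rows[i] for i in rowset))
            if len(itemset) < min_length:
                continue
            # Closedness check: the rowset is closed only if it is exactly
            # the set of rows that contain the shared itemset.
            supporting = tuple(i for i in range(n) if itemset <= rows[i])
            if supporting == rowset:
                found[itemset] = len(rowset)
    return found


example_rows = [
    {"a", "b", "c", "d"},
    {"a", "b", "c", "e"},
    {"a", "b", "f"},
]
result = closed_colossal_itemsets(example_rows, min_support=2, min_length=2)
```

In this toy dataset the rowset made of the first two rows is closed (its shared items `a`, `b`, `c` appear in no other row), while the rowset made of the first and third rows is not, because its shared items `a`, `b` also appear in the second row. A practical algorithm would prune such non-closed branches early rather than enumerate every rowset.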
