首页> 外文会议>IEEE International Conference on Big Data >An Efficient Map-Reduce Algorithm for Computing Formal Concepts from Binary data
【24h】

An Efficient Map-Reduce Algorithm for Computing Formal Concepts from Binary data

机译:一种高效地图 - 减少从二进制数据计算正式概念的映射算法

获取原文

摘要

The problem of discovering all formal concepts embedded in a binary relational dataset is of significant interest for many data analysis and processing problems. The problem of enumerating all concepts for a dataset is known to be NP-hard. A number of Map-Reduce based algorithms have been developed to conquer the difficulty of processing large datasets. But these algorithms are not very scalable because at core all these algorithms stick to a DFS based sequential search for individual concepts, applying Map-Reduce formalism to parallelize the processing at nodes of the DFS tree. We have present here a completely different and Map-Reduce based formulation for parallelizing the concept discovery problem. One major difference of our approach is that we seek to find a sufficient set of concepts only, and this sufficient set can be used to generate all other concepts in the lattice. This formulation is not very suitable for sequential execution but adapts extremely well to the parallel environments based on Map-Reduce operators. We show in this paper that our algorithm is significantly faster than all the known Map-reduce formulations for discovering concepts in binary relational datasets. We have presented in this paper the outline of the theoretical foundations for our algorithm and empirical tests with a number of benchmarking datasets. We also show that the computationally very difficult problem of finding 3-clusters in pairs of binary relational datasets can also be made very efficient by our formulation.
机译:发现嵌入在二进制关系数据集中的所有正式概念的问题对于许多数据分析和处理问题来说是重大兴趣。已知枚举数据集的所有概念的问题是NP-HARD。已经开发了许多地图减少基于算法以征服处理大型数据集的难度。但是这些算法不是非常可扩展的,因为在核心All这些算法中,所有这些算法都基于基于DFS的顺序搜索各个概念,应用MAP-DENECTIONA中的形式主义并将其并行化DFS树节点的处理。我们在这里介绍了一个完全不同的和地图 - 减少了基于概念发现问题的并行化的基础配方。我们的方法的一个主要区别是我们寻求发现足够的概念,并且这种充足的集合可用于在格子中产生所有其他概念。该配方不是非常适合顺序执行,但基于地图减少运算符对并行环境非常良好。我们在本文中展示了我们的算法明显比所有已知的地图减少用于在二进制关系数据集中发现概念的制剂的速度更快。我们在本文中提出了我们算法的理论基础的大纲和具有许多基准测试数据集的实证测试。我们还表明,通过我们的配方可以非常有效地在二元关系数据集成对找到3集群的计算上非常困难的问题。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号