
ParBiBit: Parallel tool for binary biclustering on modern distributed-memory systems




Biclustering techniques are gaining attention in the analysis of large-scale datasets because they identify two-dimensional submatrices in which both rows and columns are correlated. In this work we present ParBiBit, a parallel tool that accelerates the search for interesting biclusters in binary datasets, which are common in fields such as genetics, marketing, and text mining. It is based on the state-of-the-art sequential Java tool BiBit, which several studies have shown to be accurate, especially in scenarios that yield many large biclusters. ParBiBit uses the same methodology as BiBit (grouping the binary information into patterns) and produces the same results. Nevertheless, our tool significantly improves performance thanks to an efficient C++11 implementation that supports threads and MPI processes in order to exploit the compute capabilities of modern distributed-memory systems, which consist of several multicore CPU nodes interconnected through a network. Our performance evaluation with 18 representative input datasets on two different eight-node systems shows that our tool is significantly faster than the original BiBit. Source code in C++ and MPI running on Linux systems as well as a reference manual are available at .
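
The abstract describes the methodology only as "grouping the binary information into patterns". As a rough illustration of what a pattern-based search on binary data can look like, here is a minimal single-threaded C++11 sketch. It is not taken from the ParBiBit or BiBit sources. It assumes that each row fits in one 64-bit word, that a pattern is the bitwise AND of two rows, and that a bicluster is a pattern together with every row containing it. The names `Bicluster`, `count_bits`, `find_biclusters`, `min_rows`, and `min_cols` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical result type: a column pattern (bit mask) and the rows containing it.
struct Bicluster {
    std::uint64_t pattern;
    std::vector<std::size_t> rows;
};

// Portable population count; C++11 has no std::popcount.
inline int count_bits(std::uint64_t x) {
    int n = 0;
    while (x) {
        x &= x - 1;  // clear the lowest set bit
        ++n;
    }
    return n;
}

// Assumption: every row of the binary matrix is packed into one 64-bit word.
std::vector<Bicluster> find_biclusters(const std::vector<std::uint64_t>& rows,
                                       int min_rows, int min_cols) {
    assert(min_rows >= 2 && min_cols >= 1);
    std::vector<Bicluster> result;
    std::set<std::uint64_t> seen;  // patterns that were already expanded

    for (std::size_t i = 0; i < rows.size(); ++i) {
        for (std::size_t j = i + 1; j < rows.size(); ++j) {
            // Candidate pattern: columns set to 1 in both rows.
            const std::uint64_t pattern = rows[i] & rows[j];
            if (count_bits(pattern) < min_cols) continue;
            if (!seen.insert(pattern).second) continue;  // duplicate pattern

            // Collect every row that contains the whole pattern.
            Bicluster candidate;
            candidate.pattern = pattern;
            for (std::size_t r = 0; r < rows.size(); ++r) {
                if ((rows[r] & pattern) == pattern) candidate.rows.push_back(r);
            }
            if (static_cast<int>(candidate.rows.size()) >= min_rows) {
                result.push_back(candidate);
            }
        }
    }
    return result;
}
```

In this sketch each iteration of the outer loop only reads `rows`, so the pairs could in principle be divided among threads or MPI processes, with the shared `seen` set requiring synchronization or a final merge. How ParBiBit actually distributes the work is not stated in the abstract.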




