Conference: International Conference on Advanced Cloud and Big Data

A Parallel Algorithm for Mining Density-Aware Distinguishing Sequential Patterns with Spark

Abstract

Distinguishing sequential pattern (DSP) mining is a useful technique for discriminating a set of sequences of one class from a set of sequences of another class. One kind of DSP, which takes density into account during mining (called density-aware DSP), has many applications in bioinformatics and computational biology. However, the previous method for mining density-aware DSPs suffers from inefficient density computation and, as a result, cannot handle large-scale datasets. To overcome this limitation, we design and implement a parallel mining method that discovers density-aware DSPs using Spark, a popular framework for parallel computing. Our empirical study on real datasets demonstrates that the proposed method is efficient and scalable.