DLA: a Distributed, Location-based and Apriori-based Algorithm for Biological Sequence Pattern Mining

Abstract

With the rapid growth of genomic data, the need for scalable data mining algorithms has increased. Frequent contiguous sequence mining is a technique that can help biologists better understand the function and structure of our DNA by capturing the common characteristics among related sequences. Many sequence mining algorithms have been developed over time. However, most of them either suffer from scaling issues when dealing with big data or give no guarantee that their results are complete. In this paper, we propose a distributed sequential pattern mining algorithm implemented on Apache Spark. Specifically, the algorithm exploits the Apriori property and information about each pattern's location within the original sequence to drastically reduce the number of candidates at each iteration. Experimental results on real-world datasets confirm our performance expectations, showing better scalability than other distributed solutions.
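The abstract does not give the algorithm itself, so the following is only a minimal single-machine sketch of the general idea it describes, not the authors' Spark implementation. It assumes that support means the number of distinct sequences containing a pattern, and that candidates of length k+1 are produced only by extending recorded occurrences of frequent patterns of length k by one symbol to the right (the Apriori property applied to the prefix). The function name `mine_contiguous` and its parameters are hypothetical.

```python
from collections import defaultdict


def mine_contiguous(sequences, min_support):
    """Return {pattern: support} for contiguous patterns that occur in at
    least min_support distinct sequences (illustrative sketch only)."""
    # Level 1: record every (sequence index, start offset) of each symbol.
    occurrences = defaultdict(list)
    for sid, seq in enumerate(sequences):
        for pos, symbol in enumerate(seq):
            occurrences[symbol].append((sid, pos))

    frequent = {}
    length = 1
    while occurrences:
        # Keep only patterns whose support reaches the threshold.
        survivors = {}
        for pattern, locs in occurrences.items():
            support = len({sid for sid, _ in locs})
            if support >= min_support:
                frequent[pattern] = support
                survivors[pattern] = locs

        # Location-based extension: grow each surviving occurrence by the
        # next symbol in its own sequence. Patterns that were pruned above
        # never produce candidates, so no candidate has an infrequent prefix.
        occurrences = defaultdict(list)
        for pattern, locs in survivors.items():
            for sid, pos in locs:
                end = pos + length
                if end < len(sequences[sid]):
                    occurrences[pattern + sequences[sid][end]].append((sid, pos))
        length += 1
    return frequent
```

For example, `mine_contiguous(["ACGT", "ACGA", "TACG"], 3)` keeps `A`, `C`, `G`, `AC`, `CG` and `ACG`, and never generates any candidate starting with `T`, because `T` appears in only two of the three sequences. In a distributed setting the occurrence lists would presumably be partitioned across workers, but how DLA does this is not stated in the abstract.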

Bibliographic record

Source: IEEE International Conference on Big Data | 2018 | pp. 1121-1126 | 6 pages
Authors: Eirini Stamoulakatou; Andrea Gulino; Pietro Pinoli
Original format: PDF
Keywords: Databases; Data mining; Bioinformatics; Big Data; DNA; Genomics

Similar literature

1. DFSP: a Depth-First SPelling algorithm for sequential pattern mining of biological sequences [J]. Vance Chiang-Chi Liao, Ming-Syan Chen. Knowledge and Information Systems. 2014, No. 3
2. Developing Text Mining Based Algorithms for Classifying Biological Sequences [J]. Hoang Kiem, Do Phuc. 電子情報通信学会技術研究報告. 人工知能と知識処理. Artificial Intelligence and Knowledge Based Processing. 2004, No. 485
3. Efficient mining gapped sequential patterns for motifs in biological sequences [J]. Vance Chiang-Chi Liao, Ming-Syan Chen. BMC Systems Biology. 2013, No. S4
4. DLA: a Distributed, Location-based and Apriori-based Algorithm for Biological Sequence Pattern Mining [C]. Eirini Stamoulakatou, Andrea Gulino, Pietro Pinoli. IEEE International Conference on Big Data. 2018
5. Efficient algorithms for identification and analysis of repetitive patterns in biological sequences [D]. Zheng, Jie. 2006
6. Efficient mining gapped sequential patterns for motifs in biological sequences [O]. Vance Chiang-Chi Liao, Ming-Syan Chen. 2013
7. Frequent Contiguous Pattern Mining Algorithms for Biological Data Sequences [O]. S. Rajasekaran, D Centre. 2015
8. Detecting and Mining Similarities, Differences and Target Patterns in Sequences of Images Using the PFF, LGG and SPNG Approaches [R]. Bourbakis, D. 2004
