An Efficient Mining for Approximate Frequent Items in Protein Sequence Database

J. Jeyabharathi; D. Shanthi

Abstract

The rapid growth in available protein, DNA, and other biological sequences has made discovering meaningful patterns in sequences a major task for bioinformatics research. Data mining of protein sequence databases poses special challenges because several protein databases are non-relational, whereas most data mining and machine learning techniques assume the input is a relational database. Existing sequence mining algorithms focus mainly on mining subsequences. However, a wide range of applications, such as DNA and protein motif mining, need an effective way to identify approximate frequent patterns. Existing approximate frequent pattern mining algorithms have limitations, such as a lack of knowledge for finding the patterns, poor scalability, and difficulty in adapting to other applications. In this paper, a Generalized Approximate Pattern Algorithm (GAPA) is proposed to efficiently mine approximate frequent patterns in a protein sequence database. Pearson's correlation coefficient is computed among the items of the protein sequence database to analyze the approximate frequent patterns. The performance of the proposed GAPA is analyzed and tested on a FASTA protein sequence database; the FASTA database files hold the protein translations of Ensembl gene predictions. GAPA is compared with existing methods such as the Approximate Frequent Itemsets (AFI) tree and Approximate Closed Frequent Itemsets (ACFIM) in terms of support, accuracy, memory usage, and time consumption. The experimental results show that GAPA is scalable and outperforms the existing algorithms.
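The abstract does not describe how GAPA works internally, so the following is only a minimal sketch of the two ideas it names: counting the support of a pattern under approximate matching, and computing Pearson's correlation coefficient between items. It is not the authors' algorithm. The helper names (`parse_fasta`, `approx_support`, `pearson`), the use of a mismatch (Hamming) threshold as the notion of "approximate", and the choice of per-sequence residue counts as the correlated items are all assumptions made for illustration.

```python
from math import sqrt


def parse_fasta(text):
    """Return the list of sequences found in FASTA-formatted text."""
    seqs, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):          # header line starts a new record
            if current:
                seqs.append("".join(current))
                current = []
        else:
            current.append(line)
    if current:
        seqs.append("".join(current))
    return seqs


def approx_occurs(seq, pattern, max_mismatches):
    """True if some window of seq differs from pattern in at most
    max_mismatches positions (assumed definition of 'approximate')."""
    k = len(pattern)
    for i in range(len(seq) - k + 1):
        mismatches = sum(a != b for a, b in zip(seq[i:i + k], pattern))
        if mismatches <= max_mismatches:
            return True
    return False


def approx_support(seqs, pattern, max_mismatches):
    """Fraction of sequences containing an approximate occurrence."""
    if not seqs:
        return 0.0
    hits = sum(approx_occurs(s, pattern, max_mismatches) for s in seqs)
    return hits / len(seqs)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists.
    Returns 0.0 when either list has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


# Toy data, invented for illustration only (not from the paper).
EXAMPLE_FASTA = """>p1
MKVLA
>p2
MKILA
>p3
GGGGG
>p4
MRVLA
"""

if __name__ == "__main__":
    seqs = parse_fasta(EXAMPLE_FASTA)
    print(approx_support(seqs, "KVL", 0))   # exact-match support
    print(approx_support(seqs, "KVL", 1))   # support allowing one mismatch
    k_counts = [s.count("K") for s in seqs]
    l_counts = [s.count("L") for s in seqs]
    print(pearson(k_counts, l_counts))
```

On the toy data, allowing a single mismatch raises the support of `KVL` from one sequence in four to three in four, which is the kind of pattern an exact subsequence miner would miss.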


Bibliographic details

Source: Journal of Emerging Technologies in Web Intelligence, 2014, Issue 3, 7 pages
Authors: J. Jeyabharathi; D. Shanthi
Original format: PDF
Chinese Library Classification: Automation technology, computer technology

