Partitioning clustering algorithms for protein sequence data sets

Sondes Fayech; Nadia Essoussi; Mohamed Limam


Abstract

Background: Genome-sequencing projects are producing an enormous number of new sequences, causing protein sequence databases to grow rapidly. Clustering, the unsupervised classification of these data into functional groups or families, has become one of the principal research objectives in structural and functional genomics, and computer programs that automatically and accurately classify sequences into families have become a necessity. Many methods have addressed the clustering of protein sequences, and most fall into three major groups: hierarchical, graph-based, and partitioning methods. Of these, hierarchical and graph-based approaches have been widely used in the literature. Although partitioning clustering techniques are widely used in other fields, they have seen few applications in protein sequence clustering. It has not been fully demonstrated whether partitioning methods can be applied to protein sequence data, or whether they are efficient compared with published clustering methods.

Methods: We developed four partitioning clustering approaches that use the Smith-Waterman local-alignment algorithm to determine pairwise similarities between sequences. Four different sets of protein sequences were used as evaluation data sets for the proposed methods.

Results: We show that these methods outperform several other published clustering methods in terms of correctly predicting a classifier and, especially, in terms of the correctness of the provided prediction. The software is available to academic users from the authors upon request.
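The abstract does not name the four partitioning approaches, so the following is only an illustrative sketch of the general idea: score every pair of sequences with Smith-Waterman, turn the scores into distances, and partition the sequences around medoids. Everything specific here is an assumption rather than the authors' method: the choice of k-medoids, the simple match/mismatch/linear-gap scoring (a real pipeline would more likely use a substitution matrix such as BLOSUM62 with affine gaps), the normalization of scores into distances, the deterministic initialization, and the toy sequences.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local-alignment score of strings a and b.

    Assumed scoring: fixed match/mismatch values and a linear gap penalty.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def distance_matrix(seqs):
    """Pairwise distances in [0, 1] from normalized Smith-Waterman scores.

    Assumed normalization: 1 - score / min(self-scores). Sequences must be
    non-empty, otherwise the self-score is zero.
    """
    n = len(seqs)
    self_scores = [smith_waterman(s, s) for s in seqs]
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            score = smith_waterman(seqs[i], seqs[j])
            d = 1.0 - score / min(self_scores[i], self_scores[j])
            dist[i][j] = dist[j][i] = d
    return dist


def assign(dist, medoids):
    """Label each item with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix."""
    medoids = list(range(k))  # assumed init: the first k items
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster empties
                new_medoids.append(medoids[c])
                continue
            # The new medoid minimizes total distance to its cluster members.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][o] for o in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(dist, medoids), medoids


# Toy data (hypothetical): two made-up "families" of three sequences each.
seqs = ["MKVLAAGIVG", "MKVLAAGLVG", "MKVLSAGIVG",
        "WWPHHDERRT", "WWPHHDEKRT", "WYPHHDERRT"]
labels, medoids = k_medoids(distance_matrix(seqs), k=2)
print(labels, medoids)
```

Working from a precomputed distance matrix keeps the clustering step independent of how the similarities were obtained, which is convenient because the all-against-all alignment is usually the expensive part.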


Bibliographic details

Source: BioData Mining, 2009, No. 1
Authors: Sondes Fayech; Nadia Essoussi; Mohamed Limam
Original format: PDF
Chinese Library Classification: Biological sciences

