An efficient clustering algorithm for partitioning Y-short tandem repeats data

Ali Seman; Zainab Abu Bakar; Mohamed Nizam Isa

Abstract

Background: Y-Short Tandem Repeats (Y-STR) data consist of many similar and almost similar objects. This characteristic of Y-STR data causes two problems with partitioning: non-unique centroids and local minima. As a result, the existing partitioning algorithms produce poor clustering results.

Results: Our new algorithm, called k-Approximate Modal Haplotypes (k-AMH), obtains the highest clustering accuracy scores for five out of six datasets, and matches the best performance on the remaining dataset. Furthermore, clustering accuracy scores of 100% are achieved for two of the datasets. The k-AMH algorithm records the highest mean accuracy score overall, 0.93, compared to the other algorithms: k-Population (0.91), k-Modes-RVF (0.81), New Fuzzy k-Modes (0.80), k-Modes (0.76), k-Modes-Hybrid 1 (0.76), k-Modes-Hybrid 2 (0.75), Fuzzy k-Modes (0.74), and k-Modes-UAVM (0.70).

Conclusions: The partitioning performance of the k-AMH algorithm for Y-STR data is superior to that of other algorithms, owing to its ability to solve the non-unique centroids and local minima problems. Our algorithm is also efficient in terms of time complexity, which is recorded as O(km(n-k)) and considered to be linear.
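The non-unique centroid problem named in the Background can be illustrated with a minimal Python sketch. This is not the k-AMH algorithm, whose steps the abstract does not describe; it only shows how a mode-based centroid (as used by k-Modes-style methods) can fail to be unique when objects are nearly identical. The toy haplotypes, the treatment of allele repeat counts as categorical values, and the helper names `simple_matching` and `candidate_modes` are all assumptions made for illustration.

```python
from collections import Counter


def simple_matching(a, b):
    """Count the markers at which two haplotypes carry different alleles."""
    return sum(x != y for x, y in zip(a, b))


def candidate_modes(cluster):
    """For each marker, list every allele tied for most frequent in the cluster."""
    per_marker = []
    for column in zip(*cluster):
        counts = Counter(column)
        top = max(counts.values())
        per_marker.append(sorted(v for v, c in counts.items() if c == top))
    return per_marker


# Hypothetical toy cluster: four haplotypes, four markers each.
# The values stand in for allele repeat counts, treated as categories.
cluster = [
    (14, 12, 28, 23),
    (14, 13, 28, 23),
    (14, 12, 29, 23),
    (14, 13, 29, 23),
]

ties = candidate_modes(cluster)

# Every combination of tied alleles is an equally valid mode.
n_modes = 1
for alleles in ties:
    n_modes *= len(alleles)

print(ties)     # alleles tied for the mode at each marker
print(n_modes)  # number of equally valid modal haplotypes
```

In this toy cluster, two of the four markers have tied alleles, so four different haplotypes qualify equally as the mode. Which one a mode-based method picks then depends on arbitrary tie-breaking, and that choice can change the resulting partition.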

Bibliographic information

Source: BMC Research Notes, 2012, Issue 1
Authors: Ali Seman; Zainab Abu Bakar; Mohamed Nizam Isa
Original format: PDF
Chinese Library Classification: Basic pharmaceutical sciences
