Background Y-Short Tandem Repeats (Y-STR) data consist of many similar and almost similar objects. This characteristic of Y-STR data causes two problems with partitioning: non-unique centroids and local minima problems. As a result, the existing partitioning algorithms produce poor clustering results. Results Our new algorithm, called k -Approximate Modal Haplotypes ( k -AMH), obtains the highest clustering accuracy scores for five out of six datasets, and produces an equal performance for the remaining dataset. Furthermore, clustering accuracy scores of 100% are achieved for two of the datasets. The k -AMH algorithm records the highest mean accuracy score of 0.93 overall, compared to that of other algorithms: k -Population (0.91), k -Modes-RVF (0.81), New Fuzzy k -Modes (0.80), k -Modes (0.76), k -Modes-Hybrid 1 (0.76), k -Modes-Hybrid 2 (0.75), Fuzzy k -Modes (0.74), and k -Modes-UAVM (0.70). Conclusions The partitioning performance of the k -AMH algorithm for Y-STR data is superior to that of other algorithms, owing to its ability to solve the non-unique centroids and local minima problems. Our algorithm is also efficient in terms of time complexity, which is recorded as O ( km ( n-k )) and considered to be linear.
展开▼