PLoS Neglected Tropical Diseases

Defining objective clusters for rabies virus sequences using affinity propagation clustering

Abstract

Rabies is caused by lyssaviruses and is one of the oldest known zoonoses. In recent years, more than 21,000 nucleotide sequences of rabies viruses (RABV), from the prototype species rabies lyssavirus, have been deposited in public databases. Subsequent phylogenetic analyses, combined with metadata, suggest geographic distributions of RABV. However, these analyses have difficulty defining verifiable criteria for allocating sequences to clusters in phylogenetic trees, which calls for a more rational approach. Therefore, we applied a relatively new mathematical clustering algorithm named ‘affinity propagation clustering’ (AP) to propose a standardized sub-species classification using full-genome RABV sequences. AP is computationally fast and works with any meaningful measure of similarity between data samples. It has previously been applied successfully in bioinformatics for the analysis of microarray and gene expression data; cluster analysis of sequences, however, is still in its infancy. Existing (516) and original (46) full-genome RABV sequences were used to demonstrate the application of AP for RABV clustering. On a global scale, AP proposed four clusters (New World, Arctic/Arctic-like, Cosmopolitan, and Asian), as previously assigned by phylogenetic studies. Combining AP with established phylogenetic analyses makes it possible to resolve phylogenetic relationships between verifiably determined clusters and sequences. This workflow will be useful for confirming cluster distributions in a uniform, transparent manner, not only for RABV but also for other comparative sequence analyses.
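To illustrate how AP can cluster sequences from nothing more than a pairwise similarity measure, here is a minimal sketch in plain Python. It is not the authors' pipeline, which the abstract does not describe. The six short sequences are invented toy data rather than RABV genomes. The negative Hamming distance as similarity, the median similarity as the shared preference, the damping factor of 0.5, and the fixed 200 iterations are all assumptions made for the example.

```python
from statistics import median

# Toy aligned sequences (invented for illustration, not real RABV data).
SEQS = [
    "ACGTACGTAC", "ACGTACGTAA", "ACGTACGAAC",
    "TTGCATGCAT", "TTGCATGCAA", "TTGCATGGAT",
]

def hamming(a, b):
    """Count mismatching positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))

def similarity_matrix(seqs):
    """Similarity = negative Hamming distance (an assumed, simple measure).
    The diagonal holds the 'preference', here the median off-diagonal similarity."""
    n = len(seqs)
    s = [[-float(hamming(seqs[i], seqs[j])) for j in range(n)] for i in range(n)]
    pref = median(s[i][j] for i in range(n) for j in range(n) if i != j)
    for k in range(n):
        s[k][k] = pref
    return s

def affinity_propagation(s, damping=0.5, iterations=200):
    """Exchange responsibility (r) and availability (a) messages, then return,
    for every point, the index of the exemplar it is assigned to.
    Requires at least two points."""
    n = len(s)
    r = [[0.0] * n for _ in range(n)]
    a = [[0.0] * n for _ in range(n)]
    for _ in range(iterations):
        # Responsibility: how well suited k is as exemplar for i,
        # relative to the best competing candidate.
        for i in range(n):
            for k in range(n):
                best_other = max(a[i][j] + s[i][j] for j in range(n) if j != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - best_other)
        # Availability: accumulated support from other points for choosing k.
        for k in range(n):
            pos = [max(0.0, r[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # positive responsibilities from i' != k
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, r[k][k] + total - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    # Points whose self-responsibility plus self-availability is positive are exemplars.
    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    if not exemplars:  # fallback if the messages did not single out any exemplar
        exemplars = [max(range(n), key=lambda k: r[k][k] + a[k][k])]
    return [i if i in exemplars else max(exemplars, key=lambda k: s[i][k])
            for i in range(n)]

if __name__ == "__main__":
    labels = affinity_propagation(similarity_matrix(SEQS))
    for seq, label in zip(SEQS, labels):
        print(seq, "-> exemplar", SEQS[label])
```

The number of clusters is not specified in advance; it emerges from the preference value, so a lower preference tends to yield fewer clusters. For real genome data, a similarity derived from an alignment or an evolutionary distance would likely replace the Hamming distance used here.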
