Conference: IASTED International Conference on Biomedical Engineering

2-D THRESHOLDING OF THE CONNECTIVITY MAP FOLLOWING THE MULTIPLE SEQUENCE ALIGNMENTS OF DIVERSE DATASETS

Abstract

Multiple sequence alignment (MSA) is a widely used method for uncovering relationships between biomolecular sequences. An essential prerequisite for applying it is a considerable degree of similarity among the test sequences; it is usually not possible to obtain reliable results from multiple alignments of large and diverse datasets. Here we propose a method to obtain sequence clusters with significant intragroup similarity and to make sense of multiple alignments that contain remote sequences. This is achieved by thresholding the pairwise connectivity map over two parameters: the inferred pairwise evolutionary distance, and the number of gapless positions in the pairwise comparisons of the alignment. Threshold curves are generated from the statistical parameter values obtained from a shuffled dataset, and probability distribution techniques are employed to select an optimum threshold curve that eliminates as many of the unreliable connectivities as possible while keeping the reliable ones. We applied the method to a large and diverse dataset of nearly 18,000 human proteins and measured the biological relevance of the recovered connectivities. Our precision (0.981) was nearly 20% higher than that of the connectivities left after a classical thresholding procedure, a significant improvement. Finally, we employed the method for the functional clustering of protein sequences in a gold standard dataset and measured its performance, obtaining a higher F-measure (0.882) than a conventional clustering operation (0.827).
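
The abstract does not give the form of the threshold curves, so the following is only a minimal sketch of what thresholding a connectivity map over two parameters could look like. Everything specific here is an assumption for illustration: the function names `fit_threshold_curve` and `keep_pair`, the binning by gapless-position count, and the rule "mean minus k standard deviations of the shuffled distances". It also omits the probability-distribution step that the authors use to select the optimum curve.

```python
from collections import defaultdict
from statistics import mean, stdev


def fit_threshold_curve(shuffled, bin_width=50, k=2.0):
    """Build a hypothetical threshold curve from a shuffled dataset.

    shuffled: iterable of (distance, gapless) tuples measured on shuffled
    sequences. Returns a dict mapping a gapless-count bin to the largest
    distance accepted in that bin (assumed rule: mean - k * stdev).
    """
    bins = defaultdict(list)
    for distance, gapless in shuffled:
        bins[gapless // bin_width].append(distance)
    # A bin needs at least two observations for a sample standard deviation.
    return {b: mean(d) - k * stdev(d) for b, d in bins.items() if len(d) >= 2}


def keep_pair(distance, gapless, curve, bin_width=50):
    """Keep a connectivity only if its distance lies below the curve
    at its gapless-position count; unseen bins are rejected."""
    threshold = curve.get(gapless // bin_width)
    return threshold is not None and distance < threshold


# Toy shuffled observations (made-up numbers, not data from the paper).
shuffled = [(2.0, 10), (2.2, 20), (1.8, 30), (1.0, 60), (1.2, 70), (0.8, 80)]
curve = fit_threshold_curve(shuffled)
```

The point of the sketch is that the accepted distance depends on how many gapless positions support the comparison, rather than being a single cutoff applied to every pair as in a classical one-parameter threshold.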
