Journal of Bioinformatics and Sequence Analysis

UniDPlot: A software to detect weak similarities between two DNA sequences

Abstract

Searching for DNA sequence similarity is a crucial step in many evolutionary analyses, and several bioinformatic tools are available for this task. The Basic Local Alignment Search Tool (BLAST) is the most commonly used and a highly efficient algorithm. However, it often fails to identify sequences that share only very weak similarity. Another option is the Dot Plot, but this graphical method is not suited to the analysis of large sequences (e.g., hundreds of kilobases), which is increasingly required in the context of genome sequencing programs. As an alternative to the classical Dot Plot method, we designed UniDPlot, which makes it possible to search for weak similarity either between two large sequences (e.g., genome regions, …) or between one large sequence and a short one (e.g., exons, …). The UniDPlot method contracts the output of the Dot Plot similarity matrix along the length of the longer sequence, while defining statistical limits of significance using a bootstrap procedure. To illustrate the efficiency of this method, we used UniDPlot to trace the fate, in chicken, of the gene encoding amelogenin, the major enamel protein. Although we showed that amelogenin was inactivated through a pseudogenization process, we recovered its entire sequence in the chicken genome. Using UniDPlot, we identified a pseudogene that classical methods had not detected. UniDPlot can be used to search for missing genes or for motifs of various sizes in different genomic contexts.
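The abstract describes the method only in outline: a Dot Plot similarity matrix is contracted along the longer sequence, and significance limits come from a bootstrap. The Python sketch below shows one plausible way to combine those steps. It is not the published UniDPlot code, and its details are assumptions: each dot-plot cell scores the identical bases between two windows of `window` nucleotides, each row is contracted by taking its maximum, and the bootstrap resamples the short sequence's nucleotides with replacement. The function names `dot_plot_profile`, `bootstrap_threshold` and `significant_positions` are hypothetical.

```python
import random

# Illustrative sketch only: the window score, the max-per-row contraction and
# the resampling scheme are assumptions, not the published UniDPlot method.


def dot_plot_profile(long_seq: str, short_seq: str, window: int = 10) -> list[int]:
    """Contract a windowed dot plot onto the axis of the longer sequence.

    Cell (i, j) of the implicit dot plot is the number of identical aligned
    bases between long_seq[i:i+window] and short_seq[j:j+window]. Entry i of
    the result is the best score in row i, so the matrix is never stored.
    """
    n = len(long_seq) - window + 1
    m = len(short_seq) - window + 1
    if n <= 0 or m <= 0:
        return []
    profile = [0] * n
    # Visit each diagonal (fixed offset d = i - j) once, keeping a running
    # count of matching bases inside the sliding window.
    for d in range(-(m - 1), n):
        i0, j0 = max(d, 0), max(-d, 0)
        length = min(len(long_seq) - i0, len(short_seq) - j0)
        matches = [long_seq[i0 + t] == short_seq[j0 + t] for t in range(length)]
        score = sum(matches[:window])
        for t in range(length - window + 1):
            if t > 0:
                # Slide the window one base: add the new end, drop the old start.
                score += matches[t + window - 1] - matches[t - 1]
            profile[i0 + t] = max(profile[i0 + t], score)
    return profile


def bootstrap_threshold(long_seq: str, short_seq: str, window: int = 10,
                        replicates: int = 100, alpha: float = 0.05,
                        seed: int = 0) -> int:
    """Estimate a significance limit for the profile by resampling.

    Each replicate redraws the short sequence's nucleotides with replacement
    (same length, same expected base composition) and records the highest
    profile value it yields against the long sequence. The (1 - alpha)
    quantile of those maxima is returned as the cutoff.
    """
    rng = random.Random(seed)
    maxima = []
    for _ in range(replicates):
        resampled = "".join(rng.choices(short_seq, k=len(short_seq)))
        profile = dot_plot_profile(long_seq, resampled, window)
        maxima.append(max(profile, default=0))
    maxima.sort()
    return maxima[min(int((1 - alpha) * replicates), replicates - 1)]


def significant_positions(profile: list[int], cutoff: int) -> list[int]:
    """Window start positions on the long sequence whose score beats the cutoff."""
    return [i for i, value in enumerate(profile) if value > cutoff]


if __name__ == "__main__":
    rng = random.Random(1)
    motif = "".join(rng.choice("ACGT") for _ in range(60))
    # Hypothetical test case: a copy of the motif with every fifth base
    # mutated (80% identity), planted between two random flanks.
    copy = "".join(("C" if b == "A" else "A") if p % 5 == 0 else b
                   for p, b in enumerate(motif))
    genome = ("".join(rng.choice("ACGT") for _ in range(300)) + copy
              + "".join(rng.choice("ACGT") for _ in range(300)))
    profile = dot_plot_profile(genome, motif, window=10)
    cutoff = bootstrap_threshold(genome, motif, window=10, replicates=20)
    print("cutoff:", cutoff)
    print("positions above cutoff:", significant_positions(profile, cutoff))
```

Scanning diagonal by diagonal with a running window count avoids storing the similarity matrix, so memory grows with the length of the long sequence rather than with the product of both lengths, which matters for inputs of hundreds of kilobases. Run time, however, still grows with the product of the two lengths for every bootstrap replicate.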