Journal: Mobile DNA

A pipeline of programs for collecting and analyzing group II intron retroelement sequences from GenBank



Abstract

Background: Accurate and complete identification of mobile elements is a challenging task in the current era of sequencing, given their large numbers and frequent truncations. Group II intron retroelements, which consist of a ribozyme and an intron-encoded protein (IEP), are usually identified in bacterial genomes through their IEP; however, the RNA component that defines the intron boundaries is often difficult to identify because of a lack of strong sequence conservation corresponding to the RNA structure. Compounding the problem of boundary definition is the fact that a majority of group II intron copies in bacteria are truncated.

Results: Here we present a pipeline of 11 programs that collect and analyze group II intron sequences from GenBank. The pipeline begins with a BLAST search of GenBank using a set of representative group II IEPs as queries. Subsequent steps download the corresponding genomic sequences and flanks, filter out non-group II introns, assign introns to phylogenetic subclasses, filter out incomplete and/or non-functional introns, and assign IEP sequences and RNA boundaries to the full-length introns. In the final step, the redundancy in the data set is reduced by grouping introns into sets of ≥95% identity, with one example sequence chosen to be the representative.

Conclusions: These programs should be useful for comprehensive identification of group II introns in sequence databases as data continue to accumulate rapidly.
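The final redundancy-reduction step can be illustrated with a minimal sketch. It is not the pipeline's own code: the abstract does not say how identity is computed or how the representative is chosen. The sketch assumes the intron sequences are already aligned to equal length (gaps as `-`), measures identity over columns where at least one sequence has a residue, and takes the longest ungapped sequence in each group as the representative. The function names `percent_identity` and `group_by_identity` are hypothetical.

```python
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Columns where both sequences have a gap are skipped (assumption).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # shared gap column carries no information
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def group_by_identity(
    aligned: Dict[str, str], threshold: float = 95.0
) -> Dict[str, List[str]]:
    """Greedy grouping of sequences at >= threshold percent identity.

    Returns a mapping from representative name to the members of its group.
    Sequences are visited longest first (ties broken by name), so the
    representative is the longest member seen for that group (assumption).
    """
    order = sorted(
        aligned, key=lambda name: (-len(aligned[name].replace("-", "")), name)
    )
    groups: Dict[str, List[str]] = {}
    for name in order:
        for rep in groups:
            if percent_identity(aligned[name], aligned[rep]) >= threshold:
                groups[rep].append(name)
                break
        else:
            # No existing representative is similar enough: start a new group.
            groups[name] = [name]
    return groups


if __name__ == "__main__":
    # Toy 20-column alignment; names and sequences are made up for illustration.
    toy = {
        "intron_a": "ACGTACGTACGTACGTACGT",
        "intron_b": "ACGTACGTACGTACGTACGA",  # 1 mismatch vs intron_a
        "intron_c": "ACGTACGTACGTACGTACAA",  # 2 mismatches vs intron_a
    }
    for rep, members in group_by_identity(toy).items():
        print(rep, members)
```

Greedy grouping compares each sequence only against existing representatives, so the result depends on visiting order. A production tool would more likely rely on an established clustering program than on this pairwise loop.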
