“One code to find them all”: a perl tool to conveniently parse RepeatMasker output files

Marc Bailly-Bechet; Annabelle Haudry; Emmanuelle Lerat

Abstract

Background: Of the different bioinformatic methods used to recover transposable elements (TEs) in genome sequences, one of the most commonly used is the homology-based method implemented in the RepeatMasker program. RepeatMasker generates several output files, including the .out file, which provides annotations for all detected repeats in a query sequence. However, a remaining challenge is to identify the distinct TE copies that correspond to the reported hits. This step is essential for any evolutionary or comparative analysis of the different copies within a family. Several situations can produce multiple hits for a single copy of an element, such as large deletions or insertions, undetermined bases, and distinct consensus sequences that together correspond to a single full-length sequence (as for long terminal repeat (LTR) retrotransposons). These situations must be taken into account to determine the exact number of TE copies.

Results: We have developed a Perl tool that parses the RepeatMasker .out file to better determine the number and positions of TE copies in the query sequence, in addition to computing quantitative information for the different families. To determine the accuracy of the program, we tested it on several RepeatMasker .out files corresponding to two organisms (Drosophila melanogaster and Homo sapiens) for which the TE content has already been largely described and which differ greatly in genome size, TE content, and TE families.

Conclusions: Our tool provides access to detailed information concerning the TE content in a genome at the family level from the .out file of RepeatMasker. This information includes the exact position and orientation of each copy, its proportion in the query sequence, and its quality compared to the reference element. In addition, our tool allows a user to directly retrieve the sequence of each copy and to obtain the same detailed information at the family level when a local library with incomplete TE class/subclass information was used with RepeatMasker. We hope that this tool will be helpful for people working on the distribution and evolution of TEs within genomes.
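
To make the hits-versus-copies problem concrete, here is a minimal Python sketch. It is not the authors' Perl tool and does not reproduce their algorithm. It assumes the usual whitespace-separated .out layout (score, divergence, deletions, insertions, query name, query begin, query end, bases left, strand, repeat name, class/family, ...). The names Hit, parse_out, merge_hits, copies_per_family and the max_gap threshold are hypothetical, and the sample rows are invented for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    query: str   # query sequence name
    start: int   # begin position on the query
    end: int     # end position on the query
    strand: str  # "+" or "C" (complement)
    repeat: str  # name of the matching repeat
    family: str  # repeat class/family


def parse_out(lines):
    """Yield a Hit for every data row, skipping header and blank lines."""
    for line in lines:
        fields = line.split()
        # Assumption: data rows start with the integer alignment score.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        yield Hit(fields[4], int(fields[5]), int(fields[6]),
                  fields[8], fields[9], fields[10])


def merge_hits(hits, max_gap=100):
    """Naively merge consecutive hits of the same repeat and strand.

    Hypothetical rule: two hits belong to one copy when they follow each
    other on the same query and lie at most max_gap bases apart.
    """
    merged = []
    for hit in sorted(hits, key=lambda h: (h.query, h.start)):
        prev = merged[-1] if merged else None
        if (prev is not None and prev.query == hit.query
                and prev.repeat == hit.repeat
                and prev.strand == hit.strand
                and hit.start - prev.end <= max_gap):
            merged[-1] = prev._replace(end=max(prev.end, hit.end))
        else:
            merged.append(hit)
    return merged


def copies_per_family(copies):
    """Count merged copies per class/family."""
    counts = defaultdict(int)
    for copy in copies:
        counts[copy.family] += 1
    return dict(counts)


# Invented sample rows in the assumed .out layout.
SAMPLE = """\
   SW   perc perc perc  query     position in query    matching    repeat        position in repeat
score   div. del. ins.  sequence  begin end   (left)   repeat      class/family  begin end (left)  ID

 1200   5.0  0.5  0.0   chr2L     1000  1500  (9000) + ExampleLTR  LTR/Gypsy     1     500  (8000)  1
  900   6.1  0.0  1.0   chr2L     1550  2400  (8100) + ExampleLTR  LTR/Gypsy     600   1450 (7050)  1
  450  12.3  2.0  0.4   chr2L     5000  5300  (5200) C ExampleDNA  DNA/P         (10)  584  280     2
"""

hits = list(parse_out(SAMPLE.splitlines()))
copies = merge_hits(hits)
print(len(hits), "hits merged into", len(copies), "copies")
print(copies_per_family(copies))
```

The merge rule only compares each hit with the previous merged one, so it cannot reconnect fragments separated by another element inserted between them, and it does not handle an LTR and its internal region being annotated under different consensus names. These are the situations the abstract says must be taken into account.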

Bibliographic details

Source: Mobile DNA, 2014, issue 1
Authors: Marc Bailly-Bechet; Annabelle Haudry; Emmanuelle Lerat
Original format: PDF
Classification (Chinese Library Classification): Biochemistry
