An Algorithm to Find All Identical Motifs in Multiple Biological Sequences


Abstract

Sequence motifs are of great biological importance in nucleotide and protein sequences. The conserved occurrence of identical motifs indicates functional significance and helps to classify biological sequences. In this paper, a new algorithm is proposed to find all identical motifs in multiple nucleotide or protein sequences. The proposed algorithm uses the concept of dynamic programming. Applications of this algorithm include the identification of (a) conserved identical sequence motifs and (b) identical or direct repeat sequence motifs across multiple biological sequences (nucleotide or protein sequences). Further, the proposed algorithm facilitates the comparative analysis of internal sequence repeats for evolutionary studies, which helps to derive phylogenetic relationships from the distribution of repeats.
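The abstract does not spell out the recurrence the authors use, so the following is only a minimal sketch of one standard dynamic-programming approach to the same task: a longest-common-suffix table that reports maximal identical substrings shared by two sequences, applied pairwise to a set of sequences. The function names `shared_motifs` and `motifs_across`, the `min_len` threshold, and the example sequences are all assumptions made for illustration and are not taken from the paper.

```python
from itertools import combinations


def shared_motifs(a, b, min_len=3):
    """Return maximal identical substrings of a and b.

    Each hit is (motif, start_in_a, start_in_b). This is the textbook
    longest-common-suffix table, not necessarily the paper's algorithm.
    """
    hits = []
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at (i-1, j-1) by one residue.
                curr[j] = prev[j - 1] + 1
                # Report only when the match cannot be extended to the right.
                at_end = i == len(a) or j == len(b)
                if curr[j] >= min_len and (at_end or a[i] != b[j]):
                    k = curr[j]
                    hits.append((a[i - k:i], i - k, j - k))
        prev = curr
    return hits


def motifs_across(seqs, min_len=3):
    """Map each motif to the names of sequences that share it (pairwise)."""
    found = {}
    for (name_a, seq_a), (name_b, seq_b) in combinations(seqs.items(), 2):
        for motif, _, _ in shared_motifs(seq_a, seq_b, min_len):
            found.setdefault(motif, set()).update((name_a, name_b))
    return found


# Hypothetical toy sequences, chosen only to illustrate the output shape.
print(shared_motifs("ACGTTGCA", "TTACGTAG"))  # [('ACGT', 0, 2)]
```

Keeping only two rows of the table holds memory at the length of the second sequence, while time stays proportional to the product of the two lengths. Pairwise matching is a simplification: a motif reported for several pairs is not guaranteed to be maximal across all sequences at once.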


Bibliographic details

Source
Pattern Recognition in Bioinformatics | 2010 | pp. 137-148 | 12 pages
Conference location: Nijmegen (NL)
Authors
Ashish Kishor Bindal; R. Sabarinathan; J. Sridhar; D. Sherlin; K. Sekar
Author affiliations

Bioinformatics Centre (Centre of excellence in Structural Biology and Bio-computing), Indian Institute of Science, Bangalore 560012, India;

Bioinformatics Centre (Centre of excellence in Structural Biology and Bio-computing), Indian Institute of Science, Bangalore 560012, India;

Center of Excellence in Bioinformatics, School of Biotechnology, Madurai Kamaraj University, Madurai 625021, Tamilnadu, India;

Bioinformatics Centre (Centre of excellence in Structural Biology and Bio-computing), Indian Institute of Science, Bangalore 560012, India;

Bioinformatics Centre (Centre of excellence in Structural Biology and Bio-computing), Indian Institute of Science, Bangalore 560012, India;

Original format: PDF
Language of text: English
Chinese Library Classification: Bioengineering (biotechnology)
Keywords
Sequence motifs; nucleotide and protein sequences; identical motifs; dynamic programming; direct repeat and phylogenetic relationships

Date added to database: 2022-08-26 13:51:21

Similar literature

1. Efficient Automatic Exact Motif Discovery Algorithms For Biological Sequences [J] . Ali Karri Expert systems with applications . 2009, No. 4
2. An Algorithm to Solve the Motif Alignment Problem for Approximate Nested Tandem Repeats in Biological Sequences [J] . Atheer A. Matroud, Christopher P. Tuffley, Michael D. Hendy Journal of computational biology: A journal of computational molecular cell biology . 2011, No. 9
3. An Algorithm to Solve the Motif Alignment Problem for Approximate Nested Tandem Repeats in Biological Sequences [J] . Atheer A. Matroud, Christopher P. Tuffley, Michael D. Hendy Journal of computational biology . 2011, No. 9
4. An Algorithm to Find All Identical Motifs in Multiple Biological Sequences [C] . Ashish Kishor Bindal, R. Sabarinathan, J. Sridhar, International Conference on Pattern Recognition in Bioinformatics . 2010
5. Heuristic algorithms to minimize total weighted tardiness on the single machine and identical parallel machines with sequence dependent setup and future ready time [D] . Xi, Yue 2013
6. A novel algorithm for detecting multiple covariance and clustering of biological sequences [O] . Wei Shen, Yan Li -1
7. Figure 4: (A) One conserved sequence, which occurs 79 times in 46,264 binding site peaks from the ChIP-seq data-set. The mutation profile of this conserved sequence is illustrated, where '_' indicates this base is unchanged; DEL indicates this base is lost; INS X indicates a new base X is inserted in front of this base. (B) Several repeated element patterns are listed. (C) In the first column, the top five DNA motifs, mined by meme-chip tools (Machanick & Bailey, 2011) are illustrated. The resemblant conserved sequences, found by the CFSP algorithm are listed in the second column. In the third column, the position-specific scoring matrices, which are transformed from mutational information are listed. The similarity between meme motif and resemblant conserved sequence with PSSM format was calculated via a stamp motif comparison tool (Mahony & Benos, 2007). The E-values for the similarity of those pairs are displayed in the fourth column. (D) One motif is selected in each group clustered by gkmsvm descriptors, and the corresponding motif found by the CFSP algorithm is listed below. (E) There are additional datasets (File No: ENCFF100GRL, ENCFF616IRT, ENCFF870CER, Target: SREBF1) collected from https://www.encodeproject.org. The top two motifs are selected in each file using meme tools, and the corresponding motifs found by our algorithm are listed below. [O] . -1
