Detection of Protein Coding Sequences Using a Mixture Model for Local Protein Amino Acid Sequence

Edward C. Thayer; Chris Bystroff; David Baker

Abstract

Locating protein coding regions in genomic DNA is a critical step in accessing the information generated by large-scale sequencing projects. Current methods for gene detection depend on statistical measures of content differences between coding and noncoding DNA, in addition to the recognition of promoters, splice sites, and other regulatory sites. Here we explore the potential value of recurrent amino acid sequence patterns 3-19 amino acids in length as a content statistic for use in gene-finding approaches. A finite mixture model incorporating these patterns can partially discriminate protein sequences that have no (detectable) known homologs from randomized versions of these sequences, and from short (≤50 amino acids) noncoding segments extracted from the S. cerevisiae genome. The mixture-model-derived scores for a collection of human exons were not correlated with the GENSCAN scores, suggesting that the addition of our protein pattern recognition module to current gene recognition programs may improve their performance.
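To make the idea of a pattern-based content statistic concrete, the following is a minimal sketch of how a finite mixture of short amino acid patterns could score a sequence against a background model, and how that score could be compared with the score of a shuffled copy. It is not the authors' implementation. The profile construction (`make_profile`), the uniform background, the two toy patterns "GDS" and "KLE", the equal mixture weights, and the fixed window width of 3 are all assumptions chosen for illustration; the abstract specifies only that patterns span 3-19 residues and that a finite mixture model is used.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: a uniform background; a real model would use observed frequencies.
BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def make_profile(pattern, peak=0.8):
    """Build a toy position-specific profile that favors `pattern` (hypothetical helper)."""
    rest = (1.0 - peak) / (len(AMINO_ACIDS) - 1)
    return [{aa: (peak if aa == want else rest) for aa in AMINO_ACIDS}
            for want in pattern]


def window_log_odds(window, components):
    """Log of the mixture likelihood of one window divided by its background likelihood."""
    background = math.prod(BACKGROUND[aa] for aa in window)
    mixture = sum(weight * math.prod(col[aa] for col, aa in zip(profile, window))
                  for weight, profile in components)
    return math.log(mixture / background)


def sequence_score(seq, components, width):
    """Average window log-odds over every window of `width` residues in `seq`."""
    if len(seq) < width:
        raise ValueError("sequence is shorter than the window width")
    windows = [seq[i:i + width] for i in range(len(seq) - width + 1)]
    return sum(window_log_odds(w, components) for w in windows) / len(windows)


def shuffle_sequence(seq, seed=0):
    """Return a randomized version of `seq` with the same residue composition."""
    residues = list(seq)
    random.Random(seed).shuffle(residues)
    return "".join(residues)


# Two equally weighted toy components; all profiles must share the window width.
COMPONENTS = [(0.5, make_profile("GDS")), (0.5, make_profile("KLE"))]

example = "GDSKLEGDSKLE"
print("original:", sequence_score(example, COMPONENTS, 3))
print("shuffled:", sequence_score(shuffle_sequence(example), COMPONENTS, 3))
```

Shuffling preserves amino acid composition, so any difference between the two scores reflects local residue order rather than overall content. In a realistic setting the component profiles and weights would be estimated from data, for example with the EM algorithm listed in the keywords, rather than written by hand.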


Bibliographic record

Source: Journal of computational biology: A journal of computational molecular cell biology, 2000, Issue 2, 11 pages
Authors: Edward C. Thayer; Chris Bystroff; David Baker
Original format: PDF
Language: English
Chinese Library Classification: General biology
Keywords: gene finding; mixture model; EM algorithm; sequence/structure motifs


