Journal of Computational Biology: A Journal of Computational Molecular Cell Biology

Detection of Protein Coding Sequences Using a Mixture Model for Local Protein Amino Acid Sequence

Abstract

Locating protein coding regions in genomic DNA is a critical step in accessing the information generated by large-scale sequencing projects. Current methods for gene detection depend on statistical measures of content differences between coding and noncoding DNA, in addition to the recognition of promoters, splice sites, and other regulatory sites. Here we explore the potential value of recurrent amino acid sequence patterns 3-19 amino acids in length as a content statistic for use in gene-finding approaches. A finite mixture model incorporating these patterns can partially discriminate protein sequences that have no (detectable) known homologs from randomized versions of these sequences, and from short (≤50 amino acids) noncoding segments extracted from the S. cerevisiae genome. The mixture-model-derived scores for a collection of human exons were not correlated with the GENSCAN scores, suggesting that the addition of our protein pattern recognition module to current gene recognition programs may improve their performance.
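
The abstract does not give the model's details, so the following is only a minimal sketch of the general idea: score a protein sequence under a finite mixture of short amino acid patterns plus a background component, then compare that score with scores of shuffled versions of the same sequence. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy patterns "GKS" and "DEAD", the mixture weights, the uniform background, the per-position averaging, and all function names.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: uniform background; a real model would estimate this from data.
BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def make_pattern(consensus, match_prob=0.8):
    """Build a toy position-specific distribution from a consensus string."""
    other = (1.0 - match_prob) / (len(AMINO_ACIDS) - 1)
    return [{aa: (match_prob if aa == c else other) for aa in AMINO_ACIDS}
            for c in consensus]


def pattern_odds(window, pattern):
    """Likelihood ratio P(window | pattern) / P(window | background)."""
    log_odds = sum(math.log(column[aa] / BACKGROUND[aa])
                   for aa, column in zip(window, pattern))
    return math.exp(log_odds)


def mixture_score(seq, patterns, weights, background_weight):
    """Mean per-position log odds of the mixture against the background.

    At each start position the mixture odds are the background weight plus
    the weighted odds of every pattern that fits in the remaining sequence.
    Windows overlap, so this is a heuristic score, not a true likelihood.
    """
    total = 0.0
    for i in range(len(seq)):
        odds = background_weight
        for pattern, weight in zip(patterns, weights):
            window = seq[i:i + len(pattern)]
            if len(window) == len(pattern):
                odds += weight * pattern_odds(window, pattern)
        total += math.log(odds)
    return total / len(seq)


def fraction_of_shuffles_below(seq, patterns, weights, background_weight,
                               n_shuffles=200, seed=0):
    """Fraction of shuffled versions of seq that score below the original."""
    rng = random.Random(seed)
    real = mixture_score(seq, patterns, weights, background_weight)
    below = 0
    for _ in range(n_shuffles):
        residues = list(seq)
        rng.shuffle(residues)  # keeps composition, destroys local order
        shuffled = "".join(residues)
        if mixture_score(shuffled, patterns, weights, background_weight) < real:
            below += 1
    return below / n_shuffles


# Hypothetical toy model: two patterns and a background component.
PATTERNS = [make_pattern("GKS"), make_pattern("DEAD")]
WEIGHTS = [0.25, 0.25]
BACKGROUND_WEIGHT = 0.5

if __name__ == "__main__":
    seq = "MAGKSTLLDEADRRGKSTV"  # made-up sequence containing both patterns
    print(mixture_score(seq, PATTERNS, WEIGHTS, BACKGROUND_WEIGHT))
    print(fraction_of_shuffles_below(seq, PATTERNS, WEIGHTS, BACKGROUND_WEIGHT))
```

Shuffling preserves amino acid composition while destroying local order, so a sequence that outscores most of its shuffles is being rewarded for local patterns rather than for composition alone. That is the kind of discrimination from randomized sequences that the abstract reports.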