Annotation of genome sequences

Abstract

A method of identifying one or more proteins in an unannotated DNA sequence is disclosed. The method involves dividing the DNA sequence into a plurality of sequence fragments of substantially the same length (about 300 to 5000 base pairs, most typically 1000 to 1050 base pairs). A six-frame translation is then performed on each DNA sequence fragment to obtain six translated amino acid sequence fragments per DNA sequence fragment. Each translated sequence fragment is subjected to theoretical digestion to obtain a plurality of cleaved peptide sequences. Next, empirical data for peptide fragments from a protein digested in the same manner as the theoretical digestion are compared with the theoretical data generated for each translated sequence fragment, to identify one or more translated sequence fragments that include a substantial number of the peptides present in the digested protein. The sequence fragment with the greatest number of theoretical peptide masses correlating to the empirical data indicates the likely location of the protein of interest in the DNA sequence. To avoid the problem of the sequence being divided at a site that encodes a protein, the DNA sequence is duplicated, and the original and the duplicate are split in such a manner that the sequence fragments from the duplicate overlap the cuts in the original genome sequence.
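The steps in the abstract can be sketched in Python as follows. This is a minimal illustration, not the patented implementation, and it rests on several assumptions the abstract does not state: trypsin-like cleavage (after K or R, not before P) as the digestion rule, monoisotopic residue masses, a half-fragment offset for the duplicate sequence, and a plain count of mass matches within a tolerance as the score. All function names and parameters (`fragments`, `six_frames`, `digest`, `score`, `locate`, `tol`, `min_len`) are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Monoisotopic residue masses in daltons; WATER is added once per peptide.
MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def fragments(dna, size=1000):
    """Return (start, sequence) pairs for the original and the duplicate.

    The duplicate is cut with an offset of size // 2 (an assumption), so
    each of its fragments spans a cut made in the original.
    """
    original = [(i, dna[i:i + size]) for i in range(0, len(dna), size)]
    duplicate = [(i, dna[i:i + size]) for i in range(size // 2, len(dna), size)]
    return original + duplicate


def six_frames(dna):
    """Translate three forward and three reverse-complement reading frames."""
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = []
    for strand in (dna, reverse):
        for offset in range(3):
            codons = (strand[i:i + 3] for i in range(offset, len(strand) - 2, 3))
            # Codons with non-ACGT characters become "X".
            frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames


def digest(protein, min_len=5):
    """Theoretical digestion: cut at stops and after K/R unless P follows."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa == "*" or (aa in "KR" and nxt != "P"):
            peptides.append(protein[start:i + 1].rstrip("*"))
            start = i + 1
    peptides.append(protein[start:])
    return [p for p in peptides if len(p) >= min_len and "X" not in p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MASS[aa] for aa in peptide) + WATER


def score(fragment, observed, tol=0.5):
    """Best count, over the six frames, of observed masses that are matched."""
    best = 0
    for frame in six_frames(fragment):
        theoretical = [peptide_mass(p) for p in digest(frame)]
        hits = sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)
        best = max(best, hits)
    return best


def locate(dna, observed, size=1000, tol=0.5):
    """Return the (start, sequence) fragment with the most mass matches."""
    return max(fragments(dna, size), key=lambda f: score(f[1], observed, tol))
```

Calling `locate(dna, observed_masses, size=1000)` returns the start position and sequence of the highest-scoring fragment. Because the duplicate's fragments straddle the original's cuts, a coding region split by an original cut can still score fully in one duplicate fragment, provided it is shorter than about half the fragment size.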