Nucleic Acids Research

Analysis of canonical and non-canonical splice sites in mammalian genomes



Abstract

A set of 43 337 splice junction pairs was extracted from mammalian GenBank annotated genes. Expressed sequence tag (EST) sequences support 22 489 of them. Of these, 98.71% contain the canonical dinucleotides GT and AG at the donor and acceptor sites, respectively; 0.56% carry non-canonical GC-AG splice site pairs; and the remaining 0.73% are distributed among many small groups (the largest accounting for 0.05%). Studying these groups, we observe that many of them contain splicing dinucleotides shifted by one position from the annotated splice junction. After close examination of such cases, we present a new classification consisting of only eight observed types of splice site pairs (out of 256 a priori possible combinations). EST alignments allow us to verify the exonic part of the splice sites, but many non-canonical cases may be due to sequencing errors in the intron. This idea receives substantial support when we compare human genes having non-canonical splice sites with the sequences deposited in GenBank by high-throughput genome sequencing projects (HTG). A high proportion (156 out of 171) of the human non-canonical, EST-supported splice site sequences had a clear match in the human HTG. After correction, they can be classified as follows: 79 GC-AG pairs (one of which was an error that, once corrected, became GC-AG); 61 errors that, once corrected, became canonical GT-AG pairs; six AT-AC pairs (two of which were errors that, once corrected, became AT-AC); one case arising from a non-existent intron; seven cases found in HTG sequences that had been deposited in GenBank; and, finally, only two remaining cases of supported non-canonical splice sites. If we assume that approximately the same situation holds for the whole set of annotated mammalian non-canonical splice sites, then 99.24% of splice site pairs should be GT-AG, 0.69% GC-AG and 0.05% AT-AC, and only 0.02% could consist of other types of non-canonical splice sites.
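The counting behind this classification can be illustrated with a small sketch. It assumes introns are given as plain genomic strings on the transcribed strand; the names `ALL_PAIRS`, `splice_pair`, `pair_frequencies` and `shifted_canonical` are hypothetical, the sequences are invented, and the sketch does not reproduce the authors' EST-alignment and HTG-correction pipeline. It enumerates the 16 × 16 = 256 a priori donor-acceptor dinucleotide pairs, tallies the observed pairs, and checks whether a non-canonical annotation becomes GT-AG when the junction is moved by one base without changing the spliced sequence, which is one way a one-position shift can arise.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"

# Every donor dinucleotide (first two intron bases) combined with every
# acceptor dinucleotide (last two intron bases): 16 x 16 = 256 pairs.
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]
ALL_PAIRS = [d + "-" + a for d in DINUCLEOTIDES for a in DINUCLEOTIDES]


def splice_pair(intron: str) -> str:
    """Return the donor-acceptor dinucleotide pair of an intron, e.g. 'GT-AG'."""
    intron = intron.upper()
    if len(intron) < 4:
        raise ValueError("intron must be at least 4 nt long")
    return intron[:2] + "-" + intron[-2:]


def pair_frequencies(introns):
    """Percentage of introns carrying each observed splice site pair."""
    counts = Counter(splice_pair(s) for s in introns)
    total = sum(counts.values())
    return {pair: 100.0 * n / total for pair, n in counts.most_common()}


def shifted_canonical(genomic: str, start: int, end: int):
    """Return -1 or +1 if moving the annotated intron genomic[start:end] by that
    many bases yields a GT-AG intron AND the same spliced sequence, else None.

    Shifting both boundaries by +1 leaves the spliced sequence unchanged only
    when genomic[start] == genomic[end] (by -1 only when genomic[start - 1] ==
    genomic[end - 1]), so an EST alignment alone cannot tell the placements apart.
    """
    genomic = genomic.upper()
    spliced = genomic[:start] + genomic[end:]
    for k in (-1, 1):
        s, e = start + k, end + k
        if s < 0 or e > len(genomic):
            continue
        if genomic[:s] + genomic[e:] == spliced and splice_pair(genomic[s:e]) == "GT-AG":
            return k
    return None


# Toy data (invented for illustration, not from the study).
introns = ["GTAAGTCTTTCAG", "GTGAGTTTTAG", "GCAAGTCCCAG", "ATATCCTTTAC"]
print(len(ALL_PAIRS))             # 256
print(pair_frequencies(introns))  # {'GT-AG': 50.0, 'GC-AG': 25.0, 'AT-AC': 25.0}
# Exon "CCAG" | intron "GTAAGTTTTTAG" | exon "GCC", annotated one base too early:
print(shifted_canonical("CCAGGTAAGTTTTTAGGCC", 3, 15))  # 1
```

In the last example the annotated intron reads GG-TA, but because the base at each boundary repeats, the junction can be moved one base downstream to a canonical GT-AG intron with an identical spliced product.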
We analyze several characteristics of EST-verified splice sites and build weight matrices for the major groups, which can be incorporated into gene prediction programs. We also present a set of EST-verified canonical splice sites two orders of magnitude larger than the currently available one (22 199 entries versus ~600) and, finally, a set of 290 EST-supported non-canonical splice sites. Both sets should be significant for future investigations of the splicing mechanism.
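The abstract does not specify how the weight matrices were built (window length, background model, pseudocounts). A common construction is a log-odds position weight matrix, sketched below under stated assumptions: equal-length aligned sites, a uniform background and a pseudocount of 0.5. The function names `build_weight_matrix` and `score_site` and the toy donor sequences are hypothetical.

```python
import math

BASES = "ACGT"


def build_weight_matrix(sites, pseudocount=0.5, background=None):
    """Log-odds position weight matrix from equal-length, aligned site sequences.

    Returns one dict per position mapping base -> log2(p / q), where p is the
    pseudocount-smoothed frequency of the base at that position and q is its
    background frequency (uniform by default, an assumption of this sketch).
    """
    if not sites:
        raise ValueError("need at least one site")
    if background is None:
        background = {b: 0.25 for b in BASES}
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for i in range(length):
        column = [s[i].upper() for s in sites]
        total = len(column) + pseudocount * len(BASES)
        matrix.append({
            b: math.log2((column.count(b) + pseudocount) / total / background[b])
            for b in BASES
        })
    return matrix


def score_site(matrix, seq):
    """Sum of per-position log-odds weights; higher means more site-like."""
    if len(seq) != len(matrix):
        raise ValueError("sequence length must match the matrix")
    return sum(col[base] for col, base in zip(matrix, seq.upper()))


# Toy donor sites: last 3 exon bases + first 6 intron bases (invented data).
donors = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "GAGGTAAGG"]
m = build_weight_matrix(donors)
print(score_site(m, "CAGGTAAGT") > score_site(m, "CAGCTAAGT"))  # True
```

Because the score is a sum of per-position log2 odds, a gene prediction program could evaluate it at each candidate GT or AG and compare it with a threshold tuned on training data; separate matrices could be built in the same way for the GT-AG, GC-AG and AT-AC groups.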
