
SpliceDB: database of canonical and non-canonical mammalian splice sites


Abstract

A database (SpliceDB) of known mammalian splice site sequences has been developed. We extracted 43 337 splice pairs from the mammalian divisions of the gene-centered Infogene database, including sites from incomplete or alternatively spliced genes. Known EST sequences supported 22 815 of them. After discarding sequences with putative errors or an ambiguous splice junction location, the verified dataset includes 22 489 entries. Of these, 98.71% contain canonical GT–AG junctions (22 199 entries) and 0.56% have non-canonical GC–AG splice site pairs. The remainder (0.73%) is spread across many small groups, none larger than 0.05%. We especially studied non-canonical splice sites, which comprise 3.73% of GenBank annotated splice pairs. EST alignments allowed us to verify only the exonic part of splice sites. To check the conserved dinucleotides, we compared sequences of human non-canonical splice sites with sequences from the high throughput genome sequencing project (HTG). Out of 171 human non-canonical and EST-supported splice pairs, 156 (91.23%) had a clear match in the human HTG. After sequence analysis they can be classified as: 79 GC–AG pairs (of which one was an error that was corrected to GC–AG); 61 errors that were corrected to canonical GT–AG pairs; six AT–AC pairs (of which two were errors that were corrected to AT–AC); one case produced from a non-existent intron; seven cases found in HTG that were deposited to GenBank; and only two remaining cases of supported non-canonical splice pairs. The information about verified splice site sequences for canonical and non-canonical sites is presented in SpliceDB with the supporting evidence. We also built weight matrices for the major splice groups, which can be incorporated into gene prediction programs. SpliceDB is available at the computational genomic Web server of the Sanger Centre: and at .
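The abstract mentions classifying splice pairs by their terminal dinucleotides and building weight matrices for the major splice groups. As a rough illustration only, the Python sketch below shows one common way such a matrix could be computed: a log-odds position weight matrix from aligned, equal-length site sequences. The function names, the toy donor sequences, the pseudocount, and the uniform 0.25 background are all assumptions made for this example. They are not taken from SpliceDB and may differ from the authors' actual construction.

```python
import math
from collections import Counter

BASES = "ACGT"


def classify_pair(intron: str) -> str:
    """Label an intron by its first and last dinucleotides, e.g. 'GT-AG'."""
    intron = intron.upper()
    return f"{intron[:2]}-{intron[-2:]}"


def weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds weight matrix from aligned sites of equal length.

    The pseudocount and the uniform background are illustrative assumptions.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    matrix = []
    for pos in range(length):
        counts = Counter(s[pos].upper() for s in sites)
        matrix.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in BASES
        })
    return matrix


def score(matrix, site: str) -> float:
    """Sum the per-position weights for a candidate site."""
    return sum(column[base] for column, base in zip(matrix, site.upper()))


# Hypothetical donor-site windows (2 exonic + 6 intronic bases), toy data only.
toy_donor_sites = ["AGGTAAGT", "AGGTGAGT", "TGGTAAGA", "AGGTAAGC"]
donor_matrix = weight_matrix(toy_donor_sites)

print(classify_pair("gtaagtccctttcag"))
print(score(donor_matrix, "AGGTAAGT") > score(donor_matrix, "CCCCCCCC"))
```

A gene prediction program could score each candidate site against one such matrix per splice group (for example GT–AG and GC–AG) and keep candidates above a threshold. The threshold, like everything else in this sketch, would need to be chosen from real data.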
