Published in: Genome Research

A Comprehensive Approach to Clustering of Expressed Human Gene Sequence: The Sequence Tag Alignment and Consensus Knowledge Base


Abstract

The expressed human genome is being sequenced and analyzed by disparate groups producing disparate data. The majority of the identified coding portion is in the form of expressed sequence tags (ESTs). The need to discover exonic representation and expression forms of full-length cDNAs for each human gene is frustrated by the partial and variable quality nature of this data delivery. A highly redundant human EST data set has been processed into integrated and unified expressed transcript indices that consist of hierarchically organized human transcript consensi reflecting gene expression forms and genetic polymorphism within an index class. The expression index and its intermediate outputs include cleaned transcript sequence, expression, and alignment information and a higher fidelity subset, SANIGENE. The STACK_PACK clustering system has been applied to dbEST release 121598 (GenBank version 110). Sixty-four percent of 1,313,103 Homo sapiens ESTs are condensed into 143,885 tissue level multiple sequence clusters; linking through clone-ID annotations produces 68,701 total assemblies, such that 81% of the original input set is captured in a STACK multiple sequence or linked cluster. Indexing of alignments by substituent EST accession allows browsing of the data structure and its cross-links to UniGene. STACK metaclusters consolidate a greater number of ESTs by a factor of 1.86 with respect to the corresponding UniGene build. Fidelity comparison with genome reference sequence demonstrates consensus expression clusters that reflect significantly lower spurious repeat sequence content and capture alternate splicing within a whole body index cluster and three STACK v.2.3 tissue-level clusters. Statistics of a staggered release whole body index build of STACK v.2.0 are presented.
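The clone-ID linking step mentioned in the abstract can be pictured as merging clusters that contain ESTs derived from the same clone. The following is a minimal sketch of that idea using a union-find structure. It is not the STACK_PACK implementation: the function name `link_clusters_by_clone`, the dictionary layout, and the toy accessions and clone IDs are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def link_clusters_by_clone(cluster_members, est_clone):
    """Group clusters that share at least one clone ID (illustrative only).

    cluster_members: dict mapping cluster ID -> list of EST accessions
    est_clone: dict mapping EST accession -> clone ID
               (ESTs without a clone annotation are simply absent)
    Returns a list of sets of cluster IDs, one set per linked group.
    """
    parent = {cid: cid for cid in cluster_members}

    def find(x):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Remember the first cluster seen for each clone; later clusters that
    # contain the same clone are merged with it.
    first_cluster_for_clone = {}
    for cid, ests in cluster_members.items():
        for est in ests:
            clone = est_clone.get(est)
            if clone is None:
                continue  # no clone annotation, so no link can be made
            if clone in first_cluster_for_clone:
                union(first_cluster_for_clone[clone], cid)
            else:
                first_cluster_for_clone[clone] = cid

    groups = defaultdict(set)
    for cid in cluster_members:
        groups[find(cid)].add(cid)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Toy input (hypothetical accessions and clone IDs).
clusters = {
    "c1": ["est1", "est2"],
    "c2": ["est3"],
    "c3": ["est4"],
    "c4": ["est5"],
}
clones = {"est1": "cloneA", "est3": "cloneA", "est4": "cloneB"}

linked = link_clusters_by_clone(clusters, clones)
print(linked)
```

In this toy input, clusters `c1` and `c2` are linked because `est1` and `est3` carry the same clone ID, while `c3` and `c4` remain separate. Linking in this way can only reduce the number of groups, which is consistent with the abstract reporting fewer linked assemblies than tissue-level clusters.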
