Statistical methods for serial analysis of gene expression.

Abstract

Serial analysis of gene expression (SAGE) is a technique for obtaining information about gene expression. SAGE experiments provide insight into human disease by identifying disease-related genes and by suggesting possible therapeutic targets. Data from a SAGE experiment consist of long lists of gene identifiers (tags) and their corresponding frequencies. Most of the entries in these lists are tags that appear only a few times. Some of these low-frequency tags represent low-frequency mRNAs, but others are the result of sequencing errors, and it is difficult to distinguish between the two cases. This thesis presents methods for enhancing the signal from infrequently occurring tags.

The frequency distributions of tags display a remarkable regularity across cell types and species. The first technique exploits this regularity to automatically discount low counts that cannot reliably be used to compare expression levels of a specific gene across conditions, and to transform the cell counts to a scale that produces more reliable correlation and clustering of genome-wide expression profiles.

The second contribution is a method for calculating the error rate in any library. We observe a linear relationship between the copy number of a given tag and the number of observed tags that differ from the tag of interest by a single-base substitution, insertion, or deletion. We have found that the slope of this relationship may be transformed to give an estimate of the error rate.

Finally, we identify the tags that were likely generated erroneously. We develop a model for reassigning these erroneously read tags by identifying probable errors and the corresponding tags that spawned them. An error in one base pair of a very common tag may result in the observation of a completely new, but similar, tag: a shadow of the common tag. Infrequently observed tags that are very similar to other observed tags may have been created by such a process. On the other hand, infrequently observed tags that are not similar to any other observed tag may represent genuinely infrequently expressed transcripts. The proposed method reassigns the erroneously observed shadows to the tags that may have generated them.
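The abstract does not give the thesis's model, so the following Python is only a minimal sketch of the two ideas it describes: relating each tag's copy number to the number of observed tags one edit away, and folding rare "shadow" tags into an abundant neighbor. The function names, the thresholds `max_shadow_count` and `min_parent_ratio`, the ordinary least-squares fit, and the treatment of tags as plain variable-length strings are all assumptions made for illustration. The transformation from slope to error rate is not stated in the abstract and is not attempted here.

```python
from collections import Counter

ALPHABET = "ACGT"


def one_edit_neighbors(tag):
    """All sequences one substitution, insertion, or deletion away from tag."""
    neighbors = set()
    for i in range(len(tag)):
        # Substitute position i with each other base.
        for base in ALPHABET:
            if base != tag[i]:
                neighbors.add(tag[:i] + base + tag[i + 1:])
        # Delete position i.
        neighbors.add(tag[:i] + tag[i + 1:])
    # Insert each base at every possible position.
    for i in range(len(tag) + 1):
        for base in ALPHABET:
            neighbors.add(tag[:i] + base + tag[i:])
    neighbors.discard(tag)
    return neighbors


def neighbor_slope(counts):
    """Least-squares slope of (number of observed one-edit neighbors) on copy number.

    Assumption: "number of tags" is read as the number of distinct observed
    neighbor tags. Raises ZeroDivisionError if every tag has the same count.
    """
    xs = list(counts.values())
    ys = [sum(1 for nb in one_edit_neighbors(t) if nb in counts) for t in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def reassign_shadows(counts, max_shadow_count=1, min_parent_ratio=50):
    """Fold rare tags into their most abundant one-edit neighbor (hypothetical rule).

    A tag is treated as a shadow candidate if its count is at most
    max_shadow_count and some observed neighbor is at least
    min_parent_ratio times more abundant.
    """
    cleaned = Counter(counts)
    for tag, n in counts.items():
        if n > max_shadow_count:
            continue
        parents = [
            (counts[p], p)
            for p in one_edit_neighbors(tag)
            if counts.get(p, 0) >= min_parent_ratio * n
            and counts.get(p, 0) > max_shadow_count
        ]
        if parents:
            _, parent = max(parents)
            cleaned[parent] += n
            del cleaned[tag]
    return cleaned


# Toy library: one abundant tag, one likely shadow of it, one isolated rare tag.
library = {"AAAAAAAAAA": 100, "AAAAAAAAAC": 1, "CGTACGTACG": 1}
cleaned = reassign_shadows(library)
slope = neighbor_slope(library)
```

In this toy library the rare tag that differs from the abundant tag by one base is merged into it, while the isolated rare tag is kept, which mirrors the distinction the abstract draws between shadows and genuinely rare transcripts. A hard threshold is used here only for brevity; the thesis describes a model-based reassignment.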

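Bibliographic record

  • Author

    Blades, Natalie Jean

  • Author affiliation

    The Johns Hopkins University

  • Degree-granting institution: The Johns Hopkins University
  • Subject: Biology Biostatistics; Biology Molecular
  • Degree: Ph.D.
  • Year: 2003
  • Pages: 159 p.
  • Total pages: 159
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: Biomathematical methods; Molecular genetics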