Source: U.S. National Institutes of Health literature > Springer Open Choice

Identification of sample annotation errors in gene expression datasets

Abstract

The comprehensive transcriptomic analysis of clinically annotated human tissue has found widespread use in oncology, cell biology, immunology, and toxicology. In cancer research, microarray-based gene expression profiling has been successfully applied to subclassify disease entities, predict therapy response, and identify cellular mechanisms. Public access to raw data, together with the corresponding clinicopathological parameters, makes it possible to reuse previously analyzed data and to gain statistical power by combining multiple datasets. However, any results and conclusions depend on the reliability of the available sample information. Here, we propose gene expression-based methods for identifying sample misannotations in public transcriptomic datasets. Sample mix-ups can be detected by a classifier that distinguishes samples from male and female patients: a sample whose predicted sex contradicts its annotated sex is suspect. Correlation analysis identifies multiple measurements of material from the same sample. An analysis of 45 datasets (4913 patients) revealed erroneous sample annotation in 40% of the analyzed datasets, suggesting that the problem may be more widespread than previously thought. Removing erroneously labelled samples may change the results of the statistical evaluation in some datasets. Our methods may help to identify individual datasets that contain numerous discrepancies and could routinely be included in the statistical analysis of clinical gene expression data.

Electronic supplementary material: The online version of this article (doi:10.1007/s00204-015-1632-4) contains supplementary material, which is available to authorized users.
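The two checks described in the abstract (sex-based detection of mix-ups and correlation-based detection of repeated measurements) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' classifier: it assumes log-scale expression values keyed by gene symbol, uses a hypothetical rule that compares the mean expression of the Y-chromosome genes RPS4Y1, DDX3Y and KDM5D against XIST, and flags sample pairs whose Pearson correlation reaches an assumed cutoff of 0.99. The function names and the toy samples are invented for illustration.

```python
from itertools import combinations
from math import sqrt

# Assumed marker sets: XIST is typically high in female samples,
# Y-chromosome genes such as RPS4Y1, DDX3Y and KDM5D in male samples.
FEMALE_MARKERS = ("XIST",)
MALE_MARKERS = ("RPS4Y1", "DDX3Y", "KDM5D")


def predict_sex(profile, threshold=0.0):
    """Hypothetical rule: 'M' if mean Y-gene expression exceeds XIST, else 'F'."""
    male = sum(profile[g] for g in MALE_MARKERS) / len(MALE_MARKERS)
    female = sum(profile[g] for g in FEMALE_MARKERS) / len(FEMALE_MARKERS)
    return "M" if male - female > threshold else "F"


def sex_mismatches(expr, annotated_sex):
    """Return sample IDs whose predicted sex contradicts the annotation."""
    return sorted(
        s for s, profile in expr.items()
        if predict_sex(profile) != annotated_sex[s]
    )


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def suspected_duplicates(expr, genes, min_r=0.99):
    """Return sample pairs whose profiles correlate at or above min_r (assumed cutoff)."""
    vectors = {s: [expr[s][g] for g in genes] for s in expr}
    return [
        (a, b) for a, b in combinations(sorted(vectors), 2)
        if pearson(vectors[a], vectors[b]) >= min_r
    ]


# Invented toy data: S3 is a near-copy of the female sample S1 but annotated as male.
GENES = ["XIST", "RPS4Y1", "DDX3Y", "KDM5D", "GAPDH", "ACTB", "MKI67", "ESR1"]
expr = {
    "S1": dict(zip(GENES, [10.0, 3.0, 3.2, 2.9, 12.0, 11.5, 6.0, 9.0])),
    "S2": dict(zip(GENES, [3.1, 9.5, 9.0, 8.8, 12.1, 11.0, 8.0, 4.0])),
    "S3": dict(zip(GENES, [10.1, 3.0, 3.1, 2.9, 12.0, 11.6, 6.1, 9.0])),
}
annotated = {"S1": "F", "S2": "M", "S3": "M"}

if __name__ == "__main__":
    print("sex mismatches:", sex_mismatches(expr, annotated))
    print("suspected duplicates:", suspected_duplicates(expr, GENES))
```

On the toy data, S3 is flagged twice: its XIST-high profile contradicts the male annotation, and it correlates almost perfectly with S1. In real datasets, samples of the same tissue type are often highly correlated anyway, so a fixed cutoff such as 0.99 is only a placeholder; a cutoff derived from the dataset's own correlation distribution would likely be more robust.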