Journal: Cancer Informatics

Improving Cancer Gene Expression Data Quality through a TCGA Data-Driven Evaluation of Identifier Filtering

Abstract

Data quality is a recognized problem for high-throughput genomics platforms, as evinced by the proliferation of methods that attempt to filter out lower-quality data points. Different filtering methods lead to discordant results, raising the question: which methods are best? Astonishingly, little computational support is available to help analysts decide which filtering methods are optimal for the research question at hand. To evaluate filtering methods, we begin with a pair of expression data sets, transcriptomic and proteomic, measured on the same samples. The pair of data sets forms a test-bed for the evaluation. Identifier mapping between the data sets creates a collection of feature pairs, and a correlation is calculated for each pair. To evaluate a filtering strategy, we estimate posterior probabilities for the correctness of the probesets accepted by the method. An analyst can set expected utilities that represent the trade-off between the quality and quantity of accepted features. We tested nine published probeset filtering methods and combination strategies on two test-beds from cancer studies, each providing transcriptomic and proteomic data. For reasonable utility settings, the Jetset filtering method was optimal for probeset filtering on both test-beds, even though the two test-beds used different assay platforms. Further intersection with a second filtering method was indicated on one test-bed but not the other.
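The evaluation workflow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the toy expression values, the utility values `u_true` and `u_false`, and the posterior probabilities, which are supplied by hand because the abstract does not say how they are estimated. It is not the authors' implementation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def pair_correlations(mrna, protein, id_map):
    """Correlate each mapped (probeset, protein) feature pair across samples."""
    return {
        (ps, pr): pearson(mrna[ps], protein[pr])
        for ps, pr in id_map
        if ps in mrna and pr in protein
    }


def expected_utility(accepted, posterior, u_true=1.0, u_false=-1.0):
    """Expected utility of a filter's accepted probesets.

    posterior[ps] is an assumed probability that probeset ps is correct;
    u_true and u_false are hypothetical analyst-chosen utilities for
    accepting a correct and an incorrect probeset, respectively.
    """
    return sum(
        posterior[ps] * u_true + (1.0 - posterior[ps]) * u_false
        for ps in accepted
    )


# Toy test-bed: two probesets mapped to one protein, four shared samples.
mrna = {"ps1": [1, 2, 3, 4], "ps2": [4, 1, 3, 2]}
protein = {"P1": [2, 4, 6, 8]}
id_map = [("ps1", "P1"), ("ps2", "P1")]
correlations = pair_correlations(mrna, protein, id_map)

# Hand-picked posteriors (illustrative only) and two hypothetical filters.
posterior = {"ps1": 0.9, "ps2": 0.2}
filters = {"A": {"ps1"}, "B": {"ps1", "ps2"}}
utilities = {name: expected_utility(acc, posterior) for name, acc in filters.items()}
best_filter = max(utilities, key=utilities.get)
```

Raising `u_false` toward zero rewards quantity over quality, so a more permissive filter such as the hypothetical `B` can become the preferred choice; this is the trade-off the analyst's utility settings are meant to express.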
