Journal: BMC Bioinformatics

DupChecker: a bioconductor package for checking high-throughput genomic data redundancy in meta-analysis

Abstract

Background: Meta-analysis has become a popular approach for high-throughput genomic data analysis because it can often significantly increase the power to detect biological signals or patterns in datasets. However, when publicly available databases are used for meta-analysis, duplication of samples is a frequently encountered problem, especially for gene expression data. Failing to remove duplicates can lead to false-positive findings, misleading clustering patterns, or model over-fitting in the subsequent data analysis.

Results: We developed a Bioconductor package, DupChecker, that efficiently identifies duplicated samples by generating MD5 fingerprints for raw data. A real-data example demonstrates the usage and output of the package.

Conclusions: Researchers may not pay enough attention to checking for and removing duplicated samples, in which case data contamination can make the results or conclusions of a meta-analysis questionable. We suggest applying DupChecker to examine all gene expression data sets before any data analysis step.
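The fingerprinting idea described in the abstract can be illustrated with a minimal Python sketch. This is not DupChecker's API (the package itself is written in R); the function names `md5_fingerprint` and `find_duplicate_files` are hypothetical, and the sketch assumes that duplicated samples are byte-identical raw files, so that equal MD5 digests indicate the same sample.

```python
import hashlib
from collections import defaultdict


def md5_fingerprint(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file's bytes, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for large raw data files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_files(paths):
    """Group files by MD5 digest and keep only groups with two or more files.

    Hypothetical helper for illustration; returns {md5_hex: [paths, ...]}.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[md5_fingerprint(path)].append(str(path))
    return {md5: sorted(files) for md5, files in groups.items() if len(files) > 1}
```

Given a list of raw files collected from several datasets, `find_duplicate_files` would return one entry per group of byte-identical files. Files that hold the same sample but were re-exported or recompressed would not match under this assumption.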