Journal: Journal of Cellular Biochemistry

Benchmarking Database Performance for Genomic Data


Abstract

Genomic regions represent features such as gene annotations, transcription factor binding sites and epigenetic modifications. Performing genomic operations such as identifying overlapping/non-overlapping regions or nearest gene annotations is a common research need. The data can be saved in a database system for easy management; however, no database currently provides a comprehensive built-in algorithm for identifying overlapping regions. Therefore I have developed a novel region-mapping (RegMap) SQL-based algorithm to perform genomic operations and have benchmarked the performance of different databases. Benchmarking showed that PostgreSQL extracts overlapping regions much faster than MySQL. Insertion and data uploads also performed better in PostgreSQL, although the general search capability of the two databases was almost equivalent. In addition, the algorithm was used to report pairwise overlaps among >1000 datasets of transcription factor binding sites and histone marks collected from previous publications, and HNF4G was found to co-locate significantly with the cohesin subunit STAG1 (SA1). © 2015 Wiley Periodicals, Inc.
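To make the overlap operation concrete, here is a minimal sketch of how overlapping regions can be found with a plain SQL join. It is not the RegMap algorithm, whose details the abstract does not give. It uses only the textbook interval test: two regions on the same chromosome overlap when each starts no later than the other ends. The table names (`a`, `b`), the column names, the closed-coordinate convention and the helper `find_overlaps` are all assumptions made for illustration. SQLite is used only so the example runs without a server; the paper benchmarks PostgreSQL and MySQL.

```python
import sqlite3


def find_overlaps(regions_a, regions_b):
    """Return overlapping region pairs from two lists of (chrom, start, end).

    Assumption: coordinates are closed intervals, so regions that share a
    single base count as overlapping.
    """
    conn = sqlite3.connect(":memory:")
    for name in ("a", "b"):
        conn.execute(
            f"CREATE TABLE {name} (chrom TEXT, start_pos INTEGER, end_pos INTEGER)"
        )
    conn.executemany("INSERT INTO a VALUES (?, ?, ?)", regions_a)
    conn.executemany("INSERT INTO b VALUES (?, ?, ?)", regions_b)
    # An index on the joined table lets the engine narrow candidates by chromosome.
    conn.execute("CREATE INDEX idx_b ON b (chrom, start_pos, end_pos)")
    rows = conn.execute(
        """
        SELECT a.chrom, a.start_pos, a.end_pos, b.start_pos, b.end_pos
        FROM a
        JOIN b
          ON a.chrom = b.chrom
         AND a.start_pos <= b.end_pos   -- a starts before b ends
         AND b.start_pos <= a.end_pos   -- b starts before a ends
        ORDER BY a.chrom, a.start_pos, b.start_pos
        """
    ).fetchall()
    conn.close()
    return rows


# Hypothetical toy data: binding-site peaks versus gene annotations.
peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
genes = [("chr1", 150, 300), ("chr1", 700, 800), ("chr2", 201, 250)]
overlaps = find_overlaps(peaks, genes)
print(overlaps)
```

Only the first peak overlaps a gene here: the chr2 pair is adjacent (200 versus 201) but shares no base under closed coordinates. Non-overlapping regions could be obtained from the same predicate with a `LEFT JOIN` and a `NULL` filter.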
