BMC Bioinformatics

SamQL: a structured query language and filtering tool for the SAM/BAM file format

Abstract

The Sequence Alignment/Map Format Specification (SAM) is one of the most widely adopted file formats in bioinformatics, and many researchers use it daily. Several tools, including most high-throughput sequencing read aligners, use it as their primary output, and many more tools have been developed to process it. However, despite its flexibility, SAM-encoded files can often be difficult to query and understand, even for experienced bioinformaticians. As genomic data are rapidly growing, structured and efficient queries on data encoded in SAM/BAM files are becoming increasingly important. Existing tools are very limited in their query capabilities or are not efficient. Critically, new tools that address these shortcomings should not only be able to support existing large datasets but should also do so without requiring massive data transformations and file infrastructure reorganizations. Here we introduce SamQL, an SQL-like query language for the SAM format with intuitive syntax that supports complex and efficient queries on top of SAM/BAM files and that can replace commonly used Bash one-liners employed by many bioinformaticians. SamQL has high expressive power with no upper limit on query size and, when parallelized, outperforms other substantially less expressive software. SamQL is a complete query language that we envision as a step toward a structured database engine for genomics. SamQL is written in Go and is freely available as a standalone program and as an open-source library under an MIT license at https://github.com/maragkakislab/samql/ .
