The ability to issue sequence-level searches over publicly available databases of assembled genomes and known proteins has played an instrumental role in many genomics studies, and has made BLAST [2] and its variants some of the most widely used tools in all of science. However, until recently, tools for searching genomic data were restricted to reference sequences. As a result, the vast majority of publicly available sequencing data (e.g., the data deposited in the SRA [3]) has been difficult to search because it exists in the form of raw, unassembled sequencing reads.