Index suffix-prefix overlaps by (w, k)-minimizer to generate long contigs for reads compression

Motivation: Advanced high-throughput sequencing technologies have produced massive amounts of read data, and algorithms have been designed specifically to reduce the size of these datasets for efficient storage and transmission. Reordering reads according to their positions in de novo assembled contigs or in explicit reference sequences has proven to be one of the most effective approaches to read compression. Because there is usually no good prior knowledge about the reference sequence, current work focuses on new ways of constructing de novo assembled contigs.


    Univ Technol Sydney Fac Engn & IT Adv Analyt Inst Ultimo 2007 Australia;

    Xiangtan Univ Key Lab Intelligent Comp & Informat Proc Hunan Key Lab Computat & Simulat Sci & Engn Minist Educ Xiangtan 411105 Hunan Peoples R China;

    Garvan Inst Med Res Kinghorn Ctr Clin Genom Sydney NSW 2010 Australia;

