METHOD AND SYSTEMS FOR RESTORING GENOMIC REFERENCE SEQUENCES FROM COMPRESSED READINGS OF A GENOMIC SEQUENCE
Abstract
A method and apparatus for representing a reference genome by means of syntax elements that describe the differences between the reference genome and genomic sequences aligned to it. The genomic sequences are first aligned to the reference genome, and each aligned sequence is described using a subset of the syntax elements. The syntax elements describing all genomic sequences are grouped into blocks according to their statistical properties, each block is entropy encoded, and the entropy-encoded blocks are concatenated to form a compressed binary data stream. The differences between the reference genome and the aligned sequences are likewise expressed as syntax elements, grouped into blocks according to their statistical properties, and entropy encoded block by block; these entropy-encoded syntax elements are then embedded in the binary data stream of encoded blocks of syntax elements describing the aligned reads. The proposed method makes it possible to restore the reference genome used for alignment when decoding the compressed genomic sequences, while preserving the various modes of random access to the compressed data and providing efficient compression.
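The pipeline described in the abstract can be illustrated with a toy sketch. It is not the patented codec, and every name in it (`encode`, `restore_reference`, the block names, the length-prefixed layout) is hypothetical. It assumes substitution-only alignments, stores read bases verbatim, and uses `zlib` (DEFLATE) as a stand-in for the entropy coder that the abstract leaves unspecified. Each kind of syntax element goes into its own block, each block is compressed separately, and the blocks are concatenated. Because every mismatch element carries the reference base, the decoder can rebuild the covered part of the reference from the stream alone.

```python
import struct
import zlib

# Hypothetical block layout: one block per kind of syntax element.
BLOCK_ORDER = ["start", "length", "bases", "mm_count", "mm_offset", "mm_ref"]
TEXT_BLOCKS = {"bases", "mm_ref"}  # blocks holding nucleotide characters


def encode(reference: str, reads: list[tuple[int, str]]) -> bytes:
    """Encode pre-aligned reads given as (start position, sequence) pairs."""
    blocks = {name: [] for name in BLOCK_ORDER}
    for start, seq in reads:
        # Differences between read and reference, keeping the reference base.
        mismatches = [(i, reference[start + i])
                      for i, base in enumerate(seq)
                      if reference[start + i] != base]
        blocks["start"].append(start)
        blocks["length"].append(len(seq))
        blocks["bases"].append(seq)
        blocks["mm_count"].append(len(mismatches))
        for offset, ref_base in mismatches:
            blocks["mm_offset"].append(offset)
            blocks["mm_ref"].append(ref_base)

    out = bytearray()
    for name in BLOCK_ORDER:
        values = blocks[name]
        if name in TEXT_BLOCKS:
            raw = "".join(values).encode("ascii")
        else:
            raw = struct.pack(f"<{len(values)}I", *values)
        packed = zlib.compress(raw)  # stand-in for an entropy coder (assumption)
        out += struct.pack("<I", len(packed)) + packed  # length prefix, then block
    return bytes(out)


def decode_blocks(stream: bytes) -> dict:
    """Split the concatenated stream back into decompressed blocks."""
    blocks, pos = {}, 0
    for name in BLOCK_ORDER:
        (size,) = struct.unpack_from("<I", stream, pos)
        pos += 4
        raw = zlib.decompress(stream[pos:pos + size])
        pos += size
        if name in TEXT_BLOCKS:
            blocks[name] = raw.decode("ascii")
        else:
            blocks[name] = list(struct.unpack(f"<{len(raw) // 4}I", raw))
    return blocks


def restore_reference(stream: bytes) -> dict[int, str]:
    """Rebuild reference bases at every position covered by at least one read."""
    blocks = decode_blocks(stream)
    restored, base_pos, mm_pos = {}, 0, 0
    for start, length, count in zip(blocks["start"], blocks["length"],
                                    blocks["mm_count"]):
        seq = list(blocks["bases"][base_pos:base_pos + length])
        base_pos += length
        # Undo each substitution by putting the stored reference base back.
        for k in range(mm_pos, mm_pos + count):
            seq[blocks["mm_offset"][k]] = blocks["mm_ref"][k]
        mm_pos += count
        for i, base in enumerate(seq):
            restored[start + i] = base
    return restored
```

For example, `restore_reference(encode(ref, reads))` returns a mapping from genome position to reference base for the covered positions. Positions that no read covers cannot be recovered in this sketch.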