Journal: Scientific Reports

A signal processing method for alignment-free metagenomic binning: multi-resolution genomic binary patterns


Abstract

Algorithms in bioinformatics use textual representations of genetic information: sequences of the characters A, T, G and C, represented computationally as strings or sub-strings. Signal and related image processing methods offer a rich source of alternative descriptors, as they are designed to work in the presence of noisy data without the need for exact matching. Here we introduce a method, multi-resolution local binary patterns (MLBP), adapted from image processing to extract local ‘texture’ changes from nucleotide sequence data. We apply this feature space to the alignment-free binning of metagenomic data. The effectiveness of MLBP is demonstrated using both simulated and real human gut microbial communities. Sequence reads or contigs can be represented as vectors, and their ‘texture’ compared efficiently using machine learning algorithms that perform dimensionality reduction, to capture eigengenome information, and clustering (here using randomized singular value decomposition and BH-tSNE). The intuition behind our method is that the MLBP feature vectors permit sequence comparisons without the need for explicit pairwise matching. We demonstrate that this approach outperforms existing methods based on k-mer frequencies. The signal processing method, MLBP, thus offers a viable alternative feature space to textual representations of sequence data. The source code for our Multi-resolution Genomic Binary Patterns method can be found at https://github.com/skouchaki/MrGBP.
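To make the idea of a multi-resolution local binary pattern feature vector concrete, the following is a minimal sketch and not the authors' MrGBP implementation. The abstract does not specify the numeric encoding of nucleotides, the window sizes, or the comparison rule, so the mapping `NUC_TO_INT`, the `half_widths` values, the "neighbour >= centre" rule, and the function names are all assumptions made for illustration. The sketch also assumes the input contains only A, C, G and T.

```python
from collections import Counter

# Hypothetical nucleotide-to-integer mapping (an assumption, not taken from the paper).
NUC_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3}


def lbp_histogram(seq, half_width):
    """Normalized histogram of 1-D local binary pattern codes for one window size."""
    signal = [NUC_TO_INT[c] for c in seq.upper()]
    n_bits = 2 * half_width
    counts = Counter()
    for i in range(half_width, len(signal) - half_width):
        center = signal[i]
        # Neighbours on both sides of the centre, excluding the centre itself.
        neighbours = signal[i - half_width:i] + signal[i + 1:i + half_width + 1]
        code = 0
        for value in neighbours:
            # Assumed rule: a bit is 1 when the neighbour is >= the centre value.
            code = (code << 1) | (1 if value >= center else 0)
        counts[code] += 1
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in range(2 ** n_bits)]


def mlbp_features(seq, half_widths=(1, 2, 3)):
    """Concatenate the histograms from several window sizes (the 'multi-resolution' part)."""
    features = []
    for p in half_widths:
        features.extend(lbp_histogram(seq, p))
    return features


if __name__ == "__main__":
    vector = mlbp_features("ACGTACGTAC")
    print(len(vector))  # 4 + 16 + 64 histogram bins
```

Each read or contig is mapped to a fixed-length vector regardless of its length, so vectors of this kind could then be passed to a dimensionality reduction and clustering step such as the randomized SVD and BH-tSNE mentioned in the abstract.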