Journal: Scientific Reports

A novel alignment-free method for detection of lateral genetic transfer based on TF-IDF



Abstract

Lateral genetic transfer (LGT) plays an important role in the evolution of microbes. Existing computational methods for detecting genomic regions of putative lateral origin scale poorly to large data. Here, we propose a novel method based on TF-IDF (Term Frequency-Inverse Document Frequency) statistics to detect not only regions of lateral origin, but also their origin and direction of transfer, in sets of hierarchically structured nucleotide or protein sequences. This approach is based on the frequency distributions of k-mers in the sequences. If a set of contiguous k-mers appears sufficiently more frequently in another phyletic group than in its own, we infer that they have been transferred from the first group to the second. We performed rigorous tests of TF-IDF using simulated and empirical datasets. With the simulated data, we tested our method under different parameter settings for sequence length, substitution rate between and within groups and post-LGT, deletion rate, length of transferred region and k size, and found that we can detect LGT events with high precision and recall. Our method performs better than an established method, ALFY, which has high recall but low precision. Our method is efficient, with runtime increasing approximately linearly with sequence length.
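
The core idea in the abstract (contiguous k-mers that are more frequent in another group than in the sequence's own group) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not define the exact TF-IDF statistic, so the sketch substitutes a plain per-group frequency (the fraction of a group's sequences that contain a k-mer). The function names, the thresholds `min_diff` and `min_run`, and the toy sequences are all hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """All overlapping k-mers of seq, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def group_frequencies(groups, k):
    """Map group name -> {k-mer: fraction of that group's sequences containing it}."""
    freqs = {}
    for name, seqs in groups.items():
        counts = Counter()
        for s in seqs:
            counts.update(set(kmers(s, k)))  # count each sequence at most once
        freqs[name] = {km: n / len(seqs) for km, n in counts.items()}
    return freqs


def candidate_regions(query, own, groups, k=4, min_diff=0.5, min_run=3):
    """Return (start, end, donor_group) for runs of at least min_run contiguous
    k-mers that are more frequent in another group than in the query's own.

    Assumption: a simple frequency difference stands in for the paper's
    TF-IDF score. `end` is exclusive.
    """
    freqs = group_frequencies(groups, k)
    labels = []
    for km in kmers(query, k):
        own_f = freqs[own].get(km, 0.0)
        best, best_f = None, 0.0
        for name in groups:
            if name == own:
                continue
            f = freqs[name].get(km, 0.0)
            if f > best_f:
                best, best_f = name, f
        # Label the position with a donor only if the gap is large enough.
        labels.append(best if best is not None and best_f - own_f >= min_diff else None)

    # Merge consecutive positions that share the same donor label.
    regions, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] is not None and i - start >= min_run:
                regions.append((start, i - 1 + k, labels[start]))
            start = i
    return regions


# Toy data (hypothetical): a segment of group A's sequence is inserted
# into one member of group B at position 8.
a = "ACGTTGCAAGGCTTAACCGG"
b = "TCTCTAGAGATATCGCGCAT"
query = b[:8] + a[4:16] + b[8:]
groups = {"A": [a, a, a], "B": [b, b, query]}

print(candidate_regions(query, "B", groups))  # [(8, 20, 'A')]
```

In this toy case the reported region matches the inserted segment, and the donor label gives the inferred direction of transfer (from group A into group B). Real data would need the substitution and deletion handling that the abstract mentions, which this sketch omits.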
