Journal: Bioinformatics

MosaicFinder: identification of fused gene families in sequence similarity networks

Abstract

Motivation: Gene fusion is an important evolutionary process. It can yield valuable information to infer the interactions and functions of proteins. Fused genes have been identified as non-transitive patterns of similarity in triplets of genes. To be computationally tractable, this approach usually imposes an a priori distinction between a dataset in which fused genes are searched for, and a dataset that may have provided genetic material for fusion. This reduces the 'genetic space' in which fusion can be discovered, as only a subset of triplets of genes is investigated. Moreover, this approach may have a high false-positive rate, and it does not identify gene families descending from a common fusion event.

Results: We represent similarities between sequences as a network. This leads to an efficient formulation of previous methods of fused gene identification, which we implemented in the Python program FusedTriplets. Furthermore, we propose a new characterization of families of fused genes, as clique minimal separators of the sequence similarity network. This well-studied graph topology provides a robust and fast method of detection, well suited for automatic analyses of large datasets. We implemented this method in the C++ program MosaicFinder, which additionally uses local alignments to discard false-positive candidates and indicates potential fusion points. The grouping into families will help distinguish sequencing or prediction errors from real biological fusions, and it will yield additional insight into the function and history of fused genes.
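To make the two graph notions in the abstract concrete, here is a minimal Python sketch on a toy similarity network. It is not the code of FusedTriplets or MosaicFinder: the function names (`build_graph`, `non_transitive_triplets`, `components`, `is_clique_minimal_separator`) and the toy gene names are hypothetical, edges stand in for "significantly similar" sequence pairs, and the local-alignment filtering described in the abstract is omitted. The separator test relies on the standard characterization that a vertex set S is a minimal separator exactly when removing S leaves at least two components in which every vertex of S has a neighbour; it only checks a given candidate set and does not enumerate separators efficiently.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map {vertex: set(neighbours)} from edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def non_transitive_triplets(adj):
    """Return triplets (a, c, b) where c is similar to a and b, but a and b are not similar."""
    triplets = []
    for c, neighbours in adj.items():
        for a, b in combinations(sorted(neighbours), 2):
            if b not in adj[a]:
                triplets.append((a, c, b))
    return triplets


def components(adj, removed):
    """Connected components of the graph after deleting the vertices in `removed`."""
    seen, comps = set(removed), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:  # iterative depth-first search
            v = stack.pop()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def is_clique_minimal_separator(adj, candidate):
    """Check that `candidate` is a clique and a minimal separator of the graph."""
    s = set(candidate)
    if not s:
        return False
    # Clique condition: every pair of vertices in the candidate set is adjacent.
    if any(b not in adj[a] for a, b in combinations(s, 2)):
        return False
    # Minimal-separator condition: at least two "full" components, i.e. components
    # in which every vertex of the candidate set has at least one neighbour.
    full = [comp for comp in components(adj, s) if all(adj[v] & comp for v in s)]
    return len(full) >= 2


# Toy network (hypothetical): families A and B, and a fused family F similar to both.
edges = [("a1", "a2"), ("b1", "b2"), ("f1", "f2")]
edges += [(f, g) for f in ("f1", "f2") for g in ("a1", "a2", "b1", "b2")]
adj = build_graph(edges)

fusion_candidates = {c for _, c, _ in non_transitive_triplets(adj)}
print(sorted(fusion_candidates))
print(is_clique_minimal_separator(adj, {"f1", "f2"}))
```

On this toy network the triplet search flags `f1` and `f2` individually, while the separator test treats `{f1, f2}` as a single unit, which is the family-level grouping the abstract argues for. A single member such as `{f1}` does not pass, because the rest of the network stays connected through `f2`.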
