Journal: Bioinformatics

MUSA: a parameter free algorithm for the identification of biologically significant motifs



Abstract

Motivation: The ability to identify complex motifs, i.e. non-contiguous nucleotide sequences, is a key feature of modern motif finders. Addressing this problem is extremely important, not only because these motifs can accurately model biological phenomena but also because their extraction is highly dependent upon the appropriate selection of numerous search parameters. Currently available combinatorial algorithms have proved to be highly efficient in exhaustively enumerating motifs (including complex motifs) that fulfill certain extraction criteria. However, one major problem with these methods is the large number of parameters that need to be specified.

Results: We propose a new algorithm, MUSA (Motif finding using an UnSupervised Approach), that can be used either to autonomously find over-represented complex motifs or to estimate search parameters for modern motif finders. This method relies on a biclustering algorithm that operates on a matrix of co-occurrences of small motifs. The performance of this method is independent of the composite structure of the motifs being sought, and it makes few assumptions about their characteristics. The MUSA algorithm was applied to two datasets involving the bacterium Pseudomonas putida KT2440. The first was composed of 70 sigma(54)-dependent promoter sequences, and the second included 54 promoter sequences of genes up-regulated in response to phenol, as suggested by quantitative proteomics. The results obtained indicate that this approach is very effective at identifying complex motifs of biological significance.
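To make the idea of a "matrix of co-occurrences of small motifs" more concrete, the following is a minimal sketch, not the MUSA implementation. The abstract does not say how the matrix is built, so everything here is an assumption made for illustration: the function names `kmers` and `cooccurrence_matrix`, the use of fixed-length k-mers as the "small motifs", the default `k=3`, and the rule of counting the number of sequences in which two k-mers both appear. The biclustering step itself is not shown.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of distinct k-mers occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cooccurrence_matrix(sequences, k=3):
    """Count, for every pair of k-mers, how many sequences contain both.

    Hypothetical construction for illustration only; the published
    method may define co-occurrence differently (for example, by also
    taking the distance between the two small motifs into account).
    """
    per_seq = [kmers(s.upper(), k) for s in sequences]
    vocab = sorted(set().union(*per_seq))
    index = {m: i for i, m in enumerate(vocab)}
    n = len(vocab)
    matrix = [[0] * n for _ in range(n)]
    for present in per_seq:
        # Diagonal: number of sequences containing the k-mer at all.
        for a in present:
            matrix[index[a]][index[a]] += 1
        # Off-diagonal: number of sequences containing both k-mers.
        for a, b in combinations(sorted(present), 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return vocab, matrix


if __name__ == "__main__":
    # Made-up toy sequences, not data from the paper.
    toy = ["TGGCACGATTGCA", "TGGCACTTTTGCA", "AAAAAAA"]
    vocab, m = cooccurrence_matrix(toy, k=3)
    i, j = vocab.index("TGG"), vocab.index("GCA")
    print("TGG and GCA co-occur in", m[i][j], "sequences")
```

In a matrix like this, a group of small motifs that tend to appear together in the same subset of sequences forms a dense block, which is the kind of structure a biclustering algorithm is designed to find.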
