Journal: BMC Bioinformatics

A novel ensemble learning method for de novo computational identification of DNA binding sites



Abstract

Background: Despite the diversity of motif representations and search algorithms, the de novo computational identification of transcription factor binding sites remains constrained by the limited accuracy of existing algorithms and the need for user-specified input parameters that describe the motif being sought.

Results: We present a novel ensemble learning method, SCOPE, that is based on the assumption that transcription factor binding sites belong to one of three broad classes of motifs: non-degenerate, degenerate and gapped motifs. SCOPE employs a unified scoring metric to combine the results from three motif finding algorithms, each aimed at the discovery of one of these classes of motifs. We found that SCOPE's performance on 78 experimentally characterized regulons from four species was a substantial and statistically significant improvement over that of its component algorithms. SCOPE outperformed a broad range of existing motif discovery algorithms on the same dataset by a statistically significant margin.

Conclusion: SCOPE demonstrates that combining multiple, focused motif discovery algorithms can provide a significant gain in performance. By building on components that efficiently search for motifs without user-defined parameters, SCOPE requires as input only a set of upstream sequences and a species designation, making it a practical choice for non-expert users. A user-friendly web interface, Java source code and executables are available at http://genie.dartmouth.edu/scope.
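
The ensemble idea in the abstract (several class-specific motif finders whose candidates are ranked on one shared scale) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the `Candidate` type, the `combine` function, the toy finders, and above all `toy_score`, which simply measures the fraction of sequences containing a match. The abstract does not specify SCOPE's actual scoring metric or component algorithms, and this sketch does not reproduce them.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Sequence


@dataclass(frozen=True)
class Candidate:
    motif: str    # candidate motif; "N" stands for any base (hypothetical notation)
    source: str   # name of the component finder that proposed it
    score: float  # score on the shared scale (higher is better)


def toy_score(motif: str, sequences: Sequence[str]) -> float:
    """Placeholder shared metric: fraction of sequences containing the motif.

    This is NOT the metric used by SCOPE; it only stands in for "one score
    that is comparable across all component finders".
    """
    if not sequences:
        return 0.0
    pattern = re.compile(motif.replace("N", "[ACGT]"))
    hits = sum(1 for seq in sequences if pattern.search(seq))
    return hits / len(sequences)


def combine(
    finders: Dict[str, Callable[[Sequence[str]], Iterable[str]]],
    sequences: Sequence[str],
) -> List[Candidate]:
    """Run every finder, score all candidates with one metric, rank them."""
    candidates = [
        Candidate(motif, name, toy_score(motif, sequences))
        for name, finder in finders.items()
        for motif in finder(sequences)
    ]
    # Best score first; ties broken alphabetically so the order is deterministic.
    return sorted(candidates, key=lambda c: (-c.score, c.motif))


if __name__ == "__main__":
    upstream = ["TTACGTAA", "GGACGTCC", "ACGAAACGT"]
    # Hypothetical stand-ins for the three class-specific finders.
    toy_finders = {
        "non_degenerate": lambda seqs: ["ACGT"],
        "degenerate": lambda seqs: ["ACGN"],
        "gapped": lambda seqs: ["ACGNNNCGT"],
    }
    for cand in combine(toy_finders, upstream):
        print(f"{cand.motif:<10} {cand.source:<15} {cand.score:.2f}")
```

The point of the design is that the finders never need to agree on a scoring scheme of their own: each only proposes candidates, and a single function decides the final ranking.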
