Journal: Bioinformatics

Discriminative motif optimization based on perceptron training.

Abstract

MOTIVATION: Generating accurate transcription factor (TF) binding-site motifs from next-generation sequencing data, especially ChIP-seq data, is challenging. A typical experiment reports a large number of TF-bound sequences, each of which is relatively long, and most traditional motif finders are slow to process such a large amount of data. To overcome this limitation, tools have been developed that trade accuracy for speed, either by using heuristic discrete search strategies or by performing only limited optimization of identified seed motifs. However, such strategies may not fully use the information in the input sequences when generating motifs. The resulting motifs often make good seeds that can be further improved with appropriate scoring functions and rapid optimization.

RESULTS: We report a tool named discriminative motif optimizer (DiMO). DiMO takes a seed motif along with a positive and a negative sequence database and improves the motif using a discriminative strategy. We use the area under the receiver-operating characteristic curve (AUC) to measure a motif's discriminating power, together with a strategy based on perceptron training that rapidly maximizes the AUC. On a large test set of 87 TFs from human, Drosophila and yeast, we show that DiMO can significantly improve motifs identified by nine motif finders. Motifs are generated and optimized on training sets and evaluated on test sets. On the test sets, the AUC improves for almost 90% of the TFs, with increases of up to 39%.

AVAILABILITY AND IMPLEMENTATION: DiMO is available at http://stormo.wustl.edu/DiMO

Registry Number/Name of Substance: 0 (Transcription Factors); 9007-49-2 (DNA).
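The RESULTS paragraph combines three ideas: scoring a seed motif on positive and negative sequences, measuring its discriminating power as an AUC, and improving it with perceptron training. The Python sketch below illustrates them under simplifying assumptions of our own, and it is not DiMO's actual implementation. It assumes the motif is a position weight matrix (PWM) with one row per position and one column per base (A, C, G, T), scored additively. Each sequence is represented by its best forward-strand window only. AUC is computed as the fraction of correctly ranked (positive, negative) pairs. The update is a generic pairwise ranking-perceptron step. The names `best_site`, `motif_auc` and `perceptron_optimize`, and the `epochs`/`lr` parameters, are hypothetical.

```python
import numpy as np

ALPHABET = "ACGT"
BASE_INDEX = {base: i for i, base in enumerate(ALPHABET)}


def one_hot(site: str) -> np.ndarray:
    """Encode an uppercase ACGT string as a (len(site), 4) one-hot matrix."""
    m = np.zeros((len(site), len(ALPHABET)))
    for pos, base in enumerate(site):
        m[pos, BASE_INDEX[base]] = 1.0  # assumption: no N or lowercase bases
    return m


def best_site(pwm: np.ndarray, seq: str) -> tuple[float, str]:
    """Score every forward-strand window of width w; return the best (score, site)."""
    w = pwm.shape[0]
    best_score, best = -np.inf, ""
    for start in range(len(seq) - w + 1):
        site = seq[start:start + w]
        score = float(np.sum(pwm * one_hot(site)))
        if score > best_score:  # earliest window wins ties
            best_score, best = score, site
    return best_score, best


def auc(pos_scores, neg_scores) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


def motif_auc(pwm, positives, negatives) -> float:
    """AUC of a PWM when each sequence is represented by its best window score."""
    return auc([best_site(pwm, s)[0] for s in positives],
               [best_site(pwm, s)[0] for s in negatives])


def perceptron_optimize(seed, positives, negatives, epochs=20, lr=0.1):
    """Hypothetical ranking-perceptron refinement of a seed PWM (not DiMO's exact rule)."""
    pwm = seed.astype(float)  # astype returns a copy, so the seed is left untouched
    for _ in range(epochs):
        mistakes = 0
        for p in positives:
            for n in negatives:
                sp, site_p = best_site(pwm, p)
                sn, site_n = best_site(pwm, n)
                if sp <= sn:  # mis-ranked (or tied) pair: push p's site up, n's site down
                    pwm += lr * (one_hot(site_p) - one_hot(site_n))
                    mistakes += 1
        if mistakes == 0:  # every training pair ranked correctly, i.e. training AUC = 1
            break
    return pwm


if __name__ == "__main__":
    positives = ["GGTATAGG", "CCTATACC"]  # toy "bound" sequences
    negatives = ["GGGGCCCC", "CGCGCGCG"]  # toy background sequences
    seed = np.zeros((4, 4))               # uninformative seed motif of width 4
    print(motif_auc(seed, positives, negatives))   # 0.5
    tuned = perceptron_optimize(seed, positives, negatives)
    print(motif_auc(tuned, positives, negatives))  # 1.0
```

With an all-zero seed every window ties, so the AUC starts at 0.5. A single update shifts weight toward T and A at the third and fourth positions and away from G, after which both TATA-containing sequences outrank both GC-rich negatives. The pairwise AUC used here is the normalized Wilcoxon-Mann-Whitney statistic. Because the update fires only on mis-ranked pairs, which are exactly what 1 - AUC counts, a ranking perceptron is a natural fit for an AUC objective.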