首页> 外文会议>STACS 96 >Extracting Best Consensus Motifs from Positive and Negative Examples
【24h】

Extracting Best Consensus Motifs from Positive and Negative Examples

机译:从正例和负例中提取最佳共识图案

获取原文
获取原文并翻译 | 示例

摘要

We define the best consensus motif (BCM) problem motivated by the problem of extracting motifs from nucleic acid and amino acid sequences. A type over an alphabet sum is a family #OMEGA# of subsets of sum. A motif #pi# of type #OMEGA# is a string #pi# = #pi#_1 ...#pi#_n of motif components, each of which stands for an element in #OMEGA#. The BCM problem for #OMEGA# is, given a yes-no sample S={(#alpha#~((1)),#beta#~((1)),...,(#alpha#~((m)), #beta#~((m)))} of pairs of strings in sum with #alpha#~((i)) not = #beta#~((i)) for 1 <=i <=m, to find a motif #pi# of type #OMEGA# that maximizes the number of good pairs in S, where (#alpha#~((i)), #beta#~((i)) is good for #pi# if #pi# accepts #alpha#~((i)) and rejects #beta#~((i)). We prove that the BCM problem is NP-complete even for a very simple type #OMEGA#_1=2~sum - {#PHI#}, which is used, in practice, for describing protein motifs in the PROSITE database. We also show that the NP-completeness of the problem does not change for the type #OMEGA#_infinity = #OMEGA#_1 U {sum~+} U {sum~([i,j]) | 1 <=i<=j}, where sum~([i,j]) is the set of strings over sum of length between i and j. Furthermore, for the BCM problem for #OMEGA#_1, we provide a polynomal-time greedy algorithm based on the probabilistic method. Its performance analysis shows an explicit approximation ratio of the algorithm.
机译:我们定义了最佳共识主题(BCM)问题,其动机是从核酸和氨基酸序列中提取主题。字母总和上的类型是总和子集的系列#OMEGA#。 #OMEGA#类型的主题#pi#是主题组成部分的字符串#pi#=#pi#_1 ...#pi#_n,每个组成部分代表#OMEGA#中的一个元素。给定是-否样本S = {(#alpha#〜((1)),#beta#〜((1)),...,(#alpha#〜(( m)),成对的字符串#beta#〜(((m)))}与#alpha#〜((i))不等于1 <= i <= m的#beta#〜((i)) ,以找到类型为#OMEGA#的主题#pi#,该主题使S中的好对的数目最大化,其中(#alpha#〜((i)),#beta#〜((i))对#pi#是好的如果#pi#接受#alpha#〜((i))而拒绝#beta#〜((i))我们证明即使对于非常简单的类型#OMEGA#_1 = 2〜sum,BCM问题也是NP完全的-{#PHI#},实际上用于描述PROSITE数据库中的蛋白质基序,我们还表明,问题的NP完整性对于#OMEGA#_infinity =#OMEGA#_1 U类型不会改变{sum〜+} U {sum〜([i,j])| 1 <= i <= j},其中sum〜([i,j])是在i和j之间的长度之和上的字符串集。此外,针对#OMEGA#_1的BCM问题,我们提供了一种基于概率方法的多项式时间贪心算法,其性能分析表明了算法的显式近似率。定理。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号