International Conference on Discovery Science (DS 2005), October 8-11, 2005, Singapore

Measuring Over-Generalization in the Minimal Multiple Generalizations of Biosequences

Abstract

We consider the problem of finding a set of patterns that best characterizes a set of strings. To this end, Arimura et al. [3] considered the use of minimal multiple generalizations (mmgs) for such characterizations. Given any sample set, the mmgs are, roughly speaking, the most (syntactically) specific sets of languages containing the sample within a given class of languages. Takae et al. [17] found the mmgs of the class of pattern languages that includes so-called sort symbols to be fairly accurate predictors of signal peptides. We first reproduce their results using updated data. Then, using a measure that estimates the level of over-generalization made by the mmgs, we show results that explain the high accuracy obtained with sort symbols, and we discuss how better results can be obtained. The measure that we suggest here can also be applied to other types of patterns, e.g. the PROSITE patterns.