Published in: Workshop on Genome Informatics

A string pattern regression algorithm and its application to pattern discovery in long introns.



Abstract

We present a new approach to pattern discovery called string pattern regression. The input is a data set in which each record has a string attribute and an objective numerical attribute. The problem is to find the string pattern that best splits the data set, so that the distribution of numerical attribute values for the records whose string attribute matches the pattern is most distinct, with respect to some appropriate measure, from the distribution for the records whose string attribute does not match. Solving this problem reveals, at the same time, a subset of the data whose objective numerical attributes differ significantly from the rest of the data, and a splitting rule, in the form of a string pattern, that is conserved in that subset. Although the problem can be solved in linear time for the substring pattern class, it is NP-hard in the general case (i.e., for more complex patterns), and we present an exact but efficient branch-and-bound algorithm that is applicable to various pattern classes. We apply the algorithm to intron sequences of human, mouse, fly, and zebrafish, and show the practicality of our approach and algorithm. We also discuss possible extensions of the algorithm, as well as promising applications such as microarray gene expression data.
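The problem statement above can be illustrated with a minimal sketch for the substring pattern class. This is not the authors' algorithm: it is a brute-force enumeration rather than the linear-time or branch-and-bound methods the abstract mentions. The distinctness measure (reduction in sum of squared errors, as in a regression-tree split), the function names `split_score` and `best_substring_pattern`, the `max_len` cutoff, and the toy data are all assumptions made for illustration.

```python
from statistics import fmean


def split_score(matched, unmatched):
    """Reduction in sum of squared errors when the values are split in two.

    Assumed measure of "distinctness"; the abstract leaves the measure open.
    """
    if not matched or not unmatched:
        return 0.0  # a pattern matching all or no records does not split the data

    def sse(values):
        m = fmean(values)
        return sum((v - m) ** 2 for v in values)

    return sse(matched + unmatched) - sse(matched) - sse(unmatched)


def best_substring_pattern(data, max_len=4):
    """Exhaustively score every substring of length <= max_len (hypothetical helper)."""
    candidates = {
        s[i:i + k]
        for s, _ in data
        for k in range(1, max_len + 1)
        for i in range(len(s) - k + 1)
    }
    best_score, best_pattern = 0.0, None
    for pattern in sorted(candidates):
        matched = [y for s, y in data if pattern in s]
        unmatched = [y for s, y in data if pattern not in s]
        score = split_score(matched, unmatched)
        if score > best_score:
            best_score, best_pattern = score, pattern
    return best_score, best_pattern


# Made-up toy records: (string attribute, numerical attribute).
toy = [("ACGTAC", 5.0), ("TTGTAA", 5.2), ("ACCCAC", 1.0), ("TTCCAA", 1.1)]
score, pattern = best_substring_pattern(toy)
print(pattern, round(score, 4))
```

On this toy input the selected pattern separates the two high-valued strings from the two low-valued ones. Several substrings induce that same split, so which one is reported depends on tie-breaking.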
