
Protein Sequence Pattern Mining with Constraints



Abstract

Considering the characteristics of biological sequence databases, which typically have a small alphabet, very long sequences, and a relatively small size (several hundred sequences), we propose a new sequence mining algorithm (gIL). gIL was developed for linear sequence pattern mining and results from combining some of the most efficient techniques used in sequence and itemset mining. The algorithm is highly adaptable, allowing various types of features to be introduced smoothly and directly into the mining process, namely the extraction of rigid and arbitrary gap patterns. Both breadth-first and depth-first traversals are possible. The experimental evaluation, on synthetic and real-life protein databases, has shown that our algorithm outperforms state-of-the-art algorithms. Constraints have also proved to be a very useful tool for specifying the patterns a user is interested in.
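
To make the terms in the abstract concrete, the following is a minimal sketch of mining rigid gap patterns under constraints. It is not the gIL algorithm, whose internals the abstract does not describe. It is a naive depth-first enumeration that rescans the database for every candidate. The function names `support` and `mine_rigid_gap_patterns`, the parameters `min_support`, `max_gap`, and `max_symbols`, and the use of `.` as a single-residue wildcard are all assumptions made for illustration.

```python
import re
from typing import Dict, List


def support(pattern: str, sequences: List[str]) -> int:
    """Count the sequences that contain the pattern at least once.

    Assumption: '.' in the pattern stands for exactly one arbitrary residue
    (a rigid gap); every other character must match literally.
    """
    regex = re.compile("".join("." if c == "." else re.escape(c) for c in pattern))
    return sum(1 for seq in sequences if regex.search(seq))


def mine_rigid_gap_patterns(
    sequences: List[str],
    min_support: int,
    max_gap: int = 1,
    max_symbols: int = 3,
) -> Dict[str, int]:
    """Depth-first enumeration of frequent rigid gap patterns (illustrative only).

    Hypothetical constraints: min_support (minimum number of supporting
    sequences), max_gap (longest run of wildcards between two symbols),
    and max_symbols (maximum number of non-wildcard symbols).
    """
    # Assumes the sequences themselves never contain the '.' character.
    alphabet = sorted(set("".join(sequences)))
    found: Dict[str, int] = {}

    def grow(pattern: str, n_symbols: int) -> None:
        count = support(pattern, sequences)
        if count < min_support:
            # Extending a pattern to the right can never raise its support,
            # so the whole subtree below an infrequent pattern is skipped.
            return
        found[pattern] = count
        if n_symbols == max_symbols:
            return
        for gap in range(max_gap + 1):
            for symbol in alphabet:
                grow(pattern + "." * gap + symbol, n_symbols + 1)

    for symbol in alphabet:
        grow(symbol, 1)
    return found


if __name__ == "__main__":
    toy_database = ["MKVLA", "MKILA", "GKVLA"]  # made-up sequences, not real data
    for pattern, count in sorted(mine_rigid_gap_patterns(toy_database, min_support=3).items()):
        print(pattern, count)
```

Tightening `max_gap` or `max_symbols` shrinks the search space, which is one simple way a constraint can steer the mining toward patterns of interest. A breadth-first variant would process all patterns with the same number of symbols before extending any of them.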
