Discovery of Non-induced Patterns from Sequences

Abstract

Discovering patterns in sequence data has a significant impact on genomics, proteomics, and business. A commonly encountered problem is that the discovered patterns often contain many redundancies: patterns that appear significant only because they are induced by strongly statistically significant subpatterns. The concept of statistically induced patterns is proposed to capture these redundancies. An algorithm is then developed to efficiently discover non-induced significant patterns from a large sequence dataset. For performance evaluation, two experiments were conducted to demonstrate a) the seriousness of the problem, using synthetic data, and b) that the top non-induced significant patterns discovered from Saccharomyces cerevisiae (yeast) correspond to the transcription factor binding sites found by biologists. The experiments confirm the effectiveness of our method in generating a relatively small set of patterns that reveal interesting, unknown information inherent in the sequences.