Conference: International MultiConference of Engineers and Computer Scientists

FART Neural Network based Probabilistic Motif Discovery in Unaligned Biological Sequences



Abstract

Finding motifs in biological sequences is one of the most basic operations in computational biology. Motif discovery algorithms face substantial computational requirements, notably memory space and computational complexity. To reduce this complexity, an alternative solution is proposed that integrates a genetic algorithm with the Fuzzy ART machine learning approach and thereby eliminates the multiple sequence alignment step.

Problem statement: More than a hundred methods for motif discovery have been proposed in recent years, varying widely in both their algorithmic approaches and their underlying models of regulatory regions. The aim of this study is to develop an alternative solution for motif discovery that benefits from both data mining and genetic algorithms while eliminating the cost of multiple sequence alignment.

Approach: A genetic-algorithm-based probabilistic motif discovery model is designed to solve this problem. The proposed algorithm is implemented in Matlab and tested on large DNA sequence data sets and on synthetic data sets.

Results: The speed of the proposed model and the length of the motifs it finds are compared with those of an existing method. The proposed method finds a motif of length 11 in 18 s and a motif of length 15 in 24 s, whereas the existing method needs 34 s to find a motif of length 11. Compared with other techniques, the proposed method outperforms the popular existing method.

Conclusion: This study proposes a model that discovers motifs in large sets of unaligned sequences in considerably less time. The motifs it finds are also longer than those found by existing methods. The proposed algorithm is implemented in Matlab and tested on large DNA sequence data sets and on synthetic data sets.
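The abstract does not say how Fuzzy ART (FART) is wired into the pipeline. As a minimal sketch, assume every k-mer is one-hot encoded, complement-coded, and clustered with the standard Fuzzy ART choice, vigilance and learning rules, so that densely populated clusters become motif candidates without any alignment step. The names `FuzzyART`, `encode_kmer` and `largest_kmer_cluster`, and the parameter values, are hypothetical and are not taken from the study.

```python
# Hypothetical sketch: cluster DNA k-mers with Fuzzy ART (no alignment needed).
ALPHABET = "ACGT"


def encode_kmer(kmer):
    """One-hot encode a k-mer, then complement-code it: I = (a, 1 - a)."""
    onehot = [1.0 if base == b else 0.0 for base in kmer for b in ALPHABET]
    return onehot + [1.0 - v for v in onehot]


def fuzzy_and(a, b):
    """Component-wise minimum, the fuzzy AND used by Fuzzy ART."""
    return [min(x, y) for x, y in zip(a, b)]


class FuzzyART:
    def __init__(self, rho=0.9, alpha=0.001, beta=1.0):
        self.rho = rho      # vigilance: higher means tighter clusters
        self.alpha = alpha  # choice parameter
        self.beta = beta    # learning rate (1.0 = fast learning)
        self.weights = []   # one weight vector per committed category

    def _choice(self, x, j):
        w = self.weights[j]
        return sum(fuzzy_and(x, w)) / (self.alpha + sum(w))

    def train_one(self, x):
        """Present one input; return the index of the category it joins."""
        norm_x = sum(x)
        # Try categories in decreasing order of the choice function T_j.
        order = sorted(range(len(self.weights)),
                       key=lambda j: self._choice(x, j), reverse=True)
        for j in order:
            w = self.weights[j]
            if sum(fuzzy_and(x, w)) / norm_x >= self.rho:  # vigilance test
                self.weights[j] = [self.beta * m + (1 - self.beta) * wi
                                   for m, wi in zip(fuzzy_and(x, w), w)]
                return j
        # No category passes the vigilance test: commit a new one.
        self.weights.append(list(x))
        return len(self.weights) - 1


def largest_kmer_cluster(sequences, k, rho=0.9):
    """Cluster every k-mer of every sequence; return the most populated cluster."""
    art = FuzzyART(rho=rho)
    clusters = {}
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            label = art.train_one(encode_kmer(kmer))
            clusters.setdefault(label, []).append(kmer)
    return max(clusters.values(), key=len)
```

With complement coding, a k-mer compared against a freshly committed category for another k-mer that differs at d positions has match value 1 - d/(2k). For k = 8 and a vigilance of 0.9, it therefore joins that category only if it differs at no more than one position. Fuzzy ART is order dependent, so presenting the sequences in a different order can change the clusters.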
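The abstract does not describe the genome encoding, fitness function or operators of the genetic algorithm. The sketch below is a generic GA motif search under stated assumptions, not the authors' Matlab implementation: each individual holds one start offset per sequence, fitness is the information content of the position probability matrix built from those sites against a uniform background, and selection, one-point crossover and mutation are textbook operators. All function names and parameter values are hypothetical, and sequences are assumed to be uppercase ACGT strings.

```python
# Hypothetical sketch: GA search over one motif start offset per sequence.
import math
import random

ALPHABET = "ACGT"


def profile(sequences, starts, w, pseudocount=0.5):
    """Position probability matrix built from the w-mer at each start offset."""
    counts = [{b: pseudocount for b in ALPHABET} for _ in range(w)]
    for seq, s in zip(sequences, starts):
        for j, base in enumerate(seq[s:s + w]):
            counts[j][base] += 1
    total = len(sequences) + len(ALPHABET) * pseudocount
    return [{b: c / total for b, c in col.items()} for col in counts]


def information_content(sequences, starts, w):
    """Fitness: relative entropy (bits) of the profile vs. a uniform background."""
    pwm = profile(sequences, starts, w)
    return sum(p * math.log2(p / 0.25) for col in pwm for p in col.values())


def ga_motif_search(sequences, w, pop_size=60, generations=200,
                    mutation_rate=0.1, seed=0):
    """Return (best start offsets, motif sites). Needs at least two sequences."""
    rng = random.Random(seed)
    limits = [len(s) - w for s in sequences]  # last valid start per sequence

    def fitness(ind):
        return information_content(sequences, ind, w)

    population = [[rng.randint(0, m) for m in limits] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = [ranked[0], ranked[1]]  # elitism: keep the two best
        while len(next_gen) < pop_size:
            # Tournament of size 3 on rank: the lowest index is the fittest.
            a = ranked[min(rng.sample(range(pop_size), 3))]
            b = ranked[min(rng.sample(range(pop_size), 3))]
            cut = rng.randint(1, len(sequences) - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: occasionally move a start offset to a random position.
            child = [rng.randint(0, m) if rng.random() < mutation_rate else s
                     for s, m in zip(child, limits)]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return best, [seq[s:s + w] for seq, s in zip(sequences, best)]
```

Because each individual scores candidate sites directly in the unaligned sequences, no multiple sequence alignment is computed at any point. The population size, mutation rate and tournament size here are illustrative values only.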
