Journal of Applied Genetics

A new DNA sequence entropy-based Kullback-Leibler algorithm for gene clustering


Abstract

Information theory is a branch of mathematics that overlaps with communications, biology, and medical engineering. Entropy is a measure of the uncertainty in a set of information. In this study, the entropy of each gene and of its set of exons was calculated at orders one to four. The Kullback-Leibler divergence was then calculated from the relative entropy of the genes and exons. The resulting Kullback-Leibler distances for the genes and exon sets were used as input to seven clustering algorithms: single, complete, average, weighted, centroid, median, and K-means. The AdaBoost algorithm was used to aggregate the clustering results. Finally, the AdaBoost results were examined with the GeneMANIA prediction server to assess them from a gene annotation point of view. All calculations were performed using MATLAB (2015). An examination of the genes' metabolic pathways, based on the gene annotations, showed that the proposed clustering method yielded correct, logical, and fast results. The method avoids the disadvantages of alignment, allows genes to be considered with their actual length and content, and does not require large amounts of memory for long sequences. We believe that the proposed method could be used alongside other competitive gene clustering methods to group biologically relevant sets of genes. The proposed method can also be seen as a predictive method for genes with weak genomic annotations.
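The entropy and divergence steps described in the abstract can be illustrated with a minimal sketch. It is written in Python rather than the MATLAB used in the study, and it is not the authors' implementation. Several choices are assumptions made only for illustration: "order k" is read as the Shannon entropy of overlapping k-mer frequencies, a pseudocount keeps every probability nonzero, and the divergence is symmetrized so that it behaves like a distance. The function names (`kmer_distribution`, `entropy`, `kl_divergence`, `symmetric_kl`) are hypothetical.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_distribution(seq, k, pseudocount=1.0):
    """Relative frequencies of all 4**k overlapping k-mers in seq.

    The pseudocount (an assumption of this sketch) is added to every
    k-mer so that no probability is zero. K-mers containing symbols
    outside ACGT are ignored.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    total = sum(counts[m] for m in kmers) + pseudocount * len(kmers)
    return {m: (counts[m] + pseudocount) / total for m in kmers}


def entropy(dist):
    """Shannon entropy in bits: H = -sum p * log2(p)."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    Assumes q is strictly positive wherever p is, which holds when
    both distributions were built with a positive pseudocount.
    """
    return sum(p[m] * math.log2(p[m] / q[m]) for m in p if p[m] > 0)


def symmetric_kl(seq_a, seq_b, k):
    """Symmetrized divergence D(p || q) + D(q || p) at order k."""
    p = kmer_distribution(seq_a, k)
    q = kmer_distribution(seq_b, k)
    return kl_divergence(p, q) + kl_divergence(q, p)
```

Under these assumptions, calling `symmetric_kl(a, b, k)` for every pair of sequences and for k from 1 to 4 would give one distance matrix per order. Such a matrix could then be passed to a hierarchical clustering routine; SciPy's `scipy.cluster.hierarchy.linkage`, for example, offers the single, complete, average, weighted, centroid, and median methods named in the abstract.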
