A new DNA sequence entropy-based Kullback-Leibler algorithm for gene clustering

Abstract

Information theory is a branch of mathematics that overlaps with communications, biology, and medical engineering. Entropy is a measure of the uncertainty in a set of information. In this study, entropy of orders one to four was calculated for each gene and for its set of exons. From the relative entropy of the genes and exons, the Kullback-Leibler divergence was calculated. The resulting Kullback-Leibler distances for the gene and exon sets were then used as input to seven clustering algorithms: single, complete, average, weighted, centroid, and median linkage, and K-means. The AdaBoost algorithm was used to aggregate the clustering results. Finally, the AdaBoost results were examined with the GeneMANIA prediction server to interpret them from a gene-annotation point of view. All calculations were performed in MATLAB (2015). An examination of the genes' metabolic pathways based on their annotations showed that the proposed clustering method yielded correct, logical, and fast results. Because it does not rely on sequence alignment, the method avoids the drawbacks of alignment, allows genes to be considered at their actual length and content, and does not require large amounts of memory for long sequences. We believe the proposed method could be used alongside other competitive gene clustering methods to group biologically relevant sets of genes. The proposed method can also serve as a predictive method for genes with weak genomic annotations.
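
The entropy and divergence steps summarized above can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: that an entropy "of order k" is computed from the frequencies of overlapping k-mers, that a pseudocount of 1 is added so every probability is nonzero, that the pairwise distance is the symmetrized divergence D_KL(P||Q) + D_KL(Q||P), and that a single naive single-linkage routine stands in for the seven clustering algorithms. All function names and the toy sequences are hypothetical; this is not the authors' MATLAB implementation, and the AdaBoost aggregation and GeneMANIA steps are not covered.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_distribution(seq: str, k: int, pseudocount: float = 1.0) -> dict[str, float]:
    """Probability of each of the 4**k possible k-mers in seq (overlapping windows).

    Assumption: a pseudocount is added to every k-mer so no probability is zero,
    which keeps the KL divergence finite. Windows with non-ACGT symbols are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    total = sum(counts[m] for m in kmers) + pseudocount * len(kmers)
    return {m: (counts[m] + pseudocount) / total for m in kmers}


def shannon_entropy(p: dict[str, float]) -> float:
    """Shannon entropy H(P) = -sum p * log2(p), in bits."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def kl_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Relative entropy D_KL(P || Q) in bits; requires q[m] > 0 wherever p[m] > 0."""
    return sum(pv * math.log2(pv / q[m]) for m, pv in p.items() if pv > 0)


def symmetric_kl(p: dict[str, float], q: dict[str, float]) -> float:
    """KL divergence is asymmetric, so both directions are summed (an assumption)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def distance_matrix(seqs: list[str], k: int) -> list[list[float]]:
    """Pairwise symmetric-KL dissimilarities between the order-k profiles of seqs."""
    profiles = [kmer_distribution(s, k) for s in seqs]
    return [[symmetric_kl(p, q) for q in profiles] for p in profiles]


def single_linkage(d: list[list[float]], n_clusters: int) -> list[set[int]]:
    """Naive agglomerative clustering: repeatedly merge the two closest clusters,
    where cluster distance is the minimum pairwise distance (single linkage)."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [{i} for i in range(len(d))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = min(d[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or link < best[0]:
                    best = (link, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a is unaffected by the pop
        clusters[a] |= merged
    return clusters


if __name__ == "__main__":
    # Hypothetical toy sequences: two AT-rich and two GC-rich.
    seqs = ["ATATATATATAT", "ATATATATATTA", "GCGCGCGCGCGC", "GCGCGCGCGGCC"]
    for k in range(1, 5):  # entropy orders one to four
        print(k, [round(shannon_entropy(kmer_distribution(s, k)), 3) for s in seqs])
    print(single_linkage(distance_matrix(seqs, k=2), n_clusters=2))
```

With these toy inputs, the two AT-rich sequences and the two GC-rich sequences end up in separate clusters. In practice the distance matrix could instead be handed to an existing library; for example, SciPy's `scipy.cluster.hierarchy.linkage` accepts the method names `single`, `complete`, `average`, `weighted`, `centroid`, and `median` on a condensed matrix produced by `scipy.spatial.distance.squareform`, although the centroid and median methods are only correctly defined for Euclidean distances, which a KL-based dissimilarity is not.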

Bibliographic details

Source: Journal of Applied Genetics | 2020, Issue 2 | 8 pages
Original format: PDF
Language: English
CLC classification: General biology
Keywords: Information theory; Dairy cattle; Kullback-Leibler divergence; Gene clustering

Similar documents

1. A new DNA sequence entropy-based Kullback-Leibler algorithm for gene clustering [J] . Journal of Applied Genetics . 2020, Issue 2

2. Genetic algorithm-tuned entropy-based fuzzy C-means algorithm for obtaining distinct and compact clusters [J] . Vidyut Dey, Dilip Kumar Pratihar, G. L. Datta . Fuzzy Optimization and Decision Making: A Journal of Modeling and Computation Under Uncertainty . 2011, Issue 2

3. DNA Sequencing and Identification of Serogroup-Specific Genes in the Escherichia coli O118 O Antigen Gene Cluster and Demonstration of Antigenic Diversity But Only Minor Variation in DNA Sequence of the O Antigen Clusters of E. coli O118 and O151 [J] . Yanhong Liu, Pina Fratamico, Chitrita Debroy, Alyssa C. Bumbaugh, John W. Allen . Foodborne Pathogens and Disease . 2008, Issue 4

4. Entropy-based Sequence Clustering Algorithm for Analyzing Software Fault Feature [C] . Yanyan Wang, Jiadong Ren, Jiaxin Liu . ISAI 2010; International Conference on Information Security and Artificial Intelligence . 2010

5. Detailed characterization of the human protamine gene cluster region: Physical analysis, isolation of the transition protein 2 gene and complete DNA sequence analysis [D] . Nelson, James Edward 1994

6. Hesitant Fuzzy Entropy-Based Opportunistic Clustering and Data Fusion Algorithm for Heterogeneous Wireless Sensor Networks [O] . Junaid Anees, Hao-Chun Zhang, Sobia Baig, 2020

7. Figure 4: (A) One conserved sequence, which occurs 79 times in 46,264 binding site peaks from the ChIP-seq data-set. The mutation profile of this conserved sequence is illustrated, where ’_ ’ indicates this base is unchanged; DEL indicates this base is lost; INS X indicates a new base X is inserted in front of this base. (B) Several repeated element patterns are listed. (C) In the first column, the top five DNA motifs, mined by meme-chip tools (Machanick & Bailey, 2011), are illustrated. The resemblant conserved sequences found by the CFSP algorithm are listed in the second column. In the third column, the position-specific scoring matrices, which are transformed from mutational information, are listed. The similarity between each meme motif and the resemblant conserved sequence in PSSM format was calculated via a stamp motif comparison tool (Mahony & Benos, 2007). The E-values for the similarity of those pairs are displayed in the fourth column. (D) One motif is selected in each group clustered by gkmsvm descriptors, and the corresponding motif found by the CFSP algorithm is listed below. (E) Additional datasets (File No: ENCFF100GRL, ENCFF616IRT, ENCFF870CER, Target: SREBF1) were collected from https://www.encodeproject.org. The top two motifs are selected in each file using meme tools, and the corresponding motifs found by our algorithm are listed below. [O]

8. DNA Sequences of Genes Encoding Acinetobacter calcoaceticus Protocatechuate 3,4-Dioxygenase: Evidence Indicating Shuffling of Genes and of DNA Sequences within Genes during Their Evolutionary Divergence [R] . Hartnett, C., Neidle, E. L., Ngai, K. L., 1990

