Conference: Asia-Pacific Bioinformatics Conference

MODELING 5' REGIONS OF HISTONE GENES USING BAYESIAN NETWORKS


Abstract

Histones constitute a rich protein family that is evolutionarily conserved across species. They play important roles in chromosomal functions in the cell, such as chromosome condensation, recombination, replication, and transcription. We have modeled histone gene 5' end segments covering [-50,+500] relative to transcription start sites (TSSs). These segments contain parts of the coding regions in most of the genes that we studied. We determined characteristics of these segments for 116 mammalian (human, mouse, rat) histone genes based on the distribution of DNA motifs obtained from MEME-MAST. We found that all five mammalian histone types (H1, H2A, H2B, H3, H4) have mutually distinct, prominent, and strongly conserved properties downstream of the TSS, and that these properties are reasonably well conserved across the analyzed species. We then transformed the primary-level motif data for each sequence into a higher-order motif arrangement that involved only features such as the presence of a motif, its position, its strand orientation, and the mutual spacer length between motifs. We built a Bayesian network model based on these features and used the higher-order motif arrangement data for its training and testing. When tested for classification between the five histone groups using leave-one-out cross-validation, the Bayesian model correctly classified 100% of histone H1 sequences, 100% of histone H2A sequences, 96.9% of histone H2B sequences, 94.4% of histone H3 sequences, and 95.8% of histone H4 sequences. Overall, the model correctly classified 97.4% of all histone sequences. Our Bayesian model has the advantage of having a small number of trainable parameters, and it produces very few false positives. The model could be used to scan the genome to discover genes whose products are similar to histones.
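The evaluation procedure described in the abstract (categorical motif-arrangement features, a Bayesian classifier, leave-one-out cross-validation) can be illustrated with a minimal sketch. It is not the authors' model: it assumes a naive Bayes classifier, which is the simplest Bayesian network (all features conditionally independent given the class), with Laplace smoothing. The feature encoding (motif presence, strand, binned spacer length), the toy rows, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

def train(rows, labels, alpha=1.0):
    """Count class and per-feature value frequencies (naive Bayes)."""
    class_counts = Counter(labels)
    feat_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    values = defaultdict(set)           # feature index -> values seen in training
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            feat_counts[(y, j)][v] += 1
            values[j].add(v)
    return class_counts, feat_counts, values, alpha

def predict(model, row):
    """Return the class with the highest smoothed log-posterior."""
    class_counts, feat_counts, values, alpha = model
    total = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for y in sorted(class_counts):
        n = class_counts[y]
        lp = math.log(n / total)
        for j, v in enumerate(row):
            k = len(values[j]) + 1  # one extra slot reserved for unseen values
            lp += math.log((feat_counts[(y, j)][v] + alpha) / (n + alpha * k))
        if lp > best_lp:
            best, best_lp = y, lp
    return best

def leave_one_out_accuracy(rows, labels):
    """Hold out each sequence once, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(rows)):
        model = train(rows[:i] + rows[i + 1:], labels[:i] + labels[i + 1:])
        correct += predict(model, rows[i]) == labels[i]
    return correct / len(rows)

# Hypothetical toy data: (motif present?, strand, binned spacer length).
rows = [
    ("present", "+", "short"),
    ("present", "+", "short"),
    ("present", "+", "medium"),
    ("absent", "-", "long"),
    ("absent", "-", "long"),
    ("absent", "-", "medium"),
]
labels = ["H2A", "H2A", "H2A", "H4", "H4", "H4"]

print(leave_one_out_accuracy(rows, labels))
```

With leave-one-out cross-validation every sequence is classified by a model that never saw it during training, which suits a small data set such as the 116 genes mentioned above. A general Bayesian network could additionally model dependencies between features, which this sketch deliberately omits.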
