IEEE International Conference on Bioinformatics and Biomedicine

Interpretable deep neural networks for enhancer prediction

Abstract

Enhancers are short DNA sequences that modulate gene expression patterns. Recent studies have shown that enhancer elements can be enriched for certain combinatorial codes of histone modifications, which has led to interest in developing computational models to predict enhancer locations. Here we present EP-DNN, a protocol for predicting enhancers based on chromatin features in two different cell types: a human embryonic (H1) cell line and a human lung fibroblast (IMR90) cell line. Specifically, we use a deep neural network (DNN)-based architecture to extract enhancer signatures. We train EP-DNN using distal p300 binding sites as enhancers, and transcription start sites (TSS) and random non-DNase-I hypersensitivity sites as non-enhancers. We find that EP-DNN has superior accuracy relative to other state-of-the-art algorithms, such as DEEP-EN and RFECS, and also scales well to a large number of predictions. We then address the problem that DNN results are not interpretable by developing a method to determine which histone modifications are important and, within each modification, which spatial features, proximal or distal to the enhancer site, are important. We find that the important histone modifications vary between cell types. Further, whether the important features are clustered around the enhancer peak or are more spread out also differs among the histone modifications. Thus, we propose a new paradigm for automatically determining the important features and the important histone modifications, in place of the current computational standard of using the same fixed number of features from all histone modifications for all cell types. Our results have implications for computational scientists, who can now perform feature selection for their classification task, and for biologists, who can now experimentally collect data only for the relevant histone modifications.
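The abstract mentions per-modification spatial features around a candidate site, and a training set labelled as enhancer or non-enhancer. The sketch below shows one way such inputs could be assembled. It is not the EP-DNN implementation: the function names, the mark names (H3K4me1, H3K27ac), the bin layout, and the zero-padding at track edges are all assumptions made for illustration, and the classifier itself is omitted.

```python
from typing import Dict, List, Tuple

Signal = Dict[str, List[float]]  # hypothetical: mark name -> binned signal along a region


def window_features(signal: Signal, center: int, flank: int) -> List[float]:
    """Concatenate, per mark, the binned signal in [center - flank, center + flank]."""
    features: List[float] = []
    for mark in sorted(signal):  # fixed mark order keeps vectors comparable
        track = signal[mark]
        for i in range(center - flank, center + flank + 1):
            # Assumption: bins outside the track count as zero signal.
            features.append(track[i] if 0 <= i < len(track) else 0.0)
    return features


def feature_names(signal: Signal, flank: int) -> List[Tuple[str, int]]:
    """Label each feature as (mark, offset from the candidate site)."""
    return [(mark, off) for mark in sorted(signal) for off in range(-flank, flank + 1)]


def build_training_set(signal: Signal, positives: List[int], negatives: List[int],
                       flank: int) -> Tuple[List[List[float]], List[int]]:
    """Label 1 for enhancer-like sites (e.g. distal p300 peaks), 0 for non-enhancers."""
    sites = [(p, 1) for p in positives] + [(n, 0) for n in negatives]
    X = [window_features(signal, pos, flank) for pos, _ in sites]
    y = [label for _, label in sites]
    return X, y


# Toy data, invented for this example only.
toy = {"H3K4me1": [0, 1, 2, 3, 2, 1, 0], "H3K27ac": [0, 0, 1, 5, 1, 0, 0]}
X, y = build_training_set(toy, positives=[3], negatives=[0], flank=1)
names = feature_names(toy, flank=1)
```

Because every feature keeps a (mark, offset) label, any importance score later assigned to a feature can be traced back to a specific histone modification and to a position near or far from the candidate site, which is the kind of question the interpretation method in the abstract addresses.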
