Conference: Asia-Pacific Bioinformatics Conference

Prediction of enhancer-promoter interactions via natural language processing

Abstract

Background: Precise identification of three-dimensional genome organization, especially enhancer-promoter interactions (EPIs), is important for deciphering gene regulation, cell differentiation and disease mechanisms. Distinguishing true interactions from nearby non-interacting pairs remains challenging, because traditional experimental methods are limited by low resolution or low throughput.

Results: We propose EP2vec, a novel computational framework for assaying three-dimensional genomic interactions. We first extract sequence embedding features, defined as fixed-length vector representations learned from variable-length sequences with an unsupervised deep learning method from natural language processing. We then use these learned representations to train a supervised classifier that predicts EPIs. EP2vec achieves F1 scores of 0.841 to 0.933 on different datasets, outperforming existing methods. A sensitivity analysis demonstrates the robustness of the sequence embedding features. In addition, by applying an attention mechanism to the learned sequence embedding features, we identify motifs that carry cell line-specific information. Finally, we show that combining sequence embedding features with experimental features yields even better performance, with F1 scores of 0.889 to 0.940.

Conclusions: EP2vec sheds light on feature extraction for DNA sequences of arbitrary length and provides a powerful approach for identifying EPIs.
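The abstract does not spell out how a variable-length DNA sequence becomes a fixed-length vector. As a rough illustration of the idea only, the sketch below treats each sequence as a "sentence" of overlapping k-mer "words" and maps it to a fixed-length vector of k-mer frequencies, then concatenates the enhancer and promoter vectors (plus optional experimental features) into one feature vector per pair. This bag-of-k-mers vector is a swapped-in, unlearned stand-in, not the unsupervised NLP embedding EP2vec actually learns; the function names, the choice of k, and the `experimental` values are hypothetical.

```python
from __future__ import annotations

from itertools import product


def kmer_tokens(seq: str, k: int = 6) -> list[str]:
    """Split a DNA sequence into overlapping k-mers (the 'words' of a sequence 'sentence')."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_vector(seq: str, k: int = 3) -> list[float]:
    """Map a sequence of any length to a fixed-length vector of 4**k k-mer frequencies.

    Assumption: an unlearned bag-of-k-mers stand-in for a learned embedding.
    Frequencies are normalised so sequences of different lengths are comparable;
    k-mers containing non-ACGT characters (e.g. N) are skipped.
    """
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    vec = [0.0] * len(index)
    tokens = [t for t in kmer_tokens(seq, k) if t in index]
    for t in tokens:
        vec[index[t]] += 1.0
    total = len(tokens)
    return [v / total for v in vec] if total else vec


def pair_features(enhancer: str, promoter: str,
                  experimental: list[float] | None = None, k: int = 3) -> list[float]:
    """Hypothetical per-pair features: enhancer vector + promoter vector (+ experimental features)."""
    feats = kmer_vector(enhancer, k) + kmer_vector(promoter, k)
    if experimental is not None:
        feats += list(experimental)  # e.g. hypothetical signal values measured at each element
    return feats


x = pair_features("ACGTACGTNNACGGT", "TTGACA", experimental=[0.7, 1.2])
print(len(x))  # 2 * 4**3 + 2 = 130
```

Whatever the lengths of the two input sequences, the pair vector always has 2 · 4^k entries plus the number of experimental features, which is what lets a standard classifier consume pairs of arbitrary-length sequences.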

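Performance in the abstract is reported as F1 scores. For binary EPI labels (1 = interacting pair, 0 = non-interacting), F1 is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R) = 2TP / (2TP + FP + FN). A minimal sketch of that computation follows; the paper's exact evaluation protocol (data splits, class balance) is not described here.

```python
from __future__ import annotations


def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """F1 for binary labels: harmonic mean of precision and recall on the positive class (1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(f1_score(y_true, y_pred))  # TP=3, FP=1, FN=1 -> 0.75
```

Because F1 ignores true negatives, it is a natural choice when non-interacting pairs vastly outnumber interacting ones.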