UNSUPERVISED FEATURE CONSTRUCTION AND KNOWLEDGE EXTRACTION FROM GENOME-WIDE ASSAYS OF BREAST CANCER WITH DENOISING AUTOENCODERS

Abstract

Big data bring new opportunities for methods that efficiently summarize large data compendia and automatically extract knowledge from them. While both supervised learning algorithms and unsupervised clustering algorithms have been successfully applied to biological data, they are either dependent on known biology or limited to discerning the most significant signals in the data. Here we present denoising autoencoders (DAs), which employ a data-defined learning objective independent of known biology, as a method to identify and extract complex patterns from genomic data. We evaluate the performance of DAs by applying them to a large collection of breast cancer gene expression data. Results show that DAs successfully construct features that contain both clinical and molecular information. There are features that represent tumor or normal samples, estrogen receptor (ER) status, and molecular subtypes. Features constructed by the autoencoder generalize to an independent dataset collected using a distinct experimental platform. By integrating data from ENCODE for feature interpretation, we discover a feature representing ER status through association with key transcription factors in breast cancer. We also identify a feature that is highly predictive of patient survival and is enriched for the FOXM1 signaling pathway. The features constructed by DAs are often bimodally distributed, with one peak near zero and another near one, which facilitates discretization. In summary, we demonstrate that DAs effectively extract key biological principles from gene expression data and summarize them into constructed features with convenient properties.
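To make the idea concrete, here is a minimal sketch of a denoising autoencoder in NumPy. It is not the authors' implementation: the abstract does not give the architecture or training settings, so the tied weights, sigmoid units, masking noise, cross-entropy loss, and every hyperparameter below are assumptions, and the function names `train_da`, `encode`, and `discretize` are hypothetical. The input is assumed to be a samples-by-genes matrix scaled to [0, 1].

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_da(X, n_hidden=4, corruption=0.1, lr=0.1, epochs=200,
             batch_size=10, seed=0):
    """Train a one-layer denoising autoencoder with tied weights.

    X is a (samples x genes) array scaled to [0, 1]. All defaults are
    illustrative assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_genes = X.shape
    W = rng.normal(0.0, 0.1, size=(n_genes, n_hidden))
    b_h = np.zeros(n_hidden)   # hidden-layer bias
    b_v = np.zeros(n_genes)    # reconstruction bias
    for _ in range(epochs):
        order = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            x = X[order[start:start + batch_size]]
            # Masking noise: zero out a random fraction of the inputs.
            mask = rng.random(x.shape) >= corruption
            x_tilde = x * mask
            # Encode the corrupted input, then reconstruct the clean one.
            h = sigmoid(x_tilde @ W + b_h)
            x_hat = sigmoid(h @ W.T + b_v)
            # Cross-entropy gradient at the output pre-activation.
            d_out = (x_hat - x) / len(x)
            d_hid = (d_out @ W) * h * (1.0 - h)
            # Tied weights: encoder and decoder both contribute to W.
            grad_W = x_tilde.T @ d_hid + d_out.T @ h
            W -= lr * grad_W
            b_h -= lr * d_hid.sum(axis=0)
            b_v -= lr * d_out.sum(axis=0)
    return W, b_h, b_v

def encode(X, W, b_h):
    """Constructed features: hidden activations in (0, 1), no noise."""
    return sigmoid(X @ W + b_h)

def discretize(features, threshold=0.5):
    """Binarize features; the 0.5 cutoff is an assumed choice."""
    return features > threshold
```

With a trained model, `encode(X, W, b_h)` returns one activation per sample and hidden unit. If the activations are bimodal with peaks near zero and one, as the abstract reports, a simple cutoff such as `discretize` turns each feature into an on/off label that can be compared with clinical annotations such as ER status.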