Journal: 電子情報通信学会技術研究報告. 情報論的学習理論と機械学習

Principal component analysis-based unsupervised feature extraction applied to in silico drug discovery for posttraumatic stress disorder-mediated heart disease


Abstract

Background: Feature extraction (FE) is difficult, particularly when there are more features than samples, because small sample numbers often result in biased outcomes or overfitting. Furthermore, multiple sample classes often complicate FE, because evaluating performance, a usual step in supervised FE, is generally harder for multiple classes than for the two-class problem. Developing unsupervised methods that are independent of sample classification would solve many of these problems. Results: Two principal component analysis (PCA)-based FE methods were tested as sample classification-independent unsupervised FE methods: variational Bayes PCA (VBPCA), which was extended to perform unsupervised FE, and conventional PCA (CPCA)-based unsupervised FE. Both VBPCA- and CPCA-based unsupervised FE performed well when applied to simulated data and to a posttraumatic stress disorder (PTSD)-mediated heart disease data set that had multiple categorical class observations in mRNA/microRNA expression of stressed mouse heart. A critical set of PTSD miRNAs/mRNAs was identified that showed aberrant expression between treatment and control samples and significant negative correlation with one another. Moreover, greater stability and biological feasibility than conventional supervised FE were demonstrated. Based on the results obtained, in silico drug discovery was performed as translational validation of the methods. Conclusions: Our two proposed unsupervised FE methods (CPCA- and VBPCA-based) worked well on simulated data and outperformed two conventional supervised FE methods on a real data set. Thus, these two methods appear to be equivalent for FE on categorical multiclass data sets, with potential translational utility for in silico drug discovery.
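
The abstract does not spell out the procedure, so the following is only a minimal sketch of one plausible reading of conventional PCA-based unsupervised FE: features (not samples) are embedded with PCA, and the features lying farthest from the origin along the leading components are kept, without using class labels at any point. The function name `pca_unsupervised_fe`, the parameters `n_components` and `top_k`, the selection rule, and the simulated data are all assumptions made for illustration and are not taken from the paper. VBPCA is not shown.

```python
import numpy as np


def pca_unsupervised_fe(X, n_components=2, top_k=10):
    """Hypothetical helper: rank features by their PCA scores.

    X has shape (n_features, n_samples); class labels are never used.
    Returns the indices of the top_k features and the score matrix.
    """
    # Center each sample (column) so that features become points
    # in sample space scattered around the origin.
    Xc = X - X.mean(axis=0, keepdims=True)
    # Thin SVD: Xc = U @ diag(S) @ Vt. Rows of U * S are feature scores.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    # Assumed selection rule: squared distance from the origin
    # in the space of the leading components.
    dist = (scores ** 2).sum(axis=1)
    selected = np.argsort(dist)[::-1][:top_k]
    return selected, scores


# Simulated data (illustrative only): 200 features, 12 samples in
# three classes of four samples each. Only the first 10 features
# carry a class-dependent shift; the rest are pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))
pattern = np.repeat([5.0, 0.0, -5.0], 4)  # one value per sample
X[:10] += pattern

selected, scores = pca_unsupervised_fe(X, n_components=2, top_k=10)
print(sorted(selected.tolist()))
```

Because the number of features exceeds the number of samples here, the sketch mirrors the setting the abstract describes, but the choice of how many components to keep and how many features to select would need to follow the paper itself.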
