A Two-Step Feature Selection Method to Predict Cancerlectins by Multiview Features and Synthetic Minority Oversampling Technique

Abstract

Cancerlectins inhibit the growth of cancer cells and are currently employed as therapeutic agents. Accurate identification of cancerlectins should provide insight into the molecular mechanisms of cancer. In this study, a new computational method based on the RF (Random Forest) algorithm is proposed to improve the identification of cancerlectins. Before feature selection, a hybrid feature space is built by combining four individual feature spaces: CTD (Composition, Transition, and Distribution), PseAAC (Pseudo Amino Acid Composition), PSSM (Position-Specific Scoring Matrix), and disorder. SMOTE (Synthetic Minority Oversampling Technique) is applied to address the class imbalance in the data. To reduce feature redundancy and computational complexity, we propose a two-step feature selection process that retains the informative features. 5-fold cross-validation is used to evaluate the various prediction strategies. The proposed method achieves a sensitivity of 0.779, a specificity of 0.717, an accuracy of 0.748, and an MCC (Matthews Correlation Coefficient) of 0.497. The prediction results are also compared with those of existing methods on the same dataset using 5-fold cross-validation. The comparison indicates that our method is effective for predicting cancerlectins.

Bibliographic details

  • Journal: other
  • Year: 2018 (volume and issue not listed)
  • Page (article number): 9364182
  • Total pages: 10
  • Original format: PDF
  • Date indexed: 2022-08-21 11:08:02