Journal: IEEE Transactions on NanoBioscience

Identifying Individual-Cancer-Related Genes by Rebalancing the Training Samples


Abstract

The identification of individual-cancer-related genes is typically an imbalanced classification problem. The number of known cancer-related genes is far smaller than the number of unknown genes, which makes it very hard to obtain novel predictions from such imbalanced training samples. Conventional machine learning methods either detect only genes related to all cancers collectively or rely on added clinical knowledge to circumvent this issue. In this study, we introduce a training-sample rebalancing strategy that overcomes this issue by combining a two-step logistic regression with a random resampling method. The two-step logistic regression selects a set of genes related to all cancers, and the random resampling method then further classifies which of those genes are associated with individual cancers. The imbalance is circumvented by first randomly adding positive instances related to other cancers and then, in the following step, excluding unrelated predictions according to the overall performance. Numerical experiments show that the proposed resampling method can identify genes related to a cancer even when only a few genes are known to be related to it. The final predictions for all individual cancers achieve AUC values of around 0.93 under leave-one-out cross-validation, which is very promising compared with existing methods.
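The abstract describes the rebalancing idea only at a high level, so the following is a minimal, hypothetical Python sketch of the general technique, not the authors' implementation. It assumes each gene is a numeric feature vector, uses a plain gradient-descent logistic regression, and in each round borrows random positives from other cancers and samples an equal number of unknown genes as negatives, averaging candidate scores over rounds. The names `rebalanced_scores`, `n_borrow`, and `rounds` are illustrative assumptions; the second step (excluding unrelated predictions by overall performance) and the leave-one-out loop are not modeled. A rank-based `auc` helper is included for evaluation.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Model = Tuple[Vector, float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(model: Model, x: Vector) -> float:
    """Probability that gene x is a positive under the fitted model."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X: Sequence[Vector], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 200) -> Model:
    """Unregularized logistic regression fit by batch gradient descent on log-loss."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict((w, b), xi) - yi  # d(log-loss)/d(logit)
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def rebalanced_scores(target_pos: List[Vector], other_pos: List[Vector],
                      unknown: List[Vector], candidates: List[Vector],
                      n_borrow: int = 5, rounds: int = 20,
                      seed: int = 0) -> List[float]:
    """Hypothetical rebalancing: each round adds up to n_borrow random positives
    from other cancers, samples as many unknown genes as negatives, fits a model,
    and candidate scores are averaged over all rounds."""
    rng = random.Random(seed)
    totals = [0.0] * len(candidates)
    for _ in range(rounds):
        borrowed = rng.sample(other_pos, min(n_borrow, len(other_pos)))
        positives = target_pos + borrowed
        negatives = rng.sample(unknown, min(len(positives), len(unknown)))
        X = positives + negatives
        y = [1] * len(positives) + [0] * len(negatives)
        model = fit_logistic(X, y)
        for i, c in enumerate(candidates):
            totals[i] += predict(model, c)
    return [t / rounds for t in totals]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: chance a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Averaging over several random draws is one simple way to reduce the dependence on any single resample; with very few target positives, the borrowed positives dominate each training set, which is the intended rebalancing effect in this sketch.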
