Journal: Information Sciences: An International Journal

Utilization of virtual samples to facilitate cancer identification for DNA microarray data in the early stages of an investigation

Abstract

DNA microarray datasets are generally small, high-dimensional with many non-discriminative genes, and non-linear with outliers. Their low sample-to-dimension ratio makes their analysis a small-sample problem. Recently, researchers have developed various gene selection algorithms to discover the genes most relevant to a specific disease and thus reduce computation. Most gene selection algorithms improve learning performance and efficiency but still suffer from insufficient training samples in the datasets. Moreover, in the early stage of diagnosing a new disease, only very limited data can be obtained, so the derived diagnostic model is usually unreliable for identifying the new disease. Consequently, diagnostic performance cannot always be robust, even with gene selection algorithms. To address very limited training datasets containing non-linear data or outliers, we propose Group Virtual Sample Generation (GVSG), a non-linear virtual sample generation algorithm. The method detects the characteristics of the very limited data, forms discrete groups for each discriminative gene, and systematically generates virtual samples for each of these to accelerate and stabilize the modeling process. The results show that this method significantly improves learning accuracy in the early stage, when few DNA microarray samples are available.
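
The abstract does not give the details of GVSG, so the following is only a rough illustration of the general idea of grouping a gene's expression values and generating virtual values per group. It is not the authors' algorithm. The gap-based splitting rule, the uniform sampling within each group's range, the proportional allocation, and the names `split_into_groups` and `generate_virtual_values` are all assumptions made for this sketch.

```python
import numpy as np


def split_into_groups(values, n_groups):
    """Split 1-D values into contiguous groups at the largest gaps.

    Assumption for illustration only: the paper's actual grouping rule
    is not described in the abstract.
    """
    ordered = np.sort(np.asarray(values, dtype=float))
    if n_groups <= 1 or ordered.size <= 1:
        return [ordered]
    gaps = np.diff(ordered)
    k = min(n_groups - 1, gaps.size)
    # Indices of the k largest gaps; a cut is placed right after each one.
    cut_after = np.sort(np.argsort(gaps)[-k:])
    return np.split(ordered, cut_after + 1)


def generate_virtual_values(values, n_groups, n_virtual, rng):
    """Draw virtual values uniformly inside each group's observed range.

    Hypothetical generation rule: counts are allocated in proportion to
    group size, with at least one virtual value per group.
    """
    groups = split_into_groups(values, n_groups)
    total = sum(g.size for g in groups)
    virtual = []
    for g in groups:
        count = max(1, round(n_virtual * g.size / total))
        virtual.append(rng.uniform(g.min(), g.max(), size=count))
    return np.concatenate(virtual)


# Made-up expression values of one gene in a handful of samples.
expression = [1.0, 1.2, 1.1, 5.0, 5.3, 9.8]
rng = np.random.default_rng(0)
virtual = generate_virtual_values(expression, n_groups=3, n_virtual=12, rng=rng)
```

Because values are drawn per group, no virtual value falls in the empty regions between groups, which a single distribution fitted to all six points would not guarantee.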