Gene selection for sample sets with biased distributions.

Abstract

Microarray expression data, which contain the expression levels of a large number of simultaneously observed genes, have been used in many scientific research and clinical studies. Because of their high dimensionality, selecting a small number of genes has been shown to benefit many tasks, such as building prediction models from microarray expression data or discovering gene regulatory networks. Traditional gene selection methods, however, do not take the class distribution into account during the selection process. In biomedical science, microarray expression data are very often severely biased, with one class of examples (e.g., diseased samples) significantly smaller than the other classes (e.g., normal samples). These sample sets with biased distributions require special attention from researchers seeking to identify the genes responsible for a particular disease. In this thesis, we propose three filtering techniques, Higher Weight ReliefF, ReliefF with Differential Minority Repeat, and ReliefF with Balanced Minority Repeat, to identify genes responsible for fatal diseases from biased microarray expression data. Our solutions are evaluated on five well-known microarray datasets: Colon, Central Nervous System, DLBCL Tumor, Lymphoma, and ECML Pancreas. Experimental comparisons with the traditional ReliefF filtering method demonstrate the effectiveness of the proposed methods in selecting informative genes from microarray expression data with biased sample distributions.
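The abstract does not describe how the three proposed variants modify ReliefF, so the sketch below only illustrates the baseline they are compared against: the standard ReliefF update. Each feature's weight is decreased by its normalized difference to the k nearest same-class neighbours ("hits") and increased by its prior-weighted difference to the k nearest neighbours from each other class ("misses"). This is a minimal pure-Python sketch, assuming numeric features, Manhattan distance, and every sample used once as the reference instance instead of randomly sampled ones. The helper `repeat_minority` is hypothetical: it is one literal reading of "minority repeat" (duplicating minority-class samples until the classes are roughly balanced) and is not the thesis's actual procedure.

```python
from collections import Counter


def relieff(X, y, k=3):
    """Standard ReliefF feature weights (higher = more relevant).

    Sketch assumptions: numeric features, Manhattan distance, and every
    sample used once as the reference instance (m = n), so the result is
    deterministic.
    """
    n, d = len(X), len(X[0])
    # Per-feature value range, used to scale diff() into [0, 1].
    span = []
    for a in range(d):
        col = [row[a] for row in X]
        span.append((max(col) - min(col)) or 1.0)  # avoid dividing by zero

    def diff(a, i, j):
        return abs(X[i][a] - X[j][a]) / span[a]

    def dist(i, j):
        return sum(diff(a, i, j) for a in range(d))

    prior = {c: cnt / n for c, cnt in Counter(y).items()}
    weights = [0.0] * d
    for i in range(n):
        # All other samples, nearest first.
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist(i, j))
        # Nearest hits push weights down for features that vary within a class.
        hits = [j for j in others if y[j] == y[i]][:k]
        for a in range(d):
            weights[a] -= sum(diff(a, i, h) for h in hits) / (n * k)
        # Nearest misses from every other class push weights up, scaled by
        # P(C) / (1 - P(class of sample i)).
        for c, p in prior.items():
            if c == y[i]:
                continue
            misses = [j for j in others if y[j] == c][:k]
            scale = p / (1.0 - prior[y[i]])
            for a in range(d):
                weights[a] += scale * sum(diff(a, i, m) for m in misses) / (n * k)
    return weights


def repeat_minority(X, y, minority_label):
    """HYPOTHETICAL rebalancing step, not the thesis's method: duplicate the
    minority-class samples until that class is roughly as large as the rest."""
    minority = [i for i, c in enumerate(y) if c == minority_label]
    rest = len(y) - len(minority)
    copies = max(1, rest // max(1, len(minority)))  # total copies per sample
    Xb, yb = [list(row) for row in X], list(y)
    for _ in range(copies - 1):
        Xb += [list(X[i]) for i in minority]
        yb += [minority_label] * len(minority)
    return Xb, yb
```

For example, `relieff(*repeat_minority(X, y, minority_label=1), k=3)` scores genes after the hypothetical rebalancing; genes would then be ranked by descending weight and the top ones kept. Duplicated samples sit at distance zero from their originals, so they tend to become each other's nearest hits; the abstract does not say how the thesis handles this.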

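Bibliographic details

  • Author

    Kamal, Abu Hena Mustafa

  • Affiliation

    Florida Atlantic University

  • Degree-granting institution: Florida Atlantic University
  • Subject: Computer Science
  • Degree: M.S.
  • Year: 2009
  • Pages: 98
  • Original format: PDF
  • Language: English
  • CLC classification: Automation technology, computer technology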