
Mixed Integer Linear Programming Based Implementations of Logical Analysis of Data and Its Applications.



Abstract

The objective of this dissertation is to develop systematic procedures that take advantage of advanced combinatorial optimization techniques and computer-related developments to build on a previously successful two-class classification method, called Logical Analysis of Data (LAD), for optimizing feature selection and identifying the set of combinatorial patterns in large-scale data analysis.

First, we propose an embedded pattern-based feature selection technique. Our feature selection algorithm aims at identifying a small subset of highly influential features from a large-scale dataset to build reliable LAD classification models. The proposed method searches among different feature subsets and interacts with the LAD classification algorithm and its ability to discriminate among the classes. To accomplish this, we develop a new software tool, called LFW, which can be used to determine the highest-ranking features in the dataset.

Next, we propose a new approach based on integer programming and network flows to select significant patterns to generate accurate LAD models. Our algorithm allows user-specified significance requirements on patterns, such as statistical significance, Hamming distances to ideal patterns, and other pattern characteristics including homogeneity and prevalence. We evaluate, through several experiments on artificial and benchmark datasets, the accuracy of LAD classification models built using our proposed approach, as compared to the accuracy of greedy-heuristic-based LAD models.

Traditionally, the LAD algorithm is designed to solve two-class classification problems. We present a mixed integer linear program to extend the LAD algorithm to multi-class classification. Our multi-class LAD algorithm efficiently generates reliable multi-class LAD models and takes advantage of parallel programming. The utility of the proposed approach is demonstrated through several experiments on multi-class benchmark datasets.

Finally, we apply the techniques developed in this dissertation to a real-world medical dataset collected as part of the African-American Study of Kidney Disease and Hypertension (AASK). We present various classification models to predict the progression rate of chronic kidney disease and to identify the set of serum proteomic features highly related to the disease outcome.
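The pattern characteristics named in the abstract (prevalence, homogeneity, Hamming distance) can be illustrated with a small sketch. This is not the dissertation's implementation. It assumes binarized observations and the definitions commonly used in the LAD literature: the prevalence of a positive pattern is the fraction of positive observations it covers, and its homogeneity is the fraction of covered observations that are positive. The names `covers`, `prevalence`, `homogeneity`, and `hamming`, and the toy data, are hypothetical and chosen for illustration only.

```python
from typing import Dict, Sequence, Tuple

Observation = Tuple[int, ...]  # one binarized observation (assumed encoding)
Pattern = Dict[int, int]       # feature index -> required value (0 or 1)


def covers(pattern: Pattern, obs: Observation) -> bool:
    """Return True if the observation satisfies every literal of the pattern."""
    return all(obs[j] == value for j, value in pattern.items())


def prevalence(pattern: Pattern, positives: Sequence[Observation]) -> float:
    """Fraction of positive observations covered by the pattern."""
    return sum(covers(pattern, o) for o in positives) / len(positives)


def homogeneity(pattern: Pattern,
                positives: Sequence[Observation],
                negatives: Sequence[Observation]) -> float:
    """Fraction of covered observations that belong to the positive class."""
    pos = sum(covers(pattern, o) for o in positives)
    neg = sum(covers(pattern, o) for o in negatives)
    total = pos + neg
    return pos / total if total else 0.0


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


# Toy data (hypothetical): three positive and two negative observations.
positives = [(1, 0, 1), (1, 1, 1), (0, 0, 1)]
negatives = [(0, 1, 0), (1, 1, 0)]

# Candidate positive pattern: feature 0 equals 1 AND feature 2 equals 1.
pattern = {0: 1, 2: 1}

pattern_prevalence = prevalence(pattern, positives)
pattern_homogeneity = homogeneity(pattern, positives, negatives)
```

On this toy data the pattern covers two of the three positive observations and no negative ones. In a selection model of the kind the abstract describes, quantities like these would plausibly serve as user-specified thresholds or weights when choosing which patterns enter the final classifier. That role is an assumption here, since the abstract does not state the formulation.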

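Bibliographic details

  • Author Avila Herrera, Juan Felix
  • Author affiliation Florida Institute of Technology
  • Degree-granting institution Florida Institute of Technology
  • Discipline Operations Research
  • Degree Ph.D.
  • Year 2013
  • Pages 168 p.
  • Total pages 168
  • Original format PDF
  • Language of text English
  • Chinese Library Classification Agriculture (Agronomy)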
