Annals of Operations Research

Multi-pattern generation framework for logical analysis of data


Abstract

Logical analysis of data (LAD) is a rule-based data mining algorithm that uses combinatorial optimization and Boolean logic for binary classification. Its goal is to construct a classification model consisting of logical patterns (rules) that capture structured information from observations. Of the four steps of the LAD framework (binarization, feature selection, pattern generation, and model construction), pattern generation has been considered the most important. The literature has mostly studied combinatorial enumeration approaches that generate all possible patterns; however, these approaches suffer from a computational complexity that grows exponentially with the data (feature) size. To overcome this problem, recent studies proposed column generation-based approaches to improve the efficacy of building a LAD model with a maximum-margin objective. Even so, efficiently solving the subproblems that generate patterns remained difficult. In this study, a new column generation framework is proposed, in which a new mixed-integer linear programming approach is developed to generate, at each iteration, multiple patterns with maximum coverage in the subproblems. In addition to the maximum-margin objective, we propose an alternative objective (minimum-pattern) that solves the LAD problem as a minimum set covering problem. The proposed approaches are evaluated on datasets from the University of California Irvine Machine Learning Repository. In computational experiments, they achieve performance comparable to previous LAD methods and other well-known classification algorithms.
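
To make "pattern", "coverage" and "minimum set covering" concrete, the following is a minimal Python sketch that assumes the data are already binarized into 0/1 feature vectors. It is not the paper's method. It enumerates pure positive patterns by brute force, which is exactly the exponential approach the abstract says column generation avoids, and it replaces the paper's mixed-integer linear programming subproblem with a plain greedy set-cover heuristic. The function names (`covers`, `enumerate_pure_patterns`, `greedy_min_cover`) and the dict-of-literals encoding are hypothetical choices made for illustration.

```python
from __future__ import annotations

from itertools import combinations

# Hypothetical encoding: a pattern is a conjunction of literals stored as
# {feature_index: required_bit}, e.g. {0: 1, 2: 0} means "x0 AND NOT x2".


def covers(pattern: dict[int, int], x: tuple[int, ...]) -> bool:
    """True if observation x satisfies every literal of the pattern."""
    return all(x[j] == v for j, v in pattern.items())


def is_pure_positive(pattern: dict[int, int], negatives: list[tuple[int, ...]]) -> bool:
    """A pure positive pattern covers no negative observation."""
    return not any(covers(pattern, x) for x in negatives)


def enumerate_pure_patterns(positives, negatives, n_features, max_degree):
    """Brute-force enumeration of pure positive patterns up to max_degree literals.

    Each candidate is read off a positive observation, so it covers at least
    that observation. Cost grows combinatorially with n_features.
    """
    patterns: list[dict[int, int]] = []
    for degree in range(1, max_degree + 1):
        for feats in combinations(range(n_features), degree):
            for x in positives:
                p = {j: x[j] for j in feats}
                if p not in patterns and is_pure_positive(p, negatives):
                    patterns.append(p)
    return patterns


def greedy_min_cover(patterns, positives):
    """Greedy stand-in for the minimum set covering objective.

    Repeatedly picks the pattern covering the most still-uncovered positives.
    Returns the chosen patterns and the indices of positives left uncovered.
    """
    uncovered = set(range(len(positives)))
    chosen: list[dict[int, int]] = []
    while uncovered and patterns:
        best = max(patterns, key=lambda p: sum(covers(p, positives[i]) for i in uncovered))
        gained = {i for i in uncovered if covers(best, positives[i])}
        if not gained:
            break  # no remaining pattern covers any uncovered positive
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


if __name__ == "__main__":
    positives = [(1, 0, 1), (1, 1, 1), (0, 0, 1)]
    negatives = [(0, 1, 0), (1, 1, 0), (0, 1, 1)]
    pats = enumerate_pure_patterns(positives, negatives, n_features=3, max_degree=2)
    chosen, left = greedy_min_cover(pats, positives)
    print(chosen)  # [{1: 0}, {0: 1, 2: 1}]
    print(left)    # set()
```

On this toy data, two patterns ("x1 = 0" and "x0 = 1 AND x2 = 1") cover all three positive observations without covering any negative one. A full LAD model would also generate negative patterns and combine both sides in a weighted decision rule, and the sketch omits that step.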
