
Local Causal and Markov Blanket Induction Method for Causal Discovery and Feature Selection from Data



Abstract

In many areas, recent developments have generated very large datasets from which it is desired to extract meaningful relationships between the dataset elements. To date, however, finding such relationships with prior-art methods has proved extremely difficult, especially in the biomedical arts. Methods for local causal learning and Markov blanket discovery are important recent developments in pattern recognition and applied statistics, primarily because they offer a principled solution to the variable/feature selection problem and give insight into local causal structure. The present invention provides a generative method for learning local causal structure around target variables of interest, in the form of direct causes/effects and Markov blankets, that is applicable to very large datasets and relatively small samples. The method is readily applicable to real-world data, and the selected feature sets can be used for causal discovery and classification. The generative method, GLL-PC, can be instantiated in many ways, giving rise to novel method variants. In general, the inventive method transforms a dataset with many variables into either a minimal reduced dataset in which all variables are needed for optimal prediction of the response variable, or a dataset in which all variables are direct causes and direct effects of the response variable. The power of the invention and its significant advantages over the prior art were empirically demonstrated with datasets from a diversity of application domains (biology, medicine, economics, ecology, digit recognition, text categorization, and computational biology) and with data generated by Bayesian networks.
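The two output sets named in the abstract can be made concrete on a small example. In a known causal graph, the direct causes and direct effects of a target are its parents and children, and its Markov blanket additionally contains the other parents of its children (its "spouses"). The sketch below is only an illustration of these definitions: the graph `TOY_DAG`, the variable names, and the helper functions are hypothetical and do not come from the patent. It does not implement GLL-PC, which has to infer these sets from data rather than read them off a known graph.

```python
from typing import Dict, List, Set

# Hypothetical toy causal graph (an assumption for illustration, not from the patent).
# Each key is a variable; each value is the set of its direct effects (children).
TOY_DAG: Dict[str, Set[str]] = {
    "A": {"T"},        # A is a direct cause of the target T
    "B": {"T"},        # B is a direct cause of T
    "T": {"C", "D"},   # C and D are direct effects of T
    "S": {"C"},        # S is a "spouse" of T: another direct cause of T's child C
    "C": {"E"},        # E is an indirect effect of T
    "D": set(),
    "E": set(),
    "U": set(),        # U is unrelated to T
}


def parents(dag: Dict[str, Set[str]], node: str) -> Set[str]:
    """Direct causes of `node`: every variable that lists it as a child."""
    return {p for p, kids in dag.items() if node in kids}


def children(dag: Dict[str, Set[str]], node: str) -> Set[str]:
    """Direct effects of `node`."""
    return set(dag.get(node, set()))


def parents_and_children(dag: Dict[str, Set[str]], target: str) -> Set[str]:
    """Direct causes and direct effects of `target`."""
    return parents(dag, target) | children(dag, target)


def markov_blanket(dag: Dict[str, Set[str]], target: str) -> Set[str]:
    """Parents, children, and the children's other parents (spouses) of `target`."""
    spouses: Set[str] = set()
    for child in children(dag, target):
        spouses |= parents(dag, child)
    return (parents_and_children(dag, target) | spouses) - {target}


def reduce_dataset(rows: List[Dict[str, float]], keep: Set[str]) -> List[Dict[str, float]]:
    """Keep only the selected columns in each row of a dataset."""
    return [{k: v for k, v in row.items() if k in keep} for row in rows]


if __name__ == "__main__":
    print(sorted(parents_and_children(TOY_DAG, "T")))
    print(sorted(markov_blanket(TOY_DAG, "T")))
```

In this toy graph, the indirect effect `E` and the unrelated variable `U` fall outside both sets, while the spouse `S` belongs to the Markov blanket only. Passing either set (plus the target) to `reduce_dataset` mirrors the two kinds of reduced dataset the abstract describes.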


Bibliographic details

Publication number: US2011307437A1

Patent type:
Publication date: 2011-12-15

Original format: PDF
Applicant/Assignee: KONSTANTINOS (CONSTANTIN) F. ALIFERIS; ALEXANDER STATNIKOV

Application number: US20100700689
Inventors: KONSTANTINOS (CONSTANTIN) F. ALIFERIS; ALEXANDER STATNIKOV

Filing date: 2010-02-04
Classification: G06N5/02
Country: US
Date indexed: 2022-08-21 17:31:47
