Efficient Search Methods for Statistical Dependency Rules

Wilhelmiina Hämäläinen

Abstract

Dependency analysis is one of the central problems in bioinformatics and all empirical science. In genetics, for example, an important problem is to find which gene alleles are mutually dependent or which alleles and diseases are dependent. In ecology, a similar problem is to find dependencies between different species or groups of species. In both cases a classical solution is to consider all pairwise dependencies between single attributes and evaluate the relationships with some statistical measure like the χ²-measure. It is known that the actual dependency structures can involve more attributes, but the existing computational methods are too inefficient for such an exhaustive search.

In this paper, we introduce efficient search methods for positive dependencies of the form X → A with typical statistical measures. The efficiency is achieved by a special kind of branch-and-bound search which also prunes out redundant rules. Redundant attributes are especially harmful in dependency analysis, because they can blur the actual dependencies and even lead to erroneous conclusions.

We consider two alternative definitions of redundancy: the classical one and a stricter one. We improve our previous algorithm for searching for the best strictly non-redundant dependency rules and introduce a totally new algorithm for searching for the best classically non-redundant rules. According to our experiments, both algorithms can prune the search space very efficiently, and in practice no minimum frequency thresholds are needed. This is an important benefit, because biological data sets are typically dense, and the alternative search methods would require too large minimum frequency thresholds for any practical purpose.
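To make the setting concrete, here is a minimal sketch of scoring a rule X → A with the χ²-measure on binary data and keeping only rules that beat all of their more general rules. It is a naive exhaustive baseline, not the branch-and-bound algorithm of the paper. The function names (chi2_rule, nonredundant_rules), the row representation as sets of attributes, and the reading of redundancy used here (a rule is redundant when some proper subset of X scores at least as well for the same A) are assumptions made for illustration.

```python
from itertools import combinations


def chi2_rule(rows, x, a):
    """Return (chi2, is_positive) for the rule x -> a.

    rows: list of frozensets of attribute names (hypothetical data layout).
    x: frozenset antecedent, a: single consequent attribute.
    """
    n = len(rows)
    fx = sum(1 for r in rows if x <= r)              # rows containing all of X
    fa = sum(1 for r in rows if a in r)              # rows containing A
    fxa = sum(1 for r in rows if x <= r and a in r)  # rows containing X and A
    denom = fx * (n - fx) * fa * (n - fa)
    if denom == 0:
        return 0.0, False                            # degenerate 2x2 table
    cov = n * fxa - fx * fa                          # > 0 means positive dependence
    return n * cov ** 2 / denom, cov > 0


def nonredundant_rules(rows, attrs, consequent, max_len=3):
    """Exhaustively enumerate antecedents; keep positive rules whose score
    is strictly better than that of every proper-subset antecedent."""
    best_sub = {frozenset(): 0.0}  # best score over an antecedent and its subsets
    kept = []
    for k in range(1, max_len + 1):
        for combo in combinations(sorted(attrs - {consequent}), k):
            x = frozenset(combo)
            score, positive = chi2_rule(rows, x, consequent)
            sub_best = max(best_sub[x - {b}] for b in x)
            if positive and score > sub_best:
                kept.append((x, score))
            best_sub[x] = max(score, sub_best)
    return sorted(kept, key=lambda rule: -rule[1])


rows = ([frozenset("xya")] * 3 + [frozenset("xa")] * 2
        + [frozenset("y")] + [frozenset()] * 4)
rules = nonredundant_rules(rows, {"x", "y", "a"}, "a")
```

In this toy data the rule {x} → a reaches the maximum score, so adding y to the antecedent cannot improve it and {x, y} → a is dropped. The cost of the enumeration grows exponentially with the number of attributes, which is the problem that pruning in a branch-and-bound search is meant to address.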


Bibliographic details

Source: Fundamenta Informaticae, 2011, No. 2, pp. 117-150 (34 pages)
Author: Wilhelmiina Hämäläinen
Author affiliation: Department of Biosciences, University of Eastern Finland, Kuopio, Finland
Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords: statistical dependence; redundancy; χ²-measure; z-score; search algorithms


