Fundamenta Informaticae

Efficient Search Methods for Statistical Dependency Rules

Abstract

Dependency analysis is one of the central problems in bioinformatics and in all empirical science. In genetics, for example, an important problem is to find which gene alleles are mutually dependent or which alleles and diseases are dependent. In ecology, a similar problem is to find dependencies between different species or groups of species. In both cases, a classical solution is to consider all pairwise dependencies between single attributes and to evaluate the relationships with a statistical measure such as the χ²-measure. It is known that the actual dependency structures can involve more attributes, but the existing computational methods are too inefficient for such an exhaustive search.

In this paper, we introduce efficient search methods for positive dependencies of the form X → A with typical statistical measures. The efficiency is achieved by a special kind of branch-and-bound search, which also prunes out redundant rules. Redundant attributes are especially harmful in dependency analysis, because they can blur the actual dependencies and even lead to erroneous conclusions.

We consider two alternative definitions of redundancy: the classical one and a stricter one. We improve our previous algorithm for searching for the best strictly non-redundant dependency rules and introduce an entirely new algorithm for searching for the best classically non-redundant rules. According to our experiments, both algorithms can prune the search space very efficiently, and in practice no minimum frequency thresholds are needed. This is an important benefit, because biological data sets are typically dense, and the alternative search methods would require minimum frequency thresholds too large to be of any practical use.
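To make the objects in the abstract concrete, here is a minimal brute-force sketch of scoring and filtering rules X → A. It is not the paper's branch-and-bound algorithm, because it performs no pruning. It rests on several assumptions that the abstract does not state. The data is a 0/1 matrix. Rules are scored with the standard 2×2 χ² statistic, χ² = n(n·n(XA) − n(X)n(A))² / (n(X)n(¬X)n(A)n(¬A)). A rule counts as positive when P(XA) > P(X)P(A). For classical redundancy, the sketch assumes one common formulation: X → A is redundant if some proper subset Y ⊊ X yields a rule Y → A scoring at least as well. The paper's exact definitions may differ. All function names and the example data are hypothetical.

```python
from itertools import combinations

# Hypothetical example: 6 observations (rows) of 3 binary attributes (columns).
# Attribute 2 is present exactly when attributes 0 and 1 both are.
data = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
    [0, 0, 0],
]


def counts(data, X, A):
    """Return (n, n_X, n_A, n_XA) for the rule X -> A over 0/1 rows."""
    n = len(data)
    covers_X = [all(row[i] for i in X) for row in data]
    n_X = sum(covers_X)
    n_A = sum(row[A] for row in data)
    n_XA = sum(1 for row, cx in zip(data, covers_X) if cx and row[A])
    return n, n_X, n_A, n_XA


def is_positive(data, X, A):
    """Positive dependency: P(XA) > P(X)P(A), i.e. n*n_XA > n_X*n_A."""
    n, n_X, n_A, n_XA = counts(data, X, A)
    return n * n_XA > n_X * n_A


def chi2(data, X, A):
    """Chi-squared statistic of the 2x2 contingency table of X and A."""
    n, n_X, n_A, n_XA = counts(data, X, A)
    denom = n_X * (n - n_X) * n_A * (n - n_A)
    if denom == 0:
        return 0.0  # degenerate table: X or A is constant in the data
    return n * (n * n_XA - n_X * n_A) ** 2 / denom


def is_classically_nonredundant(data, X, A):
    """Assumed definition: no proper non-empty subset Y of X scores >= X -> A."""
    score = chi2(data, X, A)
    for k in range(1, len(X)):
        for Y in combinations(X, k):
            if chi2(data, Y, A) >= score:
                return False
    return True


def best_rules(data, n_attrs, max_len=2, top=3):
    """Exhaustively enumerate positive, non-redundant X -> A with |X| <= max_len."""
    scored = []
    for A in range(n_attrs):
        others = [i for i in range(n_attrs) if i != A]
        for k in range(1, max_len + 1):
            for X in combinations(others, k):
                if is_positive(data, X, A) and is_classically_nonredundant(data, X, A):
                    scored.append((chi2(data, X, A), X, A))
    scored.sort(reverse=True)  # highest chi-squared first
    return scored[:top]


print(best_rules(data, n_attrs=3, max_len=2, top=1))  # → [(6.0, (0, 1), 2)]
```

On the toy data, the rule {0, 1} → 2 scores 6.0 and survives the redundancy check, because each of its single-attribute generalisations scores only 3.0. By contrast, {0, 2} → 1 is discarded, since {2} → 1 already scores 3.0. The sketch scores every candidate, so its running time grows exponentially with `max_len` and the number of attributes. Avoiding that full enumeration without a minimum frequency threshold is what the paper's pruning is designed for.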