Interestingness Measures for Association Rules in a KDD Process : PostProcessing of Rules with ARQAT Tool

Abstract

This work takes place in the framework of Knowledge Discovery in Databases (KDD), often called "Data Mining". This domain is both a major research topic and an application field in companies. KDD aims at discovering previously unknown and useful knowledge in large databases. In the last decade, many studies have been published on association rules, which are frequently used in data mining. Association rules, which are implicative tendencies in data, have the advantage of being an unsupervised model. On the other hand, they often deliver a large number of rules. As a consequence, a postprocessing task is required to help the user understand the results. One way to reduce the number of rules, whether to validate them or to select the most interesting ones, is to use interestingness measures adapted to both the user's goals and the dataset studied. Selecting the right interestingness measures is an open problem in KDD. Many measures have been proposed to extract knowledge from large databases, and many authors have introduced interestingness properties for selecting a suitable measure for a given application. Some measures are adequate for some applications but not for others. In our thesis, we propose to study the set of interestingness measures available in the literature, in order to evaluate their behavior according to the nature of the data and the preferences of the user. The final objective is to guide the user's choice towards the measures best adapted to his or her needs and, in fine, to select the most interesting rules. For this purpose, we propose a new approach implemented in a new tool, ARQAT (Association Rule Quality Analysis Tool), in order to facilitate the analysis of the behavior of about 40 interestingness measures. In addition to elementary statistics, the tool allows a thorough analysis of the correlations between measures using correlation graphs based on the coefficients suggested by Pearson, Spearman and Kendall.
These graphs are also used to identify clusters of similar measures. Moreover, we proposed a series of comparative studies on the correlations between interestingness measures on several datasets. We discovered a set of correlations not very sensitive to the nature of the data used, which we called stable correlations. Finally, 14 graphical and complementary views structured on 5 levels of analysis (ruleset analysis, correlation and clustering analysis, most interesting rules analysis, sensitivity analysis, and comparative analysis) are illustrated in order to show the interest of both the exploratory approach and the use of complementary views.
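
The idea of a correlation graph between measures can be illustrated with a minimal sketch. This is not ARQAT's implementation. The three measures (support, confidence, lift), the rule counts, the helper names, and the 0.85 threshold are all assumptions chosen for illustration. Each rule A -> B is described by its counts, each measure is evaluated on every rule, and two measures are linked when the absolute value of their Spearman rank correlation reaches the threshold.

```python
from math import sqrt

def measures(n, n_a, n_b, n_ab):
    """Three common measures for a rule A -> B (an illustrative choice)."""
    confidence = n_ab / n_a
    return {"support": n_ab / n,
            "confidence": confidence,
            "lift": confidence / (n_b / n)}

def pearson(xs, ys):
    """Pearson linear correlation; returns 0.0 if either vector is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

def correlation_graph(table, threshold=0.85, corr=spearman):
    """Edges between measures whose |correlation| meets the threshold.

    `table` maps a measure name to its values over the same list of rules.
    The default threshold is an arbitrary assumption for this sketch.
    """
    names = sorted(table)
    edges = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            rho = corr(table[a], table[b])
            if abs(rho) >= threshold:
                edges[(a, b)] = rho
    return edges

# Hypothetical rules over n = 1000 transactions: (n_a, n_b, n_ab).
rules = [(200, 300, 150), (400, 500, 220), (100, 250, 90),
         (300, 600, 180), (150, 100, 60)]
table = {}
for n_a, n_b, n_ab in rules:
    for name, value in measures(1000, n_a, n_b, n_ab).items():
        table.setdefault(name, []).append(value)

for (a, b), rho in correlation_graph(table).items():
    print(f"{a} -- {b}: {rho:.2f}")
```

Connected components of such a graph give a rough notion of clusters of similar measures. Passing a different `corr` function (for example `pearson`) changes the coefficient without changing the graph construction.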

Bibliographic record

  • Author

    Huynh Xuan-Hiep;

  • Author affiliation
  • Year 2006
  • Total pages
  • Original format PDF
  • Text language en
  • Chinese Library Classification
