Fouille de données par extraction de motifs graduels : contextualisation et enrichissement

English translation: Data mining by extraction of gradual patterns: contextualization and enrichment

Abstract

The work in this thesis belongs to the field of knowledge extraction and data mining applied to numerical or fuzzy data, with the aim of extracting linguistic summaries in the form of gradual itemsets. These express correlations between attribute values of the form « the more the temperature increases, the more the pressure increases ». Our goal is to contextualize and enrich these gradual itemsets with different types of additional information, so as to increase their quality and make them easier to interpret. We propose four types of new itemsets. First, reinforced gradual itemsets, in the case of fuzzy data, perform a contextualization by integrating additional attributes, introduced linguistically by the expression « all the more ». They can be illustrated by the example « the more the temperature decreases, the more the volume of air decreases, all the more its density increases ». Reinforcement is interpreted as increased validity of the gradual itemset. In addition, we study the extension of the concept of reinforcement to association rules, discussing its possible interpretations and showing its limited contribution.

We then propose to process the contradictory itemsets that arise, for example, when « the more the temperature increases, the more the humidity increases » and « the more the temperature increases, the less the humidity decreases » are extracted simultaneously. To manage these contradictions, we define a constrained variant of the gradual itemset support which depends not only on the considered itemset but also on its potential contradictors. We also propose two extraction methods: the first filters the itemsets after all of them have been generated, and the second integrates the filtering process into the generation step.

We introduce characterized gradual itemsets, defined by adding a clause introduced linguistically by the expression « especially if », as in the sentence « the more the temperature decreases, the more the humidity decreases, especially if the temperature varies in [0, 10] °C »: the additional clause specifies value ranges on which the validity of the itemset is increased. We formalize the quality of this enrichment as a trade-off between two constraints imposed on the identified interval, namely high validity and large size, as well as an extension taking into account the data density. We propose a method to automatically extract characterized gradual itemsets, based on appropriate mathematical morphology tools and the definition of an appropriate filter and transcription.
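
The abstract does not give the thesis's support definition or its constrained variant, so the following is only an illustrative sketch under stated assumptions. It assumes a simple pair-based support (the fraction of row pairs on which all attributes vary in the required directions) and a hypothetical post-generation filter that keeps an itemset only if its support exceeds that of its contradictor. The function names, the sample data, and the filtering rule are all invented for illustration and are not taken from the thesis.

```python
from itertools import combinations

# Hypothetical sample data: four observations of temperature and humidity.
rows = [
    {"temp": 10, "hum": 30},
    {"temp": 15, "hum": 40},
    {"temp": 20, "hum": 35},
    {"temp": 25, "hum": 50},
]

def gradual_support(rows, itemset):
    """Fraction of row pairs that respect every variation in the itemset.

    itemset is a list of (attribute, direction) pairs, where direction is
    +1 for "increases" and -1 for "decreases". This pair-based measure is
    an assumption, not the definition used in the thesis.
    """
    pairs = list(combinations(rows, 2))
    if not pairs:
        return 0.0
    concordant = 0
    for a, b in pairs:
        # Variation from a to b, oriented by the required direction.
        deltas = [(b[attr] - a[attr]) * d for attr, d in itemset]
        # The pair counts if it follows the itemset in either order.
        if all(x > 0 for x in deltas) or all(x < 0 for x in deltas):
            concordant += 1
    return concordant / len(pairs)

def contradictor(itemset):
    """Same itemset with the direction of its last attribute flipped."""
    *head, (attr, d) = itemset
    return head + [(attr, -d)]

def filter_contradictions(candidates, rows, min_support):
    """Post-generation filter (assumed rule): keep an itemset only if it is
    frequent and strictly better supported than its contradictor."""
    kept = []
    for itemset in candidates:
        s = gradual_support(rows, itemset)
        s_contra = gradual_support(rows, contradictor(itemset))
        if s >= min_support and s > s_contra:
            kept.append(itemset)
    return kept

more_more = [("temp", 1), ("hum", 1)]   # the more temp, the more hum
more_less = [("temp", 1), ("hum", -1)]  # the more temp, the less hum
kept = filter_contradictions([more_more, more_less], rows, min_support=0.5)
```

On this sample, five of the six row pairs follow « the more the temperature increases, the more the humidity increases » and one follows its contradictor, so only the first itemset survives the filter. This corresponds to the first of the two strategies the abstract mentions (filtering after generation); integrating the check into the generation step would require a level-wise miner, which is not sketched here.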

Bibliographic record

  • Author

    Oudni Amal;

  • Author affiliation
  • Year 2014
  • Total pages
  • Original format PDF
  • Text language fr
  • Chinese Library Classification
