Journal: Knowledge and Information Systems

From sets of good redescriptions to good sets of redescriptions


Abstract

Redescription mining aims at finding pairs of queries over data variables that describe roughly the same set of observations. These redescriptions can be used to obtain different views on the same set of entities. So far, redescription mining methods have aimed at listing all redescriptions supported by the data. Such an approach can result in many redundant redescriptions and hinder the user's ability to understand the overall characteristics of the data. In this work, we present an approach to identify and remove the redundant redescriptions, that is, an approach to move from a set of good redescriptions to a good set of redescriptions. We measure the redundancy of a redescription using a framework inspired by the concept of subjective interestingness based on maximum entropy distributions as proposed by De Bie (Data Min Knowl Discov 23(3):407-446, 2011). Redescriptions, however, generate specific requirements on the framework, and our solution differs significantly from the existing ones. Notably, our approach can handle disjunctions and conjunctions in the queries, whereas the existing approaches are limited only to conjunctive queries. Our framework can also handle Boolean, nominal, or real-valued data, possibly containing missing values, making it applicable to a wide variety of data sets. Our experiments show that our framework can efficiently reduce the redundancy even on large data sets.
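To make the notion of a redescription concrete, the following minimal sketch evaluates two queries over the same observations and compares the sets they select. It is an illustration only, not the paper's method: the toy data, the variable names, both queries, and the choice of the Jaccard coefficient as a measure of "roughly the same set" are assumptions introduced here, and the maximum-entropy redundancy framework itself is not reproduced.

```python
from typing import Callable, Dict, List, Set

Row = Dict[str, object]


def support(rows: List[Row], query: Callable[[Row], bool]) -> Set[int]:
    """Return the indices of the observations that satisfy the query."""
    return {i for i, row in enumerate(rows) if query(row)}


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard coefficient of two supports (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy data mixing Boolean, nominal, and real-valued variables.
rows: List[Row] = [
    {"has_fur": True,  "lays_eggs": False, "temp": 37.0, "habitat": "land"},
    {"has_fur": False, "lays_eggs": False, "temp": 36.0, "habitat": "water"},
    {"has_fur": False, "lays_eggs": True,  "temp": 40.0, "habitat": "land"},
    {"has_fur": False, "lays_eggs": True,  "temp": 20.0, "habitat": "water"},
    {"has_fur": True,  "lays_eggs": False, "temp": 38.0, "habitat": "land"},
]


def left_query(row: Row) -> bool:
    # Disjunctive query over one view of the data.
    return bool(row["has_fur"]) or row["habitat"] == "water"


def right_query(row: Row) -> bool:
    # Conjunctive query over a different set of variables.
    return (not row["lays_eggs"]) and float(row["temp"]) > 30.0


left_support = support(rows, left_query)
right_support = support(rows, right_query)
similarity = jaccard(left_support, right_support)

print(sorted(left_support), sorted(right_support), similarity)
```

The closer the coefficient is to 1, the more nearly the two queries describe the same observations. Deciding which of many such pairs are redundant with respect to one another is the problem the abstract addresses.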
