From sets of good redescriptions to good sets of redescriptions

Kalofolias Janis; Galbrun Esther; Miettinen Pauli

Abstract

Redescription mining aims at finding pairs of queries over data variables that describe roughly the same set of observations. These redescriptions can be used to obtain different views on the same set of entities. So far, redescription mining methods have aimed at listing all redescriptions supported by the data. Such an approach can result in many redundant redescriptions and hinder the user's ability to understand the overall characteristics of the data. In this work, we present an approach to identify and remove the redundant redescriptions, that is, an approach to move from a set of good redescriptions to a good set of redescriptions. We measure the redundancy of a redescription using a framework inspired by the concept of subjective interestingness based on maximum entropy distributions as proposed by De Bie (Data Min Knowl Discov 23(3):407-446, 2011). Redescriptions, however, impose specific requirements on the framework, and our solution differs significantly from the existing ones. Notably, our approach can handle both disjunctions and conjunctions in the queries, whereas the existing approaches are limited to conjunctive queries. Our framework can also handle Boolean, nominal, or real-valued data, possibly containing missing values, making it applicable to a wide variety of data sets. Our experiments show that our framework can efficiently reduce the redundancy even on large data sets.
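
To make the notion of a redescription concrete, the following is a minimal Python sketch and not the authors' method. It builds two queries over different variables of a toy data set and compares the sets of observations they select. The abstract does not name a similarity measure; the Jaccard index used here is the customary choice in redescription mining and is an assumption, as are the data, the variable names (`temp`, `rain`, `palm`, `fig`), and the helper functions `support` and `jaccard`.

```python
def support(rows, query):
    """Return the indices of the observations for which the query holds."""
    return {i for i, row in enumerate(rows) if query(row)}


def jaccard(a, b):
    """Jaccard index of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy data: each row is one observation (e.g. a location)
# with real-valued climate variables and Boolean species variables.
rows = [
    {"temp": 25.0, "rain": 10.0, "palm": True,  "fig": False},
    {"temp": 22.0, "rain": 15.0, "palm": False, "fig": True},
    {"temp": 8.0,  "rain": 60.0, "palm": False, "fig": False},
    {"temp": 5.0,  "rain": 70.0, "palm": False, "fig": False},
    {"temp": 21.0, "rain": 50.0, "palm": False, "fig": False},
]


def q_left(row):
    """Conjunctive query over the real-valued variables."""
    return row["temp"] > 20.0 and row["rain"] < 55.0


def q_right(row):
    """Disjunctive query over the Boolean variables."""
    return row["palm"] or row["fig"]


left = support(rows, q_left)
right = support(rows, q_right)
similarity = jaccard(left, right)
print(sorted(left), sorted(right), round(similarity, 3))
```

The pair (`q_left`, `q_right`) counts as a redescription to the extent that the two supports overlap. Judging whether such a pair is redundant given other redescriptions is what the maximum-entropy framework described in the abstract addresses, and this sketch does not attempt it.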

Bibliographic details

Source
Knowledge and Information Systems | 2018, Issue 1 | 34 pages
Authors
Kalofolias Janis; Galbrun Esther; Miettinen Pauli
Author affiliations

Max Planck Inst Informat Saarland Informat Campus Saarbrucken Germany;

Inria Nancy Grand Est Nancy France;

Max Planck Inst Informat Saarland Informat Campus Saarbrucken Germany;

Indexing information
Original format: PDF
Language: English
Chinese Library Classification: automatic information theory
Keywords
Data mining; Redescription mining; Pattern selection; Maximum entropy; Subjective interestingness;

Similar literature

1. From sets of good redescriptions to good sets of redescriptions [J]. Kalofolias Janis, Galbrun Esther, Miettinen Pauli. Knowledge and Information Systems. 2018, Issue 1
2. Targeted and contextual redescription set exploration [J]. Matej Mihelčić, Tomislav Šmuc. Machine Learning. 2018, Issue 11
3. A framework for redescription set construction [J]. Mihelcic Matej, Dzeroski Saso, Lavrac Nada. Expert Systems with Applications. 2017, Issue feba
4. From Sets of Good Redescriptions to Good Sets of Redescriptions [C]. Janis Kalofolias, Esther Galbrun, Pauli Miettinen. IEEE International Conference on Data Mining. 2016
5. A redescription of the cretaceous marine turtle Ctenochelys acris Zangerl, 1953 and a systematic revision of the 'toxochelyid'-grade taxa using cladistic analysis [D]. Gentry, Andrew Douglas. 2015
6. Transgenerational Epigenetic Inheritance Is Revealed as a Multi-step Process by Studies of the SET-Domain Proteins SET-25 and SET-32 [O]. Rachel M Woodhouse, Alyson Ashe. 2019
7. From Sets of Good Redescriptions to Good Sets of Redescriptions [O]. Kalofolias, J., Galbrun, E., Miettinen, P. 2017
