
ITERATIVE STRUCTURE DISCOVERY IN GRAPH-BASED DATA



Abstract

Much of current data mining research is focused on discovering sets of attributes that discriminate data entities into classes, such as shopping trends for a particular demographic group. In contrast, we are working to develop data mining techniques to discover patterns consisting of complex relationships between entities. Our research is particularly applicable to domains in which the data is event-driven or relationally structured. In this paper, we present approaches to address two related challenges: the need to assimilate incremental data updates and the need to mine monolithic datasets. Many realistic problems are continuous in nature and therefore require a data mining approach that can evolve discovered knowledge over time. Similarly, many problems present datasets that are too large to fit into dynamic memory on conventional computer systems. We address incremental data mining by introducing a mechanism for summarizing discoveries from previous data increments so that the globally best patterns can be computed by mining only the new data increment. To address monolithic datasets, we introduce a technique by which these datasets can be partitioned and mined serially with minimal impact on the result quality. We present applications of our work in both the counter-terrorism and bioinformatics domains.
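
The idea of summarizing earlier increments so that only the new increment needs to be mined can be illustrated with a minimal sketch. It is not the authors' method: it assumes, purely for illustration, that a "pattern" is a single labeled edge and that pattern quality is plain frequency, and the names `Edge`, `mine_increment`, and `IncrementalMiner` are hypothetical. The same `update` call could be applied to each partition of a large dataset in turn, which loosely mirrors serial mining of partitions.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Assumption: a pattern is one labeled edge (source label, relation, target label).
# Real structure discovery would search over larger subgraphs.
Edge = Tuple[str, str, str]


def mine_increment(edges: Iterable[Edge], top_k: int = 10) -> Counter:
    """Mine one increment and keep only its top_k most frequent patterns."""
    counts = Counter(edges)
    return Counter(dict(counts.most_common(top_k)))


class IncrementalMiner:
    """Keeps a running summary so earlier increments are never re-mined."""

    def __init__(self, top_k: int = 10) -> None:
        self.top_k = top_k
        self.summary: Counter = Counter()

    def update(self, edges: Iterable[Edge]) -> None:
        # Only the new increment is scanned; its local summary is merged
        # into the stored summary of all previous increments.
        self.summary.update(mine_increment(edges, self.top_k))

    def best(self, n: int = 1) -> List[Tuple[Edge, int]]:
        """Return the n patterns with the highest merged count."""
        return self.summary.most_common(n)


miner = IncrementalMiner(top_k=5)
miner.update([
    ("person", "calls", "person"),
    ("person", "calls", "person"),
    ("person", "owns", "car"),
])
miner.update([
    ("person", "owns", "car"),
    ("person", "owns", "car"),
    ("person", "visits", "city"),
])
top_patterns = miner.best(1)
```

Because each increment keeps only its `top_k` local patterns, the merged summary is an approximation: a pattern that is rare in every increment but common overall can be missed when `top_k` is small.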
