Conference: 2013 IEEE International Conference on Big Data

An NML-based model selection criterion for general relational data modeling



Abstract

Whereas the main interest in most existing data mining approaches has been sequence data on a single type of object, namely attribute data, real-world databases store information about multiple relationships between various classes of objects. The modeling of these general relational data (GRD) plays an important role in eliciting knowledge across multiple relations. It is not reasonable to directly apply existing modeling methods to GRD, because GRD have statistical properties that distinguish them from attribute data. In this paper, we address the issue of statistical model selection in GRD modeling. From the viewpoint of the minimum description length principle, we propose a new model selection criterion by considering the statistical properties of GRD. We employ the normalized maximum likelihood code-length as a model selection criterion, and provide an asymptotic expansion theorem for its application to GRD modeling. To demonstrate its use in a critical application, we apply our proposed criterion to the issue of model selection in relational data clustering. An experiment using artificial datasets demonstrates the effectiveness of our technique compared to other criteria, and we also present a brand analysis using real beer-purchase data.
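As a rough illustration of what a normalized maximum likelihood (NML) code-length looks like, the sketch below computes it for the simplest possible case: a binary sequence under a Bernoulli model. This is ordinary attribute data, not GRD, and it is not the criterion proposed in the paper; the function name `bernoulli_nml_code_length` is a hypothetical one chosen for this example. The code length is the negative maximized log-likelihood plus the log of the parametric complexity, the sum of maximized likelihoods over all sequences of the same length.

```python
import math


def bernoulli_nml_code_length(xs):
    """NML code length, in nats, of a 0/1 sequence under the Bernoulli model.

    Illustrative only: a textbook special case, not the paper's GRD criterion.
    """
    n = len(xs)
    k = sum(xs)

    def max_log_lik(ones, total):
        # log of (ones/total)^ones * ((total-ones)/total)^(total-ones),
        # using the convention 0 * log 0 = 0
        out = 0.0
        if ones > 0:
            out += ones * math.log(ones / total)
        if ones < total:
            out += (total - ones) * math.log((total - ones) / total)
        return out

    # Parametric complexity: group all 2^n sequences by their number of ones.
    log_terms = [
        math.log(math.comb(n, j)) + max_log_lik(j, n) for j in range(n + 1)
    ]
    # Log-sum-exp for numerical stability.
    m = max(log_terms)
    log_complexity = m + math.log(sum(math.exp(t - m) for t in log_terms))

    return -max_log_lik(k, n) + log_complexity
```

In MDL-style model selection, one would compute such a code length for each candidate model and prefer the model with the shortest one. For models where the complexity term cannot be summed exactly, it is typically approximated, for example by an asymptotic expansion.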
