...
Published in: Engineering Applications of Artificial Intelligence

Scalability achievements for enumerative biclustering with online partitioning: Case studies involving mixed-attribute datasets



Abstract

Biclustering is a powerful data analysis technique whose concept is appealing in many domains, such as the natural sciences and market basket analysis. Further applications include recommender systems, educational data mining, emerging topic detection, and counterfeit product detection. In this paper, we further extend RIn-Close_CVC, a biclustering algorithm that performs, on numerical datasets, an efficient, complete, correct, and non-redundant enumeration of maximal biclusters with constant values on columns (CVC). By avoiding a priori partitioning and itemization of the dataset, RIn-Close_CVC performs online partitioning, which is shown here to lead to more informative biclustering results. The improved algorithm, called RIn-Close_CVC3, offers: a drastic reduction in memory usage; a consistent gain in runtime; the additional ability to handle datasets with missing values; and new capabilities for attributes with distinct distributions or even mixed data types. Moreover, RIn-Close_CVC3 retains the four attractive properties of RIn-Close_CVC (efficiency, completeness, correctness, and non-redundancy), as formally proved here. The experiments use synthetic and real-world datasets to perform scalability and sensitivity analyses, along with a comparison of a priori and online partitioning. As a practical case study, a parsimonious set of relevant and interpretable mixed-attribute-type rules is obtained in the context of supervised descriptive pattern mining.
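To make the target pattern concrete, here is a naive sketch that enumerates maximal biclusters with constant values on columns by brute force over row subsets. It is not RIn-Close_CVC or RIn-Close_CVC3, which avoid exhaustive search. It only illustrates the definition, under these assumptions: a small numeric matrix with no missing values, and a column counts as constant over a set of rows when its values span at most a tolerance `eps`. The function name and the parameters `eps`, `min_rows`, and `min_cols` are hypothetical. Comparing value ranges directly, rather than binning each column beforehand, loosely mirrors the idea of avoiding a priori itemization. For instance, a fixed bin edge at 1.05 would separate 1.0 and 1.1 even though they lie within `eps = 0.2` of each other.

```python
from itertools import combinations

# Hypothetical illustration only: a naive enumerator of maximal biclusters with
# (approximately) constant values on columns. Not the RIn-Close_CVC3 algorithm.
def enumerate_cvc_biclusters(data, eps, min_rows=2, min_cols=1):
    n_rows, n_cols = len(data), len(data[0])

    def fits(rows, col):
        # Column `col` counts as constant over `rows` if its values span <= eps.
        values = [data[r][col] for r in rows]
        return max(values) - min(values) <= eps

    found = set()
    for size in range(min_rows, n_rows + 1):
        for rows in combinations(range(n_rows), size):
            # Column maximality: take every column that is constant over these rows.
            cols = tuple(c for c in range(n_cols) if fits(rows, c))
            if len(cols) < min_cols:
                continue
            # Row maximality: reject if some outside row keeps every column constant.
            extendable = any(
                all(fits(rows + (r,), c) for c in cols)
                for r in range(n_rows)
                if r not in rows
            )
            if not extendable:
                found.add((rows, cols))  # the set keeps results non-redundant
    return sorted(found)


# Small hypothetical matrix: 4 rows (objects) by 3 numeric columns (attributes).
data = [
    [1.0, 5.0, 9.0],
    [1.1, 5.0, 2.0],
    [1.0, 7.0, 9.1],
    [4.0, 5.1, 9.0],
]

for rows, cols in enumerate_cvc_biclusters(data, eps=0.2):
    print("rows", rows, "cols", cols)
```

With `eps = 0.2` this toy matrix yields six maximal biclusters. One example is rows (0, 3) on columns (1, 2). Rows (1, 2) on column 0 is not reported because row 0 can be added without breaking column 0. The search is exponential in the number of rows, which is why efficient enumeration strategies like the ones the paper develops matter for scalability.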
