Journal: Journal of Computational Methods in Sciences and Engineering

Fast attribute-based table clustering using Predicate-Trees: A vertical data mining approach


Abstract

With technological advancements, massive amounts of data are being collected in various domains. For instance, since the advent of digital image technology and remote sensing imagery (RSI), NASA and the U.S. Geological Survey, through the Landsat Data Continuity Mission, have been capturing images of Earth at resolutions down to 15 meters. Likewise, consider the Internet, where the growth of social media, blogs, and similar Web sites generates exponentially growing volumes of textual data every day. Since clustering data is time-consuming, much of this data is archived before it has been properly analyzed. In this paper, we propose two novel and extremely fast algorithms: imgFAUST (Fast Attribute-based Unsupervised and Supervised Table Clustering) for images, and a variation called docFAUST for textual data. Both algorithms are based on Predicate-Trees, which are compressed, lossless, and data-mining-ready data structures. Without compromising much on accuracy, our algorithms are fast and can be used effectively for high-speed analysis of image data and documents.
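The abstract does not describe how Predicate-Trees are built. As a rough illustration only, the sketch below shows one common way a vertical, compressed, lossless layout can be formed: each attribute is split into one bit vector per bit position, and each bit vector is recursively halved, with any uniform run collapsed to a single leaf. This is an assumption for illustration, not the authors' implementation, and all function names (`bit_slices`, `build_tree`, `expand`, `reconstruct`) are hypothetical.

```python
def bit_slices(values, width):
    """Split a column of non-negative integers into `width` vertical
    bit vectors, most significant bit first."""
    return [[(v >> b) & 1 for v in values] for b in range(width - 1, -1, -1)]


def build_tree(bits):
    """Recursively halve a bit vector. A run made entirely of 0s or
    entirely of 1s collapses to a single leaf; mixed runs become a
    (left, right) pair."""
    if all(b == bits[0] for b in bits):
        return bits[0]
    mid = len(bits) // 2
    return (build_tree(bits[:mid]), build_tree(bits[mid:]))


def expand(tree, length):
    """Invert build_tree: recover the original bit vector of `length` bits."""
    if not isinstance(tree, tuple):
        return [tree] * length
    mid = length // 2
    return expand(tree[0], mid) + expand(tree[1], length - mid)


def reconstruct(trees, n):
    """Rebuild the n original integers from per-bit trees (MSB first)."""
    values = [0] * n
    for tree in trees:
        for i, bit in enumerate(expand(tree, n)):
            values[i] = (values[i] << 1) | bit
    return values


# Toy "image band": four bright pixels followed by four dark ones.
pixels = [200, 201, 203, 202, 12, 13, 15, 14]
trees = [build_tree(s) for s in bit_slices(pixels, 8)]
restored = reconstruct(trees, len(pixels))
```

In this toy column the most significant bit slice is four 1s followed by four 0s, so its tree is just two leaves, and `restored` equals `pixels`, which is what "compressed" and "lossless" mean in this sketch. Real Predicate-Tree implementations differ in fan-out, node layout, and the predicates they support.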
