A SCALABLE SYSTEM FOR CLUSTERING OF LARGE DATABASES HAVING MIXED DATA ATTRIBUTES

Abstract

In one exemplary embodiment, the present invention provides a data mining system for finding clusters of data items in a database or any other data storage medium. Before data evaluation begins, a choice is made of the number M of models to be explored and the number K of clusters within each of the M models. Within each model, the data in the database are classified into K different clusters, and an initial estimate of the data distribution is provided for each model to be explored. A portion of the data in the database is read from the storage medium into a rapid-access memory buffer, whose size is determined by the user or the operating system according to available memory resources. The data contained in the buffer are used to update the original data distribution of each of the K clusters across all M models. Some of the data belonging to a cluster are summarized, or compressed, and stored in a reduced form that represents sufficient statistics of the data. More data are then accessed from the database and the models are updated: an updated set of cluster parameters is determined from the summarized data (the sufficient statistics) and the newly obtained data. Stopping criteria are evaluated to determine whether more data need to be read from the database.
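
The buffered, single-scan update described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: it handles numeric attributes only (no mixed/categorical attributes), uses hard nearest-mean assignment (k-means style), compresses every buffered point into its cluster's sufficient statistics (count, per-attribute sum, and sum of squares), and uses an assumed stopping rule of "no cluster mean moved more than `tol`". The names `ClusterStats`, `read_buffers`, `cluster_scalably`, `buffer_size`, and `tol` are hypothetical; `buffer_size` stands in for the memory-dependent buffer chosen by the user or operating system.

```python
from typing import Iterable, Iterator, List, Sequence

Point = Sequence[float]


class ClusterStats:
    """Sufficient statistics for one cluster (hypothetical helper).

    Count, per-attribute sum and sum of squares are enough to recover the
    cluster mean and variance, so raw points can be discarded once summarized.
    """

    def __init__(self, dim: int) -> None:
        self.n = 0
        self.total = [0.0] * dim
        self.total_sq = [0.0] * dim

    def add(self, x: Point) -> None:
        self.n += 1
        for j, v in enumerate(x):
            self.total[j] += v
            self.total_sq[j] += v * v

    def mean(self) -> List[float]:
        return [t / self.n for t in self.total]

    def variance(self) -> List[float]:
        m = self.mean()
        return [ts / self.n - mj * mj for ts, mj in zip(self.total_sq, m)]


def read_buffers(points: Iterable[Point], buffer_size: int) -> Iterator[List[Point]]:
    """Stand-in for reading the next buffer-sized portion of the database."""
    buf: List[Point] = []
    for p in points:
        buf.append(p)
        if len(buf) == buffer_size:
            yield buf
            buf = []
    if buf:
        yield buf


def nearest(x: Point, means: List[List[float]]) -> int:
    """Index of the cluster mean closest to x (squared Euclidean distance)."""
    return min(range(len(means)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(x, means[k])))


def cluster_scalably(points: Iterable[Point],
                     initial_models: List[List[Point]],
                     buffer_size: int,
                     tol: float = 1e-6) -> List[List[ClusterStats]]:
    """initial_models holds M models, each a list of K initial cluster means."""
    dim = len(initial_models[0][0])
    means = [[list(m) for m in model] for model in initial_models]
    stats = [[ClusterStats(dim) for _ in model] for model in initial_models]
    for buf in read_buffers(points, buffer_size):
        shift = 0.0
        for m, model_means in enumerate(means):
            # Assign buffered points with the current parameters and fold
            # them into the clusters' sufficient statistics.
            for x in buf:
                stats[m][nearest(x, model_means)].add(x)
            # Update each cluster's mean from its summarized data.
            for k, cs in enumerate(stats[m]):
                if cs.n:
                    new = cs.mean()
                    shift = max(shift, max(abs(a - b) for a, b in zip(new, model_means[k])))
                    model_means[k] = new
        # Assumed stopping criterion: stop reading once no mean moves more than tol.
        if shift < tol:
            break
    return stats
```

For example, `cluster_scalably(points, [[(0, 0), (10, 10)], [(2, 2), (8, 8)]], buffer_size=2)` explores M = 2 models with K = 2 clusters each, reading two points per buffer. Because each cluster keeps only summary statistics, memory use depends on M, K, and the number of attributes rather than on the size of the database.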

Bibliographic data

Publication number: EP1090362A4

Publication date: 2007-05-02

Applicant/patentee: MICROSOFT CORPORATION

Application number: EP19990914207
Inventors: BRADLEY PAUL S.; REINA CORY; FAYYAD USAMA

Application date: 1999-03-29
Classification: G06K9/62; G06F17/30
Country: EP