A SYSTEM OF CLUSTERING TO MINIMIZE OUTLIERS ON BIG DATA AND METHODS AND USES THEREOF

Abstract

Document clustering is one of the most common methods of keeping documents organized in a systematic way. A clustered document set provides an ideal search environment, and major companies involved in data analytics and search behavior frequently use different clustering algorithms. Many clustering algorithms, such as K-means, iterative K-means, and Depth Based Outlier Deduction (DBOD), have been tested and used for different purposes. Many of them behave similarly on small data sets of around 40 to 50 documents. At that scale the clustering efficiency of K-means and iterative K-means is the same, but as soon as the document set grows from 50 to 1000, the clustering efficiency of K-means is relatively low compared to iterative K-means. The drawback is that iterative K-means, because of its iterative nature, takes much more time than K-means. The proposed algorithm keeps both points in mind: time complexity and the bulkiness of the data. It also treats outliers as a major problem and, in order to optimize the reduction process, performs clustering in such a manner that the number of outliers is minimized. The proposed algorithm is an enhancement of the depth based algorithm by the utilization.
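The abstract does not disclose the steps of the proposed algorithm, so the following is only a minimal sketch of the K-means baseline it compares against, together with one possible way to count outliers. The function names `kmeans` and `count_outliers`, the first-k-points initialization, and the `factor` distance threshold are all assumptions made for illustration and are not taken from the patent.

```python
import math

def kmeans(points, k, max_iter=100):
    """Plain Lloyd's K-means on a list of equal-length numeric tuples."""
    # Assumption for this sketch: deterministic start from the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(xs) / len(members) for xs in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels

def count_outliers(points, centroids, labels, factor=2.0):
    """Count points farther than `factor` times their cluster's mean distance.

    The threshold rule is a hypothetical stand-in, not the patent's criterion.
    """
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    outliers = 0
    for c in range(len(centroids)):
        cluster = [d for d, lab in zip(dists, labels) if lab == c]
        if not cluster:
            continue
        mean_d = sum(cluster) / len(cluster)
        outliers += sum(1 for d in cluster if mean_d > 0 and d > factor * mean_d)
    return outliers

if __name__ == "__main__":
    # Toy 2-D vectors standing in for document feature vectors.
    docs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, labels = kmeans(docs, k=2)
    print(labels, count_outliers(docs, centroids, labels))
```

In practice documents would first be turned into feature vectors (for example term-frequency vectors), and the outlier count could then be compared across clustering variants as the document set grows.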

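Bibliographic details

Publication number: IN201711030582A

Patent type:
Publication date: 2018-01-26

Original format: PDF
Applicant/patentee:

Application number: IN201711030582
Inventors: KAMALJEET KAUR; ATUL GARG

Filing date: 2017-08-29
Classification: G06F19/24
Country: IN
Database entry time: 2022-08-21 12:52:13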
