A SYSTEM OF CLUSTERING TO MINIMIZE OUTLIERS ON BIG DATA AND METHODS AND USES THEREOF

Abstract

Document clustering is one of the most common methods of keeping documents organized in a systematic way. A clustered document set represents a perfect search environment, and major companies involved in data analytics and search behavior use different clustering algorithms very often. Many clustering algorithms, such as K-means, iterative K-means, and Depth Based Outlier Deduction (DBOD), have been tested and used for different purposes. Many of them behave similarly on small data sets of around 40 to 50 documents. At that scale the clustering efficiency of K-means and iterative K-means is the same, but as soon as the document set grows from 50 to 1000 documents, the clustering efficiency of K-means becomes relatively low compared to iterative K-means. The drawback is that iterative K-means takes much more time than K-means because of its iterative nature. The proposed algorithm keeps both points in mind: time complexity and the bulkiness of the data. It also treats outliers as a major problem and, in order to optimize the reduction process, performs clustering in such a manner that the number of outliers is minimized. The proposed algorithm is an enhancement of the depth-based algorithm by the utilization.
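
The trade-off described in the abstract can be illustrated with a minimal Python sketch. This record does not define "iterative K-means" or the proposed depth-based algorithm, so everything here is an assumption made for illustration: `kmeans` is plain Lloyd-style K-means, `best_of_runs` is a hypothetical stand-in for an iterative variant that reruns K-means from several seeds, and `count_outliers` uses an invented rule (distance to the assigned centroid greater than `factor` times the mean distance). It is not the patented method. It only shows why repeated runs cost roughly `runs` times as much as a single run while allowing a result with fewer flagged outliers to be selected.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd-style K-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct input positions
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # Move the centroid to the coordinate-wise mean of its members.
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    # Recompute labels so they always match the returned centroids.
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


def count_outliers(points, centroids, labels, factor=2.0):
    """Hypothetical rule (an assumption, not from the patent record): a point is an
    outlier if its distance to its centroid exceeds `factor` times the mean distance."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    mean_d = sum(dists) / len(dists)
    return sum(1 for d in dists if d > factor * mean_d)


def best_of_runs(points, k, runs=5):
    """Assumed 'iterative' variant: rerun K-means with different seeds and keep the
    run with the fewest outliers, breaking ties by within-cluster sum of squares."""
    best = None
    for seed in range(runs):
        centroids, labels = kmeans(points, k, seed=seed)
        outliers = count_outliers(points, centroids, labels)
        inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
        candidate = (outliers, inertia, centroids, labels)
        if best is None or candidate[:2] < best[:2]:
            best = candidate
    return best
```

Calling `best_of_runs(points, k)` returns a tuple of outlier count, within-cluster sum of squares, centroids, and labels for the selected run.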

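Bibliographic details

  • Publication number: IN201711030582A

    Patent type:

  • Publication date: 2018-01-26

    Original format: PDF

  • Applicant/patentee:

    Application number: IN201711030582

  • Inventors: KAMALJEET KAUR; ATUL GARG

    Filing date: 2017-08-29

  • Classification: G06F19/24

  • Country: IN

  • Added to database: 2022-08-21 12:52:13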
