A SYSTEM OF CLUSTERING TO MINIMIZE OUTLIERS ON BIG DATA AND METHODS AND USES THEREOF
Abstract
Document clustering is one of the most common methods of keeping documents organized in a systematic way. A clustered document set provides an ideal search environment, and major companies involved in data analytics and search behavior use various clustering algorithms very often. Many clustering algorithms, such as K-means, iterative K-means, and Depth Based Outlier Deduction (DBOD), have been tested and used for different purposes. Many of these algorithms behave similarly on small data sets of around 40 to 50 documents. At that scale the clustering efficiency of K-means and iterative K-means is the same, but as soon as the number of documents grows from 50 to 1000, the clustering efficiency of K-means becomes lower than that of iterative K-means. The drawback is that iterative K-means takes considerably longer than K-means because of its iterative nature. The proposed algorithm takes both concerns into account: time complexity and the volume of the data. It also treats outliers as a major problem, and in order to optimize the reduction process, clustering is performed in such a manner that the number of outliers is minimized. The proposed algorithm is an enhancement of the depth-based algorithm.
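The abstract does not disclose the proposed algorithm, so the following is only a minimal sketch of the two baselines it compares, not the patented method. It assumes that documents have already been converted to numeric vectors, and that "iterative K-means" means rerunning K-means from several random starts and keeping the run with the lowest within-cluster squared distance; both are assumptions, not statements from the source. The names `kmeans`, `restarted_kmeans`, and `flag_outliers` are hypothetical, and the median-distance outlier rule is a simple stand-in, not the depth-based method.

```python
import random
import statistics


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, max_iter=100):
    """One run of Lloyd's K-means. Returns (centroids, labels, inertia)."""
    # Initial centroids: k data points drawn without replacement.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


def restarted_kmeans(points, k, restarts=10, seed=0):
    """Assumed reading of 'iterative K-means': keep the best of several runs.

    The cost is roughly `restarts` times that of a single run, which is the
    time penalty the abstract attributes to the iterative variant.
    """
    rng = random.Random(seed)
    runs = (kmeans(points, k, rng) for _ in range(restarts))
    return min(runs, key=lambda run: run[2])


def flag_outliers(points, centroids, labels, factor=3.0):
    """Hypothetical outlier rule (not DBOD): flag points whose distance to
    their centroid exceeds `factor` times the median such distance."""
    dists = [sq_dist(p, centroids[lab]) ** 0.5 for p, lab in zip(points, labels)]
    cutoff = factor * statistics.median(dists)
    return [i for i, d in enumerate(dists) if d > cutoff]
```

Calling `restarted_kmeans(vectors, k)` and passing its centroids and labels to `flag_outliers` gives the indices of the flagged documents, so the number of outliers can be compared between a single run and the restarted variant on the same data.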