Conference: International Conference on Electrical, Computer and Communication Engineering

Locality-sensitive hashing scheme for Bangla news article clustering using bloom filter

Abstract

A clustering mechanism helps organise a large number of data items by grouping similar items into meaningful clusters. A successful clustering approach depends on an effective similarity search algorithm. The similarity search problem for text documents can be turned into a problem over sets by using the method called “shingling”. The characteristic matrix of the sets is created by searching for the shingles in each document, so the cost of building the matrix is significantly high when the dataset is very large. This search time can be radically reduced if the characteristic matrix is built using a Bloom filter, which reduces each shingle lookup to constant time. Finding the similarity among all pairs of sets is a major issue, since it takes O(n^2) time to compare n sets. Locality-sensitive hashing (LSH) drastically reduces this cost by generating candidate pairs, focusing the similarity search on the pairs that are most likely to be similar. In this paper, a scheme for Bengali news article clustering based on similarity search by LSH is presented.
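
The pipeline named in the abstract (shingling, a Bloom-filter-backed characteristic matrix, and LSH candidate pairs) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the abstract does not say how signatures are computed, so MinHash with banding is assumed here, and every name and parameter (`shingles`, `stable_hash`, `BloomFilter`, `lsh_candidates`, `k`, `num_bits`, `bands`, `rows`) is hypothetical. The sample documents are English placeholders, not Bangla news text.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of character k-shingles of a text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def stable_hash(item, seed):
    """Deterministic 64-bit hash of a string, varied by an integer seed."""
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8,
                             salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little")


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=4096, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        return [stable_hash(item, seed) % self.num_bits
                for seed in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # A fixed number of hash probes, independent of document length.
        return all(self.bits[pos] for pos in self._positions(item))


def candidate_pairs(signatures, bands, rows):
    """LSH banding: documents that agree on every row of some band collide."""
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def lsh_candidates(docs, k=3, bands=8, rows=3):
    """Map {doc_id: text} to a set of candidate (doc_id, doc_id) pairs."""
    doc_shingles = {d: shingles(t, k) for d, t in docs.items()}
    vocab = sorted(set().union(*doc_shingles.values()))

    # One Bloom filter per document, filled with that document's shingles.
    filters = {}
    for d, sh in doc_shingles.items():
        bf = BloomFilter()
        for s in sh:
            bf.add(s)
        filters[d] = bf

    # Characteristic matrix stored column-wise: for each document, the row
    # indices whose shingle the filter reports as present. False positives
    # can add a few extra rows, so the matrix is an approximation.
    columns = {d: {r for r, s in enumerate(vocab) if s in filters[d]}
               for d in docs}

    # MinHash signature (assumed here): per hash function, the minimum
    # hash value over the rows present in the column.
    num_hashes = bands * rows
    signatures = {
        d: [min((stable_hash(str(r), 1000 + i) for r in col), default=0)
            for i in range(num_hashes)]
        for d, col in columns.items()
    }
    return candidate_pairs(signatures, bands, rows)
```

Only the pairs returned by `lsh_candidates` would then be compared exactly (for instance by Jaccard similarity of their shingle sets), instead of all n(n-1)/2 pairs. Choosing more rows per band makes collisions rarer and the candidate set smaller, at the price of missing more moderately similar pairs.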
