International Conference on Electrical, Computer and Communication Engineering

Locality-sensitive hashing scheme for Bangla news article clustering using bloom filter

Abstract

A clustering mechanism helps organise a large amount of data by grouping similar items into meaningful clusters. A successful clustering approach depends on an effective similarity search algorithm. The similarity search problem for text documents can be turned into a problem on sets by using a method called “shingling”. The characteristic matrix of the sets is created by searching for the shingles in each document, so building the matrix is very expensive when the dataset is large. Search time can be reduced drastically if the characteristic matrix is built with a Bloom filter, which answers each membership query in constant time. Finding the similarity among all pairs of sets is a major issue, since comparing n sets pairwise takes O(n²) time. Locality-sensitive hashing (LSH) greatly reduces the time complexity of the search by generating candidate pairs: it focuses the similarity search on the pairs that are most likely to be similar. In this paper, a scheme for Bengali news article clustering based on similarity search with LSH is presented.
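The abstract outlines a pipeline of shingling, a characteristic matrix filled via Bloom filters, and LSH candidate generation. The Python sketch below illustrates one way such a pipeline can fit together; it is not the authors' implementation. Assumptions not stated in the source: character 5-shingles, one Bloom filter per document with salted SHA-1 hashes, MinHash signatures as the step between the characteristic matrix and LSH banding (20 bands of 5 rows), and union-find grouping of candidate pairs whose estimated Jaccard similarity is at least 0.5. All names (`shingles`, `BloomFilter`, `minhash_signatures`, `lsh_candidate_pairs`, `cluster`) are hypothetical.

```python
# Hypothetical sketch; all parameters are illustrative assumptions, not from the paper.
import hashlib
import random
from collections import defaultdict
from itertools import combinations


def shingles(text, k=5):
    # Character k-shingles; Python str is Unicode, so Bangla text works as-is.
    text = " ".join(text.split())  # normalise runs of whitespace
    return {text[i:i + k] for i in range(len(text) - k + 1)}


class BloomFilter:
    # Bit array (one byte per bit for clarity) with num_hashes salted SHA-1 hashes.
    def __init__(self, size=4096, num_hashes=4):
        self.size, self.num_hashes = size, num_hashes
        self.bits = bytearray(size)

    def _positions(self, item):
        for seed in range(self.num_hashes):
            digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # O(num_hashes) per query regardless of how many items were added;
        # false positives are possible, false negatives are not.
        return all(self.bits[pos] for pos in self._positions(item))


def minhash_signatures(matrix, num_docs, num_perm, seed=42):
    # Simulate num_perm row permutations with h(r) = (a*r + b) mod prime.
    rng = random.Random(seed)
    prime = 2_147_483_647  # Mersenne prime, larger than the number of rows
    coeffs = [(rng.randrange(1, prime), rng.randrange(prime)) for _ in range(num_perm)]
    sig = [[float("inf")] * num_docs for _ in range(num_perm)]
    for r, row in enumerate(matrix):
        hashes = [(a * r + b) % prime for a, b in coeffs]
        for c, present in enumerate(row):
            if present:
                for i, h in enumerate(hashes):
                    sig[i][c] = min(sig[i][c], h)
    return sig


def lsh_candidate_pairs(sig, num_docs, bands, rows):
    # Documents whose signatures agree on every row of some band become candidates.
    assert bands * rows == len(sig)
    candidates = set()
    for b in range(bands):
        buckets = defaultdict(list)
        for c in range(num_docs):
            buckets[tuple(sig[i][c] for i in range(b * rows, (b + 1) * rows))].append(c)
        for docs in buckets.values():
            candidates.update(combinations(docs, 2))
    return candidates


def cluster(texts, k=5, threshold=0.5, bands=20, rows=5):
    doc_shingles = [shingles(t, k) for t in texts]
    blooms = []
    for s in doc_shingles:
        bf = BloomFilter()
        for sh in s:
            bf.add(sh)
        blooms.append(bf)
    # Characteristic matrix: one row per shingle, one column per document,
    # filled with Bloom-filter membership queries instead of document scans.
    universe = sorted(set().union(*doc_shingles))
    matrix = [[sh in bf for bf in blooms] for sh in universe]
    sig = minhash_signatures(matrix, len(texts), bands * rows)
    parent = list(range(len(texts)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Only candidate pairs are compared, instead of all n(n-1)/2 pairs.
    for i, j in lsh_candidate_pairs(sig, len(texts), bands, rows):
        agree = sum(sig[r][i] == sig[r][j] for r in range(len(sig))) / len(sig)
        if agree >= threshold:  # fraction of agreeing rows estimates Jaccard similarity
            parent[find(i)] = find(j)
    groups = defaultdict(list)
    for d in range(len(texts)):
        groups[find(d)].append(d)
    return sorted(groups.values())
```

Calling `cluster(list_of_article_texts)` returns lists of document indices, one list per cluster. With b = 20 bands of r = 5 rows, a pair with Jaccard similarity s becomes a candidate with probability 1 − (1 − s^r)^b, which rises steeply around (1/b)^(1/r) ≈ 0.55; changing `bands` and `rows` moves that threshold. A plain Python set would give exact membership; the Bloom filter here only mirrors the abstract's idea, trading a small false-positive rate for a fixed-size bit array.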