Locality-sensitive hashing scheme for Bangla news article clustering using bloom filter

Abstract

Clustering helps to organise a large number of data items by grouping similar items into meaningful clusters. A successful clustering approach depends on an effective similarity search algorithm. The similarity search problem for text documents can be turned into a problem over sets by using the method called "shingling". The characteristic matrix of the sets is created by searching for the shingles in each document, so the cost of building the matrix is very high when the dataset is large. This cost can be reduced substantially if the characteristic matrix is built with a Bloom filter, which reduces each shingle lookup to constant time. Finding the similarity among all pairs of sets is a major issue, since it takes O(n^2) time to compare n sets. Locality-sensitive hashing (LSH) greatly reduces this search cost by generating candidate pairs, focusing the similarity search on the pairs that are most likely to be similar. In this paper, a scheme for Bengali news article clustering based on similarity search by LSH is presented.
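The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the shingle length, the Bloom filter parameters, the hash functions, or the signature method, so character 3-shingles, SHA-1-based seeded hashing, a MinHash signature, and the banding technique are all assumptions made here for illustration. All names (`shingles`, `BloomFilter`, `minhash_signature`, `candidate_pairs`) are hypothetical, and the sample documents are English placeholders rather than Bangla news text.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of character k-shingles of a text (k=3 is an assumption)."""
    text = " ".join(text.split())  # normalise whitespace
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def _hash(item, seed, modulus):
    """Deterministic seeded hash of a string, reduced modulo `modulus`."""
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % modulus


class BloomFilter:
    """Simple Bloom filter: constant-time membership tests, no false negatives."""

    def __init__(self, size=4096, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = bytearray(size)  # one byte per bit, for readability

    def add(self, item):
        for seed in range(self.num_hashes):
            self.bits[_hash(item, seed, self.size)] = 1

    def __contains__(self, item):
        return all(self.bits[_hash(item, seed, self.size)]
                   for seed in range(self.num_hashes))


def minhash_signature(doc_filter, vocabulary, num_perm=20):
    """MinHash signature of one document (one column of the characteristic matrix).

    Whether a shingle belongs to the document is decided by the Bloom filter,
    so false positives may add a few extra shingles.
    """
    present = [s for s in vocabulary if s in doc_filter]
    return [min((_hash(s, 1000 + p, 2**61 - 1) for s in present), default=0)
            for p in range(num_perm)]


def candidate_pairs(signatures, bands=10):
    """LSH banding: documents that agree on every row of some band become candidates."""
    rows = len(next(iter(signatures.values()))) // bands
    pairs = set()
    for b in range(bands):
        buckets = defaultdict(list)
        for doc_id, sig in signatures.items():
            buckets[tuple(sig[b * rows:(b + 1) * rows])].append(doc_id)
        for ids in buckets.values():
            pairs.update(combinations(sorted(ids), 2))
    return pairs


# Placeholder documents; "a" and "b" are identical on purpose.
docs = {
    "a": "dhaka stock market rises after budget announcement",
    "b": "dhaka stock market rises after budget announcement",
    "c": "national cricket team wins the series in chittagong",
}
shingle_sets = {d: shingles(t) for d, t in docs.items()}
vocabulary = sorted(set().union(*shingle_sets.values()))

filters = {}
for d, sh in shingle_sets.items():
    bf = BloomFilter()
    for s in sh:
        bf.add(s)
    filters[d] = bf

signatures = {d: minhash_signature(f, vocabulary) for d, f in filters.items()}
pairs = candidate_pairs(signatures)
```

Only the pairs returned by `candidate_pairs` would then be compared exactly, which is what avoids comparing all n(n-1)/2 pairs. The number of bands and rows per band controls how similar two documents must be before they are likely to share a bucket.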

Bibliographic details

Source: International Conference on Electrical, Computer and Communication Engineering, 2017, pp. 17-21 (5 pages)
Conference location: Cox's Bazar (BD)
Original format: PDF
Language: English
Keywords: Decision support systems; Handheld computers

Similar documents

1. Toward more efficient locality-sensitive hashing via constructing novel hash function cluster [J]. Zhang Shi, Huang Jin, Xiao Ruliang. Concurrency and Computation: Practice and Experience, 2021, No. 20.
2. Accurate and Fast Asymmetric Locality-Sensitive Hashing Scheme for Maximum Inner Product Search [J]. Qiang Huang, Guihong Ma, Jianlin Feng. SIGKDD Explorations, 2018.
3. Query-aware locality-sensitive hashing scheme for norm [J]. Huang Qiang, Feng Jianlin, Fang Qiong. The VLDB Journal, 2017, No. 5.
4. Locality-sensitive hashing scheme for Bangla news article clustering using bloom filter [C]. International Conference on Electrical, Computer and Communication Engineering, 2017.
5. Power and memory efficient hashing schemes for some network applications [D]. Yu, Heeyeol, 2009.
6. CONSULT: accurate contamination removal using locality-sensitive hashing [O]. Eleonora Rachtman, Vineet Bafna, Siavash Mirarab, 2021.
7. Locality-sensitive hashing scheme based on dynamic collision counting [O]. Gan J., Feng J., Fang Q., 2012.
