Fast Tweet Retrieval with Compact Binary Codes

Abstract

Cosine similarity is arguably the most widely used similarity measure in natural language processing. On Twitter, however, the sheer volume of tweets makes pairwise cosine similarity computation expensive. In this paper, we tackle this scalability issue with binary coding, which compresses each data sample into a compact binary code and thus enables highly efficient similarity computation via Hamming distances between the generated codes. To yield semantics-sensitive binary codes for tweets, we design a binarized matrix factorization model and improve it in two ways. First, we force the projection directions used by the model to be nearly orthogonal, which reduces redundant information in the resulting binary bits. Second, we leverage the tweets' neighborhood information to encourage similar tweets to have adjacent binary codes. Evaluated in an information retrieval scenario on a tweet dataset, with hashtags used to create gold labels, the proposed model shows significant performance gains over competing methods.
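The retrieval step described in the abstract (compare compact binary codes by Hamming distance instead of computing cosine similarity) can be illustrated with a minimal sketch. This is not the paper's binarized matrix factorization model: the codes here come from the signs of random projections, a swapped-in stand-in chosen only to keep the example short. The function names `random_hyperplanes`, `binary_code`, `hamming_distance`, and `retrieve` are hypothetical and do not come from the paper.

```python
import random


def random_hyperplanes(num_bits, dim, seed=0):
    """Draw one random Gaussian direction per output bit (assumption: a
    stand-in for the learned projection directions of the paper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def binary_code(vector, hyperplanes):
    """Pack the signs of the projections of `vector` into a single integer."""
    code = 0
    for plane in hyperplanes:
        dot = sum(w * x for w, x in zip(plane, vector))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming_distance(code_a, code_b):
    """Count the bit positions in which two codes differ."""
    return bin(code_a ^ code_b).count("1")


def retrieve(query_code, codes, top_k=3):
    """Return indices of the top_k stored codes closest to the query."""
    ranked = sorted(range(len(codes)),
                    key=lambda i: hamming_distance(query_code, codes[i]))
    return ranked[:top_k]


if __name__ == "__main__":
    planes = random_hyperplanes(num_bits=16, dim=4)
    # Toy feature vectors standing in for tweet representations.
    tweets = [[1.0, 0.2, 0.0, 0.1], [0.9, 0.3, 0.1, 0.0], [-1.0, 0.0, 0.5, -0.2]]
    codes = [binary_code(t, planes) for t in tweets]
    query = binary_code([1.0, 0.25, 0.05, 0.05], planes)
    print(retrieve(query, codes, top_k=2))
```

Each comparison is an XOR followed by a bit count, which is why Hamming-distance search over short codes is much cheaper than a floating-point dot product per pair.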

Bibliographic details

Source
International Conference on Computational Linguistics | 2014 | 11 pages
Authors
Weiwei Guo; Wei Liu; Mona Diab
Original format
PDF
CLC classification
Programming, software engineering
