SAMA: A TWITTER BASED WEB SEARCH ENGINE

Abstract

How can a model efficiently identify relevant references among the hundreds of millions of Twitter messages posted every day? In this paper, we address this fundamental research question and introduce SAMA, a scalable search model that uses Twitter streams. Real-time topic detection is an important function for all search engines, and extracting topics from Twitter raises new challenges. As a huge temporal data flow, Twitter carries many types of topics as well as a lot of noise. Current sophisticated search engines, with their high computational complexity, are not designed to handle such large data flows efficiently. Twitter provides many opportunities for people to engage with real-time world events through communication and information sharing, as well as tools for dealing with its data. However, little is understood about the external links available in Twitter content, and this affects topic engagement. To date, traditional search engines make very limited use of Twitter posts and their external links, even though for some queries the micro-blogging content on Twitter is more interesting and useful than the content of traditional Web pages. In this paper, we propose a platform for modeling URL and inverse message frequencies and Twitter external references, which allows us to use a novel self-content detection algorithm for link authorities. Our model can make use of a new source of Web references, and experiments verify its effectiveness in real-time topic detection of Twitter social content. In our evaluations, we investigate the impact of different features on retrieval performance and highlight tweet features that achieve high precision for both ad hoc and diversity tasks: 77% and 78%, respectively.
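The abstract names "URL and inverse message frequencies" without defining them. As a rough illustration only, the sketch below assumes a weighting analogous to TF-IDF: a URL's total mention count multiplied by the log of the inverse fraction of messages that contain it. The function names, the URL regular expression, and the formula itself are assumptions made for this example, not the authors' published method.

```python
import math
import re
from collections import Counter

# Simplified URL matcher (assumption): may capture trailing punctuation.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_urls(message):
    """Return every URL-like token found in one message."""
    return URL_PATTERN.findall(message)


def url_imf_scores(messages):
    """Hypothetical TF-IDF-style score for each URL in a message collection.

    score(url) = (total mentions of url)
                 * log(number of messages / messages containing url)
    """
    n_messages = len(messages)
    url_freq = Counter()      # total mentions across all messages
    message_freq = Counter()  # number of distinct messages containing the URL
    for message in messages:
        urls = extract_urls(message)
        url_freq.update(urls)
        message_freq.update(set(urls))
    return {
        url: url_freq[url] * math.log(n_messages / message_freq[url])
        for url in url_freq
    }
```

Under this assumed weighting, a URL that appears in every message scores zero, while a URL repeated within a small share of messages scores highest.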
