
Detecting and Tracking Topics and Events from Web Search Logs



Abstract

Recent years have witnessed increased efforts to detect topics and events from Web search logs, since this kind of data not only captures Web content but also reflects users' activities. However, the majority of existing work focuses on exploiting clustering techniques for topic and event detection. Due to the huge size and evolving nature of Web data, existing clustering approaches struggle to meet real-time demands. To that end, in this article, we propose a method called LETD to detect evolving topics in a timely manner. We also design techniques to extract events from topics and to infer the evolving relationships among the events. For topic detection, we first provide a measurement to select the important URLs, which are most likely to describe a real-life topic. Then, starting from these selected URLs, we exploit a local expansion method to find other topic-related URLs. Moreover, in the LETD framework, we design algorithms based on Random Walk and Markov Random Fields (MRF), respectively. Because the LETD method exploits a divide-and-conquer strategy to process the data, it is more efficient than existing methods based on clustering techniques. To better illustrate the LETD framework, we develop a demo system, StoryTeller, which can discover hot topics and events, infer the evolving relationships among events, and visualize information in a storytelling way. This demo system can provide a global view of topic development and help users target interesting events more conveniently. Finally, experimental results on real-world Microsoft click-through data show that StoryTeller can find real-life hot topics and meaningful evolving relationships among events, and demonstrate the efficiency and effectiveness of the LETD method.
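The seed-then-expand idea in the abstract can be illustrated with a small sketch. This is not the LETD algorithm itself, whose importance measurement and expansion rules the abstract does not specify. It assumes a click log of (query, url, clicks) records, uses total clicks as a stand-in importance score, and expands a seed URL with a random walk with restart over URL -> query -> URL transitions. All function names, parameters, thresholds, and the sample log are hypothetical.

```python
from collections import defaultdict


def build_click_graph(click_log):
    """Build query->URL and URL->query click counts from (query, url, clicks) records."""
    q2u = defaultdict(dict)
    u2q = defaultdict(dict)
    for query, url, clicks in click_log:
        q2u[query][url] = q2u[query].get(url, 0) + clicks
        u2q[url][query] = u2q[url].get(query, 0) + clicks
    return q2u, u2q


def select_seed_urls(u2q, top_k=1):
    """Rank URLs by total clicks (an assumed placeholder for the paper's measurement)."""
    totals = {url: sum(queries.values()) for url, queries in u2q.items()}
    return sorted(totals, key=lambda url: (-totals[url], url))[:top_k]


def url_step(url, q2u, u2q):
    """Transition probabilities for one URL -> query -> URL hop, weighted by clicks."""
    out = defaultdict(float)
    q_total = sum(u2q[url].values())
    for query, w in u2q[url].items():
        u_total = sum(q2u[query].values())
        for other, w2 in q2u[query].items():
            out[other] += (w / q_total) * (w2 / u_total)
    return out


def local_expansion(seed, q2u, u2q, restart=0.3, iters=20, threshold=0.05):
    """Random walk with restart from the seed; keep URLs whose score reaches the threshold."""
    scores = {seed: 1.0}
    for _ in range(iters):
        nxt = defaultdict(float)
        for url, p in scores.items():
            for other, t in url_step(url, q2u, u2q).items():
                nxt[other] += (1 - restart) * p * t
        nxt[seed] += restart  # return to the seed with probability `restart`
        scores = dict(nxt)
    return {url for url, p in scores.items() if p >= threshold}


# Made-up click log: two football-related URLs share a query; the recipe URL is unrelated.
log = [
    ("world cup", "fifa.example/a", 5),
    ("world cup final", "fifa.example/a", 3),
    ("world cup final", "sports.example/b", 2),
    ("pasta recipe", "food.example/c", 4),
]
q2u, u2q = build_click_graph(log)
seed = select_seed_urls(u2q, top_k=1)[0]
topic = local_expansion(seed, q2u, u2q)
print(seed, sorted(topic))
```

Because the walk only touches URLs reachable from the seed, each topic can be expanded on its own, which is one way to read the divide-and-conquer claim in the abstract.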

Bibliographic record

  • Source
    ACM Transactions on Information Systems | 2012, Issue 4 | 21.1-21.29 | 29 pages
  • Author affiliations

    Department of Management Science and Engineering, Tsinghua University, China;

    Key Labs of Data Engineering and Knowledge Engineering, Ministry of Education, School of Information, Renmin University of China, China;

    Key Labs of Data Engineering and Knowledge Engineering, Ministry of Education, School of Information, Renmin University of China, China;

    Management Science and Information Systems Department, Rutgers, State University of New Jersey;

    Key Labs of Data Engineering and Knowledge Engineering, Ministry of Education, School of Information, Renmin University of China, China;

  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification
  • Keywords

    topic detection and tracking; random walk; Markov random fields; web search log
