Towards Story Understanding and Search - Web Mining Methods and Tools for Exploration, Search and Discovery

Abstract

Over the past decade, the Internet has become one of the leading sources of news content, and for many people the news provider services available on it have become the main medium for staying informed about the world. Such services support Internet users in interacting with stories. In this thesis, we regard a story as a set of time-stamped documents describing correlated subjects, such as persons, event descriptions, and topics. Our particular interest is the time dimension of stories, and especially story tracking, that is, following a story over time. The goal of the different research areas interested in story tracking is to identify and highlight developments, that is, novel and relevant information in a story. In this work we restrict ourselves to news collections and investigate the effectiveness and usability of temporal text mining (TTM) story tracking methods.

Across the thesis we investigate four areas related to stories: (a) stories and search engines, (b) story tracking methods and tools, (c) story tracking evaluation frameworks, and (d) stories and sources. We formalize these four thematic areas into more concrete research questions addressed in this thesis: (Q1) How are search engines affected by story developments? (Q2) Does the semi-automatic story tracking approach we developed enable user comprehension and navigation of stories? (Q3) Can the graph-based patterns extracted by our algorithm be used for story tracking? (Q4) How can different bursty text patterns be used for discovering the origins of changes in document sets? (Q5) How do users interact with interfaces for story tracking? (Q6) How can differences in a story across different sources be measured?

We start by exploring how search engine users change their behaviour when new developments emerge in a story. For this we investigate a one-year query log from a leading commercial search engine and describe the changes in user behaviour correlated with the emergence of new developments. We then explore story tracking methods and tools as a means of accommodating these changes in user behaviour. We propose a new graph-based story tracking method and build a tool to support it. Additionally, we investigate the effectiveness of story tracking methods and define a new framework for automatic and user-oriented evaluation. Although many TTM methods have been developed, a common evaluation procedure is lacking. We propose an evaluation framework for measuring how different TTM methods discover novel developments. Apart from the automatic evaluation, we are interested in how users interact with patterns and learn about the developments of the story they track. For this we propose a set of metrics and procedures for evaluating user interfaces in the context of story tracking. To test our tool, we conducted a user study of four interfaces in the context of story tracking. Finally, we look at the source dimension of stories and explore the possible differences in news reporting across different families of news sources, and how to measure them.

The results of our analysis show that our method is comparable in performance to other TTM methods and that it meets the requirements for story tracking. We also show that, by leveraging the pattern structure and sentence retrieval, TTM methods can help discover developments in the news domain. The user study results show that users prefer our tool to the other tools used in the study. The results also indicate that the tool we built meets a number of the requirements identified in the query log analysis.

Bibliographic record

  • Author

    Subasic Ilija

  • Year 2011
  • Original format PDF
  • Language of text nl
