Querying, Exploring and Mining the Extended Document.

Abstract

The evolution of the Web into an interactive medium that encourages active user engagement has ignited a huge increase in the amount, complexity and diversity of available textual data. This evolution forces us to reevaluate our view of documents as simple pieces of text and of document collections as immutable and isolated. Extended documents published in the context of blogs, micro-blogs, on-line social networks and customer feedback portals can be associated with a wealth of meta-data in addition to their textual component: tags, links, sentiment, entities mentioned in text, etc. Collections of user-generated documents grow, evolve, co-exist and interact: they are dynamic and integrated.

These unique characteristics of modern documents and document collections present us with exciting opportunities for improving the way we interact with them. At the same time, this additional complexity, combined with the vast amounts of available textual data, presents us with formidable computational challenges. In this context, we introduce, study and extensively evaluate an array of effective and efficient solutions for querying, exploring and mining extended documents and dynamic, integrated document collections.

For collections of socially annotated extended documents, we present an improved probabilistic search and ranking approach based on our growing understanding of the dynamics of the social annotation process.

For extended documents, such as blog posts, associated with entities extracted from text and categorical attributes, we enable their interactive exploration through the efficient computation of strong entity associations. Associated entities are computed for all possible attribute value restrictions of the document collection.

For extended documents, such as user reviews, annotated with a numerical rating, we introduce a keyword-query refinement approach. The solution enables the interactive navigation and exploration of large result sets.

We extend the skyline query to document streams, such as news articles, associated with categorical attributes and partially-ordered domains. The technique incrementally maintains a small set of recent, uniquely interesting extended documents from the stream.

Finally, we introduce a solution for the scalable integration of structured data sources into Web search. Queries are analyzed in order to determine what structured data, if any, should be used to augment Web search results.
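The skyline idea mentioned in the abstract can be illustrated with a small sketch. It is not the thesis's algorithm: it naively recomputes the skyline over a sliding window instead of maintaining it incrementally, and every name in it (`StreamSkyline`, `transitive_closure`, the `source` and `topic` attributes, the preference pairs) is a hypothetical choice made for illustration. It assumes each document carries categorical attributes whose values are partially ordered by explicit (better, worse) pairs, and that a document belongs to the skyline when no other document in the window dominates it.

```python
from collections import deque
from itertools import product


def transitive_closure(pairs):
    """Return every (better, worse) pair implied by the given preference pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow safely inside the loop.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


class StreamSkyline:
    """Naive sliding-window skyline over partially ordered categorical attributes."""

    def __init__(self, preferences, window_size):
        # preferences: {attribute: iterable of (better, worse) value pairs},
        # assumed to be acyclic (a strict partial order after closure).
        self.better = {attr: transitive_closure(p) for attr, p in preferences.items()}
        # Only the most recent `window_size` documents are considered.
        self.window = deque(maxlen=window_size)

    def dominates(self, a, b):
        """True if a is no worse than b on every attribute and better on one."""
        attrs = list(self.better)
        no_worse = all(a[t] == b[t] or (a[t], b[t]) in self.better[t] for t in attrs)
        strictly_better = any((a[t], b[t]) in self.better[t] for t in attrs)
        return no_worse and strictly_better

    def skyline(self):
        """Documents in the window that no other windowed document dominates."""
        docs = list(self.window)
        return [d for d in docs if not any(self.dominates(o, d) for o in docs)]

    def add(self, doc):
        """Append a document (evicting the oldest if full) and return the skyline."""
        self.window.append(doc)
        return self.skyline()


# Hypothetical preferences: "wire" beats "blog" beats "forum"; "politics" beats "sports".
prefs = {
    "source": [("wire", "blog"), ("blog", "forum")],
    "topic": [("politics", "sports")],
}
stream = StreamSkyline(prefs, window_size=3)
for doc in [
    {"id": 1, "source": "blog", "topic": "sports"},
    {"id": 2, "source": "wire", "topic": "sports"},
    {"id": 3, "source": "forum", "topic": "politics"},
]:
    current = stream.add(doc)
print([d["id"] for d in current])
```

Values that are not related by the preference pairs are treated as incomparable, which is why two documents can both remain in the skyline. An incremental version would avoid the quadratic rescan on every arrival.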

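Bibliographic details

  • Author

    Sarkas, Nikolaos

  • Author affiliation

    University of Toronto (Canada)

  • Degree-granting institution: University of Toronto (Canada)
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2011
  • Pagination: 204 p.
  • Total pages: 204
  • Original format: PDF
  • Language of text: English (eng)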