
The use of Kullback-Leibler divergence in opinion retrieval.


Abstract

With the huge amount of subjective content in online documents, there is a clear need for an information retrieval system that supports retrieval of documents containing opinions about the topic expressed in a user's query. In recent years, blogs, a new publishing medium, have attracted a large number of people to express personal opinions on all kinds of topics in response to real-world events. The opinionated nature of blogs makes them an interesting new research area for opinion retrieval, and the identification and extraction of subjective content from blogs has become the subject of several research projects.

In this thesis, four novel methods are proposed to retrieve blog posts that express opinions about given topics. The first method uses the Kullback-Leibler divergence (KLD) to weight a lexicon of subjective adjectives around query terms. The second method takes into account the distances between the query terms and the subjective adjectives, using distance-based KLD scores of the adjectives for document re-ranking. The third method calculates KLD scores of subjective adjectives for predefined query categories. The fourth method uses collocates, words co-occurring with query terms in the corpus, to construct the subjective lexicon automatically; the KLD scores of the collocates are then calculated and used for document ranking.

Four groups of experiments are conducted to evaluate the proposed methods on the TREC test collections. The results are compared with baseline systems to determine the effectiveness of using KLD in opinion retrieval. Further studies are recommended to explore more sophisticated approaches to identifying subjectivity and promising techniques for extracting opinions.
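As a rough illustration of the idea behind the first method, a KLD-style weight for a subjective adjective can be obtained by comparing its relative frequency in text windows around query terms with its relative frequency in the whole collection. The sketch below is not taken from the thesis: the function name `kld_term_scores`, the unsmoothed relative-frequency estimates, and the per-term form p·log2(p/q) are assumptions chosen for illustration.

```python
import math
from collections import Counter


def kld_term_scores(window_tokens, collection_tokens, lexicon):
    """Per-term KLD contribution for each lexicon word (hypothetical helper).

    score(w) = p(w | windows) * log2(p(w | windows) / p(w | collection))

    window_tokens:     tokens found near query terms (assumed pre-extracted)
    collection_tokens: tokens of the whole collection
    lexicon:           iterable of candidate subjective adjectives
    """
    win_counts = Counter(window_tokens)
    col_counts = Counter(collection_tokens)
    win_total = len(window_tokens)
    col_total = len(collection_tokens)

    scores = {}
    for word in lexicon:
        # No smoothing in this sketch: skip words unseen in either sample.
        if win_counts[word] == 0 or col_counts[word] == 0:
            continue
        p_win = win_counts[word] / win_total
        p_col = col_counts[word] / col_total
        scores[word] = p_win * math.log2(p_win / p_col)
    return scores


# Toy data, invented purely for this example.
windows = ["great", "camera", "great", "bad"]
collection = ["great", "camera", "bad", "camera", "lens", "bad", "the", "the"]
scores = kld_term_scores(windows, collection, {"great", "bad", "awful"})
```

In this toy example, "great" is over-represented near the query terms and receives a positive weight, "bad" occurs at the same rate in both samples and scores zero, and "awful" is skipped because it never occurs. How such weights are combined with a baseline retrieval score for re-ranking is not specified in the abstract.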

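Bibliographic record

  • Author: Cen, Kun
  • Author affiliation: University of Waterloo (Canada)
  • Degree-granting institution: University of Waterloo (Canada)
  • Subject: Operations Research
  • Degree: M.A.Sc.
  • Year: 2008
  • Pages: 122
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Operations research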
