International Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2005); February 13-19, 2005; Mexico City (MX)

Evaluating Document-to-Document Relevance Based on Document Language Model: Modeling, Implementation and Performance Evaluation



Abstract

Evaluating document-to-document relevance is important to many advanced applications in IR, text mining, and natural language processing. Because users' uncertainty makes document relevance very hard to define mathematically, most research fields have widely accepted the concept of topical relevance. This concept suggests that a document relevance model should explain whether the document representation describes the document's topical content and whether the matching method reveals the topical differences among documents. However, current document-to-document relevance models, such as the vector space model and string distance, do not explicitly emphasize the perspective of topical relevance. This paper uses a document language model to represent a document's topical content, explains why the model can reveal document topics, and then establishes two distributional similarity measures based on the document language model to evaluate document-to-document relevance. An experiment on the TREC test collection compares the approach with the vector space model, and the results show that the Kullback-Leibler divergence measure with Jelinek-Mercer smoothing significantly outperforms the vector space model.
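
The abstract names Kullback-Leibler divergence over Jelinek-Mercer-smoothed document language models but gives no formulas, so the following is only a minimal sketch of how such a measure might look. Everything specific here is an assumption rather than the paper's method: the unigram model, whitespace tokenization, the interpolation weight `lam = 0.5`, the direction of the divergence, and all function names.

```python
import math
from collections import Counter


def unigram_counts(text):
    """Count lowercased whitespace tokens (tokenization is an assumption)."""
    return Counter(text.lower().split())


def jm_model(doc_counts, collection_counts, lam=0.5):
    """Return P(w|d) = (1 - lam) * P_ml(w|d) + lam * P(w|C).

    lam is a hypothetical interpolation weight; the abstract does not
    state which value the authors used.
    """
    doc_len = sum(doc_counts.values())
    coll_len = sum(collection_counts.values())

    def prob(word):
        p_doc = doc_counts[word] / doc_len if doc_len else 0.0
        p_coll = collection_counts[word] / coll_len
        return (1.0 - lam) * p_doc + lam * p_coll

    return prob


def kl_divergence(p, q, vocabulary):
    """KL(p || q) summed over the collection vocabulary."""
    return sum(
        p(w) * math.log(p(w) / q(w)) for w in vocabulary if p(w) > 0.0
    )


def kl_distance(text_a, text_b, collection_counts, lam=0.5):
    """Lower values mean the two smoothed document models are closer."""
    p = jm_model(unigram_counts(text_a), collection_counts, lam)
    q = jm_model(unigram_counts(text_b), collection_counts, lam)
    return kl_divergence(p, q, collection_counts.keys())


# Toy collection, invented purely for illustration.
docs = [
    "language model for retrieval",
    "language model smoothing for retrieval",
    "football match result",
]
collection = sum((unigram_counts(d) for d in docs), Counter())

near = kl_distance(docs[0], docs[1], collection)
far = kl_distance(docs[0], docs[2], collection)
```

In this toy collection, `near` comes out smaller than `far`, because the first two documents share most of their vocabulary. Smoothing with the collection model keeps every probability in the denominator positive, which is what makes the divergence finite for documents with disjoint vocabularies.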
