Contextual Ranking of Keywords Using Click Data

Abstract

The problem of automatically extracting the most interesting and relevant keyword phrases in a document has been studied extensively, as it is crucial for a number of applications. These applications include contextual advertising, automatic text summarization, and user-centric entity detection systems. All of them can potentially benefit from a successful solution, as it enables computational efficiency (by decreasing the input size), noise reduction, or overall improved user satisfaction.

In this paper, we study this problem and focus on improving the overall quality of user-centric entity detection systems. First, we review our concept extraction technique, which relies on search engine query logs. We then define a new feature space to represent the interestingness of concepts, and describe a new approach to estimate their relevance for a given context. We use click-through data obtained from a large-scale user-centric entity detection system, Contextual Shortcuts, to train a model that ranks the extracted concepts, and we evaluate the resulting model extensively, again using click-through data. Our results show that the learned model outperforms the baseline model, which employs similar features but whose weights are tuned carefully based on empirical observations, and reduces the error rate from 30.22% to 18.66%.
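The idea of training a concept ranker from click-through data can be illustrated with a minimal sketch. The abstract does not specify the learning algorithm, the features, or how the error rate is defined, so everything below is an assumption made for illustration: a linear model fitted with a pairwise logistic loss, two hypothetical features per concept, and an error rate counted as the fraction of (clicked, skipped) pairs ranked in the wrong order.

```python
import math


def score(w, features):
    """Linear score of one concept under weights w."""
    return sum(wi * fi for wi, fi in zip(w, features))


def train_pairwise(pairs, n_features, epochs=200, lr=0.1):
    """Fit linear weights so that clicked concepts outscore skipped ones.

    pairs: list of (clicked_features, skipped_features) tuples taken from
    the same document context. This pairwise setup is an assumption, not
    the method described in the paper.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for clicked, skipped in pairs:
            diff = [c - s for c, s in zip(clicked, skipped)]
            margin = score(w, diff)
            # Gradient factor of the logistic loss log(1 + exp(-margin)).
            g = -1.0 / (1.0 + math.exp(margin))
            w = [wi - lr * g * di for wi, di in zip(w, diff)]
    return w


def pairwise_error_rate(w, pairs):
    """Fraction of pairs where the skipped concept scores at least as high."""
    wrong = sum(1 for c, s in pairs if score(w, c) <= score(w, s))
    return wrong / len(pairs)


# Hypothetical toy data. Each vector is
# [interestingness proxy, relevance to the surrounding context].
pairs = [
    ([0.9, 0.8], [0.4, 0.1]),
    ([0.3, 0.9], [0.8, 0.2]),
    ([0.6, 0.7], [0.5, 0.3]),
]
weights = train_pairwise(pairs, n_features=2)
error = pairwise_error_rate(weights, pairs)
```

In this sketch the hand-tuned baseline mentioned in the abstract would correspond to fixing `weights` by hand instead of calling `train_pairwise`, and the two models could then be compared with the same `pairwise_error_rate` function on held-out click data.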
