Source: International conference on natural language processing

Unsupervised Detection and Promotion of Authoritative Domains for Medical Queries in Web Search

Abstract

Medical or health-related search queries constitute a significant portion of the total number of queries searched every day on the web. For health queries, the authenticity or authoritativeness of search results is of utmost importance, besides relevance. So far, research in automatic detection of authoritative sources on the web has mainly focused on (a) link-structure-based approaches and (b) supervised approaches for predicting trustworthiness. However, these approaches have some inherent limitations. For example, several content farms and low-quality sites artificially boost their link-based authority rankings by forming a syndicate of highly interlinked domains and content that is algorithmically hard to detect. Moreover, the number of positively labeled training samples available for learning trustworthiness is limited compared to the size of the web. In this paper, we propose a novel unsupervised approach to detect and promote authoritative domains in the health segment using click-through data. We argue that standard IR metrics such as NDCG are relevance-centric and hence are not suitable for evaluating authority. We propose a new authority-centric evaluation metric based on side-by-side judgment of results. Using real-world search query sets, we evaluate our approach both quantitatively and qualitatively and show that it succeeds in significantly improving the authoritativeness of results when compared to a standard web ranking baseline.
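The claim that NDCG is relevance-centric can be illustrated with a small sketch. The code below is not taken from the paper: it uses one common formulation of NDCG (exponential gain, log2 discount), and the domain names, relevance grades, and authority flags are hypothetical values chosen only for illustration. It shows that two rankings that differ only in where an authoritative domain appears can receive the same NDCG score when their relevance grades are identical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with exponential gain and log2 discount."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg(gains):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


# Hypothetical results: (domain, relevance grade 0-3, is_authoritative).
# The authority flag is an assumption for illustration; NDCG never reads it.
ranking_a = [
    ("contentfarm.example", 3, False),
    ("hospital.example", 3, True),
    ("forum.example", 1, False),
]
ranking_b = [
    ("hospital.example", 3, True),
    ("contentfarm.example", 3, False),
    ("forum.example", 1, False),
]

score_a = ndcg([rel for _, rel, _ in ranking_a])
score_b = ndcg([rel for _, rel, _ in ranking_b])

# Both rankings have the same relevance grades in the same positions,
# so NDCG cannot distinguish them even though only ranking_b puts the
# authoritative domain first.
print(score_a == score_b)
```

Because the score depends only on the relevance grades and their positions, any preference for the authoritative result has to come from a separate signal, which is consistent with the abstract's motivation for an authority-centric metric.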
