Journal: ACM Transactions on the Web

Assessing Relevance and Trust of the Deep Web Sources and Results Based on Inter-Source Agreement


Abstract

Deep web search engines face the formidable challenge of retrieving high-quality results from the vast collection of searchable databases. Deep web search is a two-step process: selecting high-quality sources, then ranking the results from the selected sources. Though methods exist for both steps, they assess the relevance of the sources and the results using query-result similarity. When applied to the deep web, these methods have two deficiencies. First, they are agnostic to the correctness (trustworthiness) of the results. Second, query-based relevance does not consider the importance of the results and sources. These two considerations are essential for the deep web and for open collections in general. Since a number of deep web sources provide answers to any query, we conjecture that the agreements between these answers are helpful in assessing the importance and trustworthiness of the sources and the results. For assessing source quality, we compute the agreement between the sources as the agreement of the answers returned. While computing the agreement, we also measure and compensate for possible collusion between the sources. This adjusted agreement is modeled as a graph with sources at the vertices. On this agreement graph, the quality score of a source, which we call SourceRank, is calculated as the stationary visit probability of a random walk. For ranking results, we analyze the second-order agreement between the results. Further extending SourceRank to multidomain search, we propose a source ranking sensitive to the query domains. Multiple domain-specific rankings of a source are computed, and these ranks are combined for the final ranking. We perform extensive evaluations on online sources and hundreds of Google Base sources across domains. The proposed result and source rankings are implemented in the deep web search engine Factal. We demonstrate that the agreement analysis tracks source corruption. Further, our relevance evaluations show that our methods improve precision significantly over Google Base and the other baseline methods. The result ranking and the domain-specific source ranking are evaluated separately.
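
The idea of scoring a source by the stationary visit probability of a random walk on an agreement graph can be illustrated with a small sketch. This is not the authors' implementation: the Jaccard overlap used as the agreement measure, the damping factor of 0.85, the uniform handling of sources that agree with nobody, and all function and variable names are assumptions made for illustration. The collusion compensation mentioned in the abstract is not modeled here.

```python
def jaccard(a, b):
    """Agreement between two answer sets as Jaccard overlap (an assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def agreement_graph(answers):
    """Build weights graph[i][j]: agreement between the answers of sources i and j."""
    sources = sorted(answers)
    return {
        i: {j: jaccard(answers[i], answers[j]) for j in sources if j != i}
        for i in sources
    }


def source_rank(graph, damping=0.85, iterations=100):
    """Approximate the stationary visit probabilities by power iteration.

    The walker follows an agreement edge with probability `damping`
    (chosen in proportion to edge weight) and otherwise restarts at a
    uniformly random source. Both choices are assumptions of this sketch.
    """
    sources = sorted(graph)
    n = len(sources)
    rank = {s: 1.0 / n for s in sources}
    for _ in range(iterations):
        new_rank = {s: (1.0 - damping) / n for s in sources}
        for i in sources:
            total = sum(graph[i].values())
            for j in graph[i]:
                # A source with no agreement spreads its mass uniformly.
                share = graph[i][j] / total if total > 0 else 1.0 / (n - 1)
                new_rank[j] += damping * rank[i] * share
        rank = new_rank
    return rank


# Hypothetical answers returned by four sources for the same query.
answers = {
    "A": {"x", "y", "z"},
    "B": {"x", "y", "z"},
    "C": {"x", "y"},
    "D": {"q"},
}
ranks = source_rank(agreement_graph(answers))
```

In this toy input, source D shares no answers with the others, so the walk reaches it only through restarts and through the uniform fallback, and it ends up with the lowest score. Sources A and B agree fully with each other and receive the highest scores.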