Internet Search Result Probabilities: Heaps' Law and Word Associativity*


Abstract

We study the number of internet search results returned from multi-word queries based on the number of results returned when each word is searched for individually. We derive a model to describe search result values for multi-word queries using the total number of pages indexed by Google and by applying the Zipf power law to the words per page distribution on the internet and Heaps' law for unique word counts. Based on data from 351 word pairs each with exactly one hit when searched for together, and a Zipf law coefficient determined in other studies, we approximate the Heaps' law coefficient for the indexed worldwide web (about 8 billion pages) to be β = 0.52. Previous studies used under 20,000 pages. We demonstrate through examples how the model can be used to analyse automatically the relatedness of word pairs assigning each a value we call “strength of associativity”. We demonstrate the validity of our method with word triplets and through two experiments conducted 8 months apart. We then use our model to compare the index sizes of competing search giants Yahoo and Google.
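
The abstract names two ingredients, Heaps' law for unique word counts and the hit counts of individual words, without giving the model's formulas. As a rough illustration only, the sketch below shows the standard form of Heaps' law, V(n) = K·n^β, using the β = 0.52 reported above, together with a plain independence baseline for how many pages would contain both words of a pair. The function names, the constant `k`, the sample hit counts, and the ratio-based association score are all assumptions made for this example; they are not the authors' model or their "strength of associativity" measure.

```python
import math

# Approximate index size quoted in the abstract (about 8 billion pages).
TOTAL_PAGES = 8e9


def heaps_vocabulary(n_tokens: float, k: float, beta: float = 0.52) -> float:
    """Heaps' law: expected number of distinct words among n_tokens words.

    k is a corpus-dependent constant; any value used here is hypothetical.
    beta defaults to the coefficient reported in the abstract.
    """
    return k * n_tokens ** beta


def independent_joint_hits(hits_a: float, hits_b: float,
                           total_pages: float = TOTAL_PAGES) -> float:
    """Expected number of pages containing both words if they occurred
    independently: N * (hits_a / N) * (hits_b / N)."""
    return hits_a * hits_b / total_pages


def association_ratio(observed_joint: float, hits_a: float, hits_b: float,
                      total_pages: float = TOTAL_PAGES) -> float:
    """Observed joint hits divided by the independence baseline.

    Values above 1 suggest the words co-occur more often than chance.
    This is an illustrative score, not the paper's definition.
    """
    return observed_joint / independent_joint_hits(hits_a, hits_b, total_pages)


if __name__ == "__main__":
    # Hypothetical hit counts for two words and for the pair searched together.
    hits_a, hits_b, observed = 4e6, 2e6, 5e4
    print("independence baseline:", independent_joint_hits(hits_a, hits_b))
    print("association ratio:", association_ratio(observed, hits_a, hits_b))
    # Under Heaps' law, doubling the text multiplies the vocabulary by 2**beta.
    growth = heaps_vocabulary(2e6, k=10.0) / heaps_vocabulary(1e6, k=10.0)
    print("vocabulary growth on doubling:", growth)
```

With β = 0.52, doubling the amount of text multiplies the expected vocabulary by 2^0.52, about 1.43, whatever value `k` takes. The independence baseline is the simplest possible reference point; the paper's model additionally accounts for the words-per-page distribution, which this sketch does not attempt to reproduce.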

Bibliographic details

  • Source
    Journal of Quantitative Linguistics | 2009, Issue 1 | pp. 40-66 | 27 pages
  • Author affiliations

    Department of Mathematical Sciences, New Jersey Institute of Technology, USA | Cognitive and Neural Systems Department, Boston University, USA;

    Department of Mathematical Sciences, New Jersey Institute of Technology, USA

  • Original format: PDF
  • Language: English
