Journal: Evidence Based Library and Information Practice

Measuring the Extent of the Synonym Problem in Full-Text Searching


Abstract

Objective – This article measures the extent of the synonym problem in full-text searching. The synonym problem occurs when a search misses documents because it was based on a synonym rather than on a more familiar term.

Methods – We considered a sample of 90 single-word synonym pairs and searched for each word in the pair, both singly and jointly, in the Yahoo! database. We determined the number of web sites that were missed when the search field included only one of the two terms.

Results – Depending on how commonly the synonym is used, the percentage of missed web sites can vary from almost 0% to almost 100%. A search on a very uncommon synonym ("diaconate") can miss a very high percentage of web pages (95%), whereas a search on the more common term ("deacons") misses only 9%. If both terms in a word pair were nearly equal in usage ("cooks" and "chefs"), then a search on one term but not the other missed almost half the relevant web pages.

Conclusion – Our results indicate that search engines would gain great value from incorporating automatic synonym searching, not only for user-specified terms but also for high-usage synonyms. Moreover, the results demonstrate the value of information retrieval systems that use controlled vocabularies and cross references to generate search results.
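The abstract does not give the formula behind the "percentage missed" figures, so the following is only a minimal sketch of one plausible reading: the joint search is treated as an OR query over both terms, and a single-term search is taken to miss whatever share of the joint hit count it does not return. The function name `missed_fraction`, its parameters, and the hit counts in the usage lines are hypothetical and are not data from the study.

```python
def missed_fraction(hits_term: int, hits_either: int) -> float:
    """Share of pages a single-term search misses.

    Assumption (not stated in the abstract): hits_either is the hit count
    of a joint OR query over both synonyms, and hits_term is the hit count
    when searching for one of the synonyms alone.
    """
    if hits_either <= 0:
        raise ValueError("joint search must return at least one hit")
    if hits_term < 0:
        raise ValueError("hit counts cannot be negative")
    # Search engine hit counts are estimates, so a single-term count can
    # exceed the joint count; treat that case as nothing missed.
    missed = max(hits_either - hits_term, 0)
    return missed / hits_either


# Hypothetical counts for a pair of terms with nearly equal usage.
print(f"{missed_fraction(520, 1000):.0%} missed")
# Hypothetical counts for a rare synonym of a common term.
print(f"{missed_fraction(50, 1000):.0%} missed")
```

Under this reading, a pair with nearly equal usage and little overlap gives a missed share close to one half for either term, which is the pattern the abstract reports for "cooks" and "chefs".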