Journal: Language Resources and Evaluation

Morphological query expansion and language-filtering words for improving Basque web retrieval

Abstract

Users who look for information in Basque through major search engines or other web information retrieval services have a far from satisfactory experience: the services return only pages that match the query exactly, without inflected forms (which are essential for an agglutinative language like Basque), and they return many results in other languages, since no search engine offers an option to restrict results to Basque. This paper proposes combining morphological query expansion and language-filtering words with the APIs of search engines as a very cost-effective way to build appropriate web search services for Basque. The implementation details of the methodology (which language-filtering words are most appropriate, how many of them to use, which inflections are most frequent for the morphological query expansion, etc.) were determined through corpus-based studies. The resulting improvements were measured in terms of precision and recall, both over corpora and in real web searches. Morphological query expansion can improve recall up to 47%, and language-filtering words can raise precision from 15% to around 90%, although with a loss in recall of about 30-35%. The proposed methodology has already been used successfully in the Basque search service Elebila and the web-as-corpus tool CorpEus, and the approach could also be applied to other morphologically rich or under-resourced languages.
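The two techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `expand_query` and `build_query`, the suffix list, the filter words, and the query syntax (quoted terms, an `OR` operator, implicit AND between terms) are all assumptions made for illustration. The abstract says the actual inflections and filter words were chosen through corpus-based studies, and real Basque inflection is more involved than plain suffix concatenation.

```python
def expand_query(lemma, suffixes):
    """Return the lemma plus inflected forms built by naive suffix concatenation.

    Assumption: simple concatenation stands in for real morphological
    generation, which would need a proper Basque morphological tool.
    """
    forms = [lemma]
    for suffix in suffixes:
        form = lemma + suffix
        if form not in forms:  # avoid duplicate forms, keep order
            forms.append(form)
    return forms


def build_query(lemma, suffixes, filter_words):
    """Build one query string: OR-ed inflections plus required filter words.

    Assumption: the target search API accepts quoted terms, an OR operator,
    and treats space-separated terms as a conjunction.
    """
    forms = expand_query(lemma, suffixes)
    expansion = "(" + " OR ".join(f'"{form}"' for form in forms) + ")"
    filters = " ".join(f'"{word}"' for word in filter_words)
    return f"{expansion} {filters}".strip()


if __name__ == "__main__":
    # Illustrative values only: "etxe" (house) with a few common case endings,
    # and two frequent Basque function words used as language filters.
    print(build_query("etxe", ["a", "ak", "an", "ko"], ["eta", "da"]))
```

The OR group widens the match to inflected forms, which is where a recall gain would come from. The required filter words narrow the results to pages likely to be written in Basque, which raises precision but drops Basque pages that happen to lack those words, consistent with the recall loss the abstract reports.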