Journal: Medical informatics and the Internet in medicine

Biomedical information retrieval across languages.


Abstract

This work presents a new dictionary-based approach to biomedical cross-language information retrieval (CLIR) that addresses many of the general and domain-specific challenges in current CLIR research. Our method is based on a multilingual lexicon that was generated partly manually and partly automatically, and currently covers six European languages. It contains morphologically meaningful word fragments, termed subwords. Using subwords instead of entire words significantly reduces the number of lexical entries necessary to sufficiently cover a specific language and domain. Mediation between queries and documents is based on these subwords as well as on lists of word-n-grams that are generated from large monolingual corpora and constitute possible translation units. The translations are then sent to a standard Internet search engine. This process makes our approach an effective tool for searching the biomedical content of the World Wide Web in different languages. We evaluate this approach using the OHSUMED corpus, a large medical document collection, within a cross-language retrieval setting.
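The subword idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy `LEXICON`, the concept identifiers such as `C_STOMACH`, the greedy longest-match segmentation, and the function names are hypothetical and are not taken from the authors' system. It only shows how a small lexicon of word fragments could mediate between a query in one language and search terms in another.

```python
# Hypothetical toy lexicon: language -> {subword: language-independent concept ID}.
# The entries and IDs are invented for illustration only.
LEXICON = {
    "en": {"gastr": "C_STOMACH", "stomach": "C_STOMACH",
           "itis": "C_INFLAMM", "inflamm": "C_INFLAMM"},
    "de": {"magen": "C_STOMACH", "gastr": "C_STOMACH",
           "entzuend": "C_INFLAMM", "itis": "C_INFLAMM"},
}


def segment(word, lang):
    """Greedy longest-match split of a word into known subwords.

    Characters that do not start any known subword are skipped.
    """
    entries = LEXICON[lang]
    word = word.lower()
    subwords = []
    i = 0
    while i < len(word):
        # Try the longest candidate starting at position i first.
        for j in range(len(word), i, -1):
            if word[i:j] in entries:
                subwords.append(word[i:j])
                i = j
                break
        else:
            i += 1  # no subword starts here; skip one character
    return subwords


def to_concepts(query, lang):
    """Map every subword found in a whitespace-separated query to its concept ID."""
    return [LEXICON[lang][s] for w in query.split() for s in segment(w, lang)]


def translate(query, src, tgt):
    """For each concept in the query, list the target-language subwords that share it."""
    target = LEXICON[tgt]
    return [sorted(s for s, c in target.items() if c == concept)
            for concept in to_concepts(query, src)]


if __name__ == "__main__":
    print(segment("gastritis", "en"))
    print(translate("gastritis", "en", "de"))
```

In this sketch four fragments per language are enough to relate "gastritis" and "Magenentzuendung", which is the kind of lexicon-size reduction the abstract attributes to subwords. A real system would also need the word n-gram lists and a search-engine back end, both of which are omitted here.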
