Journal: Studies in Health Technology and Informatics

Grepator: Accents & Case Mix for Thesaurus

Abstract

Researchers and students have a real need for pedagogical resources. In France, information retrieval techniques have been developed, for example on the Doc'CISMeF web site. As in PubMed, documents are indexed with (French) MeSH terms. One of the problems uncovered in quality studies is the mismatch between user requests and the MeSH controlled vocabulary. Moreover, French (like Greek or Spanish) poses specific indexing problems because of its diacritic characters. In this article, we present the Grepator project. Its main goal is to convert any thesaurus (or any single entry) to mixed case with accented characters for a specific domain. Grepator also has to complete MeSH terms according to their usual form in natural language and, finally, to correct user spelling mistakes. Grepator is based on a statistical approach. A large French medical corpus was built from pedagogical resources indexed in CISMeF. Using regular expressions, Grepator searches this corpus for the most usual way to spell each word. With this method, seventy-five percent of MeSH terms are found in the corpus, with fewer than one mistake per hundred words. We analyze this first evaluation of the tools and discuss further steps that might be developed.
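
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: build an accent-tolerant, case-insensitive regular expression from an unaccented uppercase term, count the surface forms it matches in a corpus, and keep the most frequent one. The `VARIANTS` table and the function names `strip_accents`, `build_pattern`, and `most_usual_form` are hypothetical and chosen for illustration; they are not taken from Grepator itself.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical table of French accented variants per base letter
# (an assumption for this sketch, not the list used by Grepator).
VARIANTS = {
    "a": "aàâä",
    "c": "cç",
    "e": "eéèêë",
    "i": "iîï",
    "o": "oôö",
    "u": "uùûü",
}


def strip_accents(text):
    """Remove combining diacritics, e.g. 'é' becomes 'e'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_pattern(term):
    """Compile a regex matching the term with any accents and any casing."""
    parts = []
    for ch in strip_accents(term).lower():
        if ch in VARIANTS:
            # Accept the base letter or any of its accented variants.
            parts.append("[" + VARIANTS[ch] + "]")
        else:
            parts.append(re.escape(ch))
    # The lookarounds keep the match from starting or ending inside a word.
    return re.compile(r"(?<!\w)" + "".join(parts) + r"(?!\w)", re.IGNORECASE)


def most_usual_form(term, corpus):
    """Return the most frequent surface form of the term, or None if absent."""
    # Normalise to NFC so accented letters are single code points.
    corpus = unicodedata.normalize("NFC", corpus)
    counts = Counter(m.group(0) for m in build_pattern(term).finditer(corpus))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


corpus = (
    "Le diabète est fréquent. Diabète de type 2. "
    "Le diabete insipide. Un diabète gestationnel."
)
print(most_usual_form("DIABETE", corpus))
```

Here the lowercase accented form wins because it occurs twice, against once each for the capitalised and the unaccented spellings. A real system would also need a policy for ties and for terms that never appear in the corpus, which this sketch simply reports as `None`.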