Journal: Bioinformatics

Automatic assignment of biomedical categories: toward a generic approach.


Abstract

MOTIVATION: We report on the development of a generic text categorization system designed to automatically assign biomedical categories to any input text. Unlike typical automatic text categorization systems, which rely on data-intensive models extracted from large sets of training data, our categorizer is largely data-independent.

METHODS: To evaluate the robustness of our approach, we test the system on two different biomedical terminologies: the Medical Subject Headings (MeSH) and the Gene Ontology (GO). Our lightweight categorizer, based on two ranking modules, combines a pattern matcher and a vector space retrieval engine, and uses both stems and linguistically motivated indexing units.

RESULTS AND CONCLUSION: Results show the effectiveness of phrase indexing for both GO and MeSH categorization, but the categorization power of the tool depends on the controlled vocabulary: precision at high ranks ranges from above 90% for MeSH to below 20% for GO, establishing a new baseline for categorizers based on retrieval methods.
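The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of ranking controlled-vocabulary terms with two modules, a pattern matcher and a vector space scorer, and then measuring precision at a given rank. Everything in it is an assumption made for illustration: the three-term toy vocabulary, the TF-IDF weighting, the linear combination with weight `alpha`, and all function names. Stemming and phrase indexing, which the abstract mentions, are omitted here for brevity.

```python
import math
import re
from collections import Counter

# Toy controlled vocabulary (hypothetical; not an excerpt of MeSH or GO).
CATEGORIES = ["protein binding", "cell cycle", "signal transduction"]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def pattern_score(category, text):
    """Return 1.0 if the category term occurs verbatim as a phrase, else 0.0."""
    pattern = r"\b" + r"\s+".join(re.escape(t) for t in tokenize(category)) + r"\b"
    return 1.0 if re.search(pattern, text.lower()) else 0.0


def idf_table(docs):
    """Smoothed inverse document frequency over the category terms."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log(1 + n / count) for tok, count in df.items()}


def tfidf_vector(text, idf):
    """Sparse TF-IDF vector; tokens outside the vocabulary get weight 0."""
    tf = Counter(tokenize(text))
    return {tok: freq * idf.get(tok, 0.0) for tok, freq in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_categories(text, categories=CATEGORIES, alpha=0.5):
    """Rank categories by a weighted sum of the two module scores.

    alpha is an assumed mixing weight, not a value from the paper.
    """
    idf = idf_table(categories)
    query = tfidf_vector(text, idf)
    scored = []
    for cat in categories:
        vs = cosine(query, tfidf_vector(cat, idf))  # vector space module
        pm = pattern_score(cat, text)               # pattern matcher module
        scored.append((alpha * pm + (1 - alpha) * vs, cat))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cat for _, cat in scored]


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked categories that are relevant."""
    return sum(1 for cat in ranked[:k] if cat in relevant) / k


if __name__ == "__main__":
    text = "The kinase regulates the cell cycle through protein phosphorylation."
    ranked = rank_categories(text)
    print(ranked)
    print(precision_at_k(ranked, {"cell cycle"}, 1))
```

In this toy setup the exact phrase match lifts "cell cycle" above "protein binding", which only shares a single token with the input. A data-independent design of this kind needs only the vocabulary itself, not a labeled training corpus, which is the property the abstract emphasizes.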
