Venue: International conference on artificial intelligence: methodology, systems, and applications

Towards Constructing a Corpus for Studying the Effects of Treatments and Substances Reported in PubMed Abstracts

Abstract

We present the construction of an annotated corpus of PubMed abstracts that report positive, negative or neutral effects of treatments or substances. Our ultimate goal is to annotate one sentence (the rationale) in each abstract and to use this resource as a training set for classifying the effects discussed in PubMed abstracts. Currently, the corpus consists of 750 abstracts. We describe the automatic processing that supports the corpus construction, the manual annotation activities, and some features of the medical language in the abstracts selected for the annotated corpus. It turns out that recognizing terminology and abbreviations is key to determining the rationale sentence. The corpus will be used to improve our classifier, which currently achieves an accuracy of 78.80% using an SVM with a linear kernel, with the abstract terms normalized to UMLS concepts from specific semantic groups. Finally, we discuss some other possible applications of this corpus.
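The classification setup summarized above (normalizing terms to UMLS concepts from selected semantic groups, then applying a linear-kernel SVM) can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation. The lexicon `LEXICON`, its placeholder concept IDs (not real UMLS CUIs), the kept groups in `KEEP_GROUPS`, the toy sentences in `TRAIN` and all helper names are hypothetical; the group labels `CHEM`, `DISO` and `LIVB` follow UMLS semantic-group abbreviations, but which groups to keep is an assumption. The SVM is trained one-vs-rest with Pegasos-style subgradient descent on the L2-regularized hinge loss, without an intercept term, as a stand-in for whatever SVM library the authors used.

```python
import re
from collections import Counter

# Hypothetical lexicon: surface term -> (concept ID, UMLS semantic group).
# The IDs are placeholders, NOT real UMLS CUIs. A real pipeline would map
# text to UMLS concepts with a dedicated tool (for example MetaMap).
LEXICON = {
    "aspirin": ("C_ASPIRIN", "CHEM"),
    "acetylsalicylic acid": ("C_ASPIRIN", "CHEM"),
    "asa": ("C_ASPIRIN", "CHEM"),  # abbreviation -> same concept
    "placebo": ("C_PLACEBO", "CHEM"),
    "myocardial infarction": ("C_MI", "DISO"),
    "mi": ("C_MI", "DISO"),
    "patients": ("C_PATIENT", "LIVB"),  # group not kept, so left as a word
}
KEEP_GROUPS = {"CHEM", "DISO"}  # assumed choice of semantic groups
LABELS = ("positive", "negative", "neutral")


def normalize(text):
    """Lowercase, replace known terms from kept groups by concept IDs, tokenize."""
    text = text.lower()
    # Longest terms first, so multi-word terms are replaced before their parts.
    for term in sorted(LEXICON, key=len, reverse=True):
        concept, group = LEXICON[term]
        if group in KEEP_GROUPS:
            text = re.sub(r"\b" + re.escape(term) + r"\b", concept.lower(), text)
    return re.findall(r"[a-z_]+", text)


def features(text):
    """Bag-of-tokens feature vector as a sparse dict (no intercept, for brevity)."""
    return Counter(normalize(text))


def dot(w, x):
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_binary(xs, ys, lam=0.01, epochs=50):
    """Linear SVM: Pegasos-style subgradient descent on the L2-regularized hinge loss."""
    w, t = {}, 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * dot(w, x)      # computed with the current weights
            for f in w:                 # regularization: shrink every weight
                w[f] *= 1.0 - eta * lam
            if margin < 1:              # hinge loss active: step towards y * x
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def train(texts, labels, **kwargs):
    """One-vs-rest: one binary linear SVM per effect label."""
    xs = [features(t) for t in texts]
    return {
        c: train_binary(xs, [1 if lab == c else -1 for lab in labels], **kwargs)
        for c in LABELS
    }


def predict(models, text):
    """Return the label whose SVM gives the highest decision value."""
    x = features(text)
    return max(models, key=lambda c: dot(models[c], x))


# Invented toy sentences standing in for annotated rationale sentences.
TRAIN = [
    ("Aspirin reduced mortality", "positive"),
    ("Statins improved survival", "positive"),
    ("Warfarin increased bleeding", "negative"),
    ("Chemotherapy worsened fatigue", "negative"),
    ("Placebo showed no difference", "neutral"),
    ("MI rates did not differ", "neutral"),
]

if __name__ == "__main__":
    models = train([s for s, _ in TRAIN], [lab for _, lab in TRAIN])
    print(normalize("ASA reduced MI"))  # ['c_aspirin', 'reduced', 'c_mi']
    print(predict(models, "Patients on ASA lived longer"))  # positive
```

In this toy setup, "Patients on ASA lived longer" shares no surface word with any training sentence; it is labeled positive only because `ASA` and `Aspirin` are normalized to the same concept feature, which mirrors the abstract's observation that recognizing terminology and abbreviations matters.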