Conference: IEEE International Conference on Systems, Man, and Cybernetics

Automated method for extracting “citation sentences” from online biomedical articles using SVM-based text summarization technique



Abstract

Comment-on (CON), a MEDLINE citation field, indicates previously published articles commented on by the authors of a given article, who may express complimentary or contradictory opinions. Our approach to identifying the CON list for a given article is to first extract all “citation sentences” from the body text, then to recognize those among them that mention CON articles (“CON sentences”), and finally to analyze the corresponding bibliographic data in the reference section. As a preprocessing step for identifying the CON list, this paper presents a general method for extracting “citation sentences” from the body text of online biomedical articles using a support vector machine (SVM)-based text summarization technique. Input feature vectors for the SVM are created by combining four types of features: 1) word statistics representing how differently a word occurs in “citation sentences” compared to other sentences, and the existence in a sentence of 2) author names, 3) publication years, and 4) citation tags. A rule-based post-processing step is also introduced to further reduce false negative errors in detecting “citation sentences”. Experiments on a set of online biomedical articles show that an SVM with a radial basis function (RBF) kernel achieves good overall performance in terms of accuracy, precision, recall, and F-measure. Our experiments also show that errors in extracting “citation sentences” cause a minor degradation of performance in identifying CON sentences, and that this performance can be improved through the proposed rule-based post-processing.
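The four feature types described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the regular expressions, the smoothed log-ratio used as the word statistic, the averaging over a sentence, and all function names are assumptions made for illustration, since the abstract does not specify them. The `rbf_kernel` helper only shows the standard RBF similarity that an SVM would apply to such vectors.

```python
import math
import re
from collections import Counter

# Hypothetical cue patterns; the abstract does not say how these are detected.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")             # e.g. 2004
TAG_RE = re.compile(r"\[\d+(?:\s*[,-]\s*\d+)*\]")       # e.g. [3] or [3, 5-7]
AUTHOR_RE = re.compile(r"\b[A-Z][a-z]+(?: et al\.| and [A-Z][a-z]+)")


def tokenize(sentence):
    """Lowercase alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def word_scores(citation_sents, other_sents):
    """Assumed word statistic: smoothed log-ratio of the fraction of citation
    sentences containing a word to the fraction of other sentences containing it."""
    cit = Counter(w for s in citation_sents for w in set(tokenize(s)))
    oth = Counter(w for s in other_sents for w in set(tokenize(s)))
    n_cit, n_oth = len(citation_sents), len(other_sents)
    scores = {}
    for w in set(cit) | set(oth):
        p_cit = (cit[w] + 1) / (n_cit + 2)   # add-one smoothing
        p_oth = (oth[w] + 1) / (n_oth + 2)
        scores[w] = math.log(p_cit / p_oth)
    return scores


def features(sentence, scores):
    """Four-element vector: mean word score, then binary flags for
    author name, publication year, and citation tag."""
    words = tokenize(sentence)
    stat = sum(scores.get(w, 0.0) for w in words) / len(words) if words else 0.0
    return [
        stat,
        1.0 if AUTHOR_RE.search(sentence) else 0.0,
        1.0 if YEAR_RE.search(sentence) else 0.0,
        1.0 if TAG_RE.search(sentence) else 0.0,
    ]


def rbf_kernel(x, y, gamma=0.5):
    """Standard RBF kernel exp(-gamma * ||x - y||^2); gamma is an arbitrary choice."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
```

In practice the vectors returned by `features` would be passed to an SVM library for training and classification; the sketch stops at feature construction because the abstract gives no training details or rule definitions for the post-processing step.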
