Conference: International Conference on Artificial Intelligence

Machine Learning with Selective Word Statistics for Automated Classification of Citation Subjectivity in Online Biomedical Articles

Abstract

There is growing interest in automatically classifying the sentiment that authors express in citation sentences in scientific literature, in order to provide effective tools for researchers seeking relevant previous work or approaches for a particular research purpose. We propose an automated method for determining whether a given citation sentence contains the author's subjective opinion (positive or negative) or objective factual information, as a first step toward analyzing and identifying the citing author's sentiment toward the cited external sources. Our method uses a support vector machine (SVM)-based text categorization technique to identify subjective citations, specifically those directed at Comment-on (CON) articles. CON, a MEDLINE citation field, indicates previously published articles that the authors of a given article comment on, possibly expressing complimentary or contradictory opinions. To build an input feature vector for the SVM classifier, we introduce a bag of unigrams based on selective word statistics. It is derived from a text region of interest within the sentence, which contains a description of the author's reason for citing and lexical linguistic cues. Experiments conducted on a set of CON sentences collected from 414 different online biomedical journal titles show that the SVM classifier performs comparably with the proposed bag of unigrams, selectively extracted from the text region of interest, and with a bag of unigrams drawn from the entire sentence. Moreover, we achieve a significant performance boost for the SVM with an input feature vector combining two types of statistical bags of unigrams and a sentiment word lexicon.
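The feature construction described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the word statistic, so the per-class document-frequency difference used here is an assumption, as are the function names `tokenize`, `select_vocabulary`, and `featurize`, the toy sentences, and the two-word lexicon. Extraction of the text region of interest and the SVM training step are omitted; the resulting vectors would simply be passed to any SVM implementation.

```python
from collections import Counter

def tokenize(sentence):
    """Lowercase and split into alphabetic unigrams (a deliberately simple tokenizer)."""
    cleaned = "".join(ch.lower() if ch.isalpha() else " " for ch in sentence)
    return cleaned.split()

def select_vocabulary(sentences, labels, top_k):
    """Keep the top_k unigrams ranked by a class-separation statistic.

    ASSUMPTION: the statistic is the absolute difference between the
    fraction of subjective (1) and objective (0) sentences containing
    the word. The abstract does not specify the statistic actually used.
    """
    doc_freq = {0: Counter(), 1: Counter()}
    n_sentences = {0: 0, 1: 0}
    for sentence, label in zip(sentences, labels):
        n_sentences[label] += 1
        doc_freq[label].update(set(tokenize(sentence)))

    def score(word):
        subjective = doc_freq[1][word] / max(n_sentences[1], 1)
        objective = doc_freq[0][word] / max(n_sentences[0], 1)
        return abs(subjective - objective)

    vocabulary = set(doc_freq[0]) | set(doc_freq[1])
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocabulary, key=lambda w: (-score(w), w))[:top_k]

def featurize(sentence, vocabulary, lexicon):
    """Concatenate unigram counts with a count of sentiment-lexicon hits."""
    tokens = tokenize(sentence)
    counts = Counter(tokens)
    unigram_part = [counts[word] for word in vocabulary]
    lexicon_part = [sum(1 for token in tokens if token in lexicon)]
    return unigram_part + lexicon_part

# Hypothetical toy data: 1 = subjective citation, 0 = objective citation.
sentences = [
    "The authors convincingly demonstrate the benefit of early treatment [1].",
    "We disagree with the conclusion drawn by Smith et al. [2].",
    "Patients were randomized as described previously [3].",
    "The dose was calculated as described in the earlier trial [4].",
]
labels = [1, 1, 0, 0]
lexicon = {"convincingly", "disagree"}  # hypothetical sentiment word lexicon

vocabulary = select_vocabulary(sentences, labels, top_k=2)
vectors = [featurize(s, vocabulary, lexicon) for s in sentences]
```

Each vector has one entry per selected unigram plus a final entry for lexicon hits, which mirrors the idea of combining statistical unigram features with a sentiment lexicon in a single SVM input.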