PLoS Computational Biology

A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts

Abstract

Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Because abstracts are more readily available than full texts, text mining of the scientific literature has mostly been carried out on collections of abstracts. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823–2016. We describe how article length and publication sub-topics developed over these nearly 200 years. We showcase the potential of text mining by extracting published protein–protein, disease–gene, and protein subcellular associations using a named entity recognition system, and we quantitatively report their accuracy using gold standard benchmark data sets. We then compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full-text articles consistently outperforms text mining of abstracts alone.