MSL: Facilitating automatic and physical analysis of published scientific literature in PDF format



Abstract

Published scientific literature contains millions of figures that report the results of scientific experiments, e.g., PCR-ELISA data, microarray analysis, gel electrophoresis, mass spectrometry data, DNA/RNA sequencing, diagnostic imaging (CT/MRI and ultrasound scans), and medical imaging such as electroencephalography (EEG), magnetoencephalography (MEG), electrocardiography (ECG), and positron-emission tomography (PET) images. The importance of biomedical figures is widely recognized in the scientific and medical communities, because figures present major original data and experimental and computational results in concise form. One major challenge in implementing a system for scientific literature analysis is extracting and analyzing text and figures from published PDF files through physical and logical document analysis. Here we present 'Mining Scientific Literature (MSL)', a bioinformatics tool built on a product line architecture, which supports the extraction of text and images by interpreting all kinds of published PDF files using advanced data mining and image processing techniques. It provides modules for the marginalization of extracted text based on different coordinates and keywords, for the visualization of extracted figures, and for the extraction of embedded text from all kinds of biological and biomedical figures using Optical Character Recognition (OCR). For further analysis and reuse, it writes its output in several formats, including text, PDF, XML, and image files. MSL is thus an analysis tool for interpreting published scientific literature in PDF format that is easy to install and use.
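As an illustration only, the coordinate- and keyword-based selection of extracted text mentioned in the abstract might look like the minimal Python sketch below. It is not MSL's implementation: the `TextBlock` structure, the `select_blocks` function, the bounding-box convention, and the sample values are all hypothetical, and the sketch assumes that text blocks with page coordinates have already been extracted from the PDF by some other tool.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Sequence, Tuple

# Hypothetical region type: (x0, y0, x1, y1) in page coordinates.
Region = Tuple[float, float, float, float]


@dataclass(frozen=True)
class TextBlock:
    """Hypothetical container for one extracted text block and its bounding box."""
    page: int
    x0: float
    y0: float
    x1: float
    y1: float
    text: str


def inside(block: TextBlock, region: Region) -> bool:
    """Return True if the block's bounding box lies fully within the region."""
    rx0, ry0, rx1, ry1 = region
    return (block.x0 >= rx0 and block.y0 >= ry0
            and block.x1 <= rx1 and block.y1 <= ry1)


def select_blocks(blocks: Iterable[TextBlock],
                  region: Optional[Region] = None,
                  keywords: Sequence[str] = ()) -> List[TextBlock]:
    """Keep blocks that fall inside the region (if given) and contain
    at least one keyword (if any are given), matched case-insensitively."""
    wanted = [k.lower() for k in keywords]
    selected = []
    for block in blocks:
        if region is not None and not inside(block, region):
            continue
        if wanted and not any(k in block.text.lower() for k in wanted):
            continue
        selected.append(block)
    return selected


# Made-up sample data standing in for text extracted from a PDF.
blocks = [
    TextBlock(1, 50, 700, 550, 750, "Figure 1. Gel electrophoresis of PCR products"),
    TextBlock(1, 50, 20, 550, 40, "Journal running header"),
    TextBlock(2, 50, 300, 550, 350, "Mass spectrometry results"),
]

body_region = (0, 100, 600, 800)  # assumed region that excludes the header band
body_blocks = select_blocks(blocks, region=body_region)
pcr_blocks = select_blocks(blocks, region=body_region, keywords=["pcr"])
```

Here `body_blocks` drops the running header because its box falls outside the assumed region, and `pcr_blocks` keeps only the figure caption that mentions PCR.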
