Conference: International Conference on Intelligent Data Engineering and Automated Learning

Convolutional Neural Network for Core Sections Identification in Scientific Research Publications

Abstract

The volume of data generated online continues to grow at an exponential and unprecedented rate, and over 80% of it is unstructured. Scientific research publications constitute a significant portion of this unstructured data. Systematic literature review (SLR) is a rigorous and challenging process, and its key challenge is the automatic extraction of relevant data from the sheer volume of research publications. The lack of a unified framework has been identified as the key problem. A canonical model based on the structure of papers was proposed as the framework for data extraction in SLR. Framed as a classification problem, the canonical model was realised with traditional machine learning models, which reported good accuracy but left room for improvement. This paper presents results on the same problem using a convolutional neural network (CNN), a more sophisticated (deeper) model. The results show an improvement over the traditional machine learning models, with an accuracy of 85%. Unlike previous CNN NLP work, this work also demonstrates the application of a CNN to a larger NLP dataset, such as data from scientific research publications. The results also show that the CNN performs even better on NLP tasks with larger datasets.
