Classification of scientific papers with big data technologies


Abstract

Big Data refers to data volumes that conventional data storage and analysis systems cannot process, as well as to the new technologies developed to store, process, and analyze such volumes. Topics studied in the field include automatic retrieval of information about the contents of large numbers of documents produced by different sources, identification of research fields and topics, extraction of document abstracts, and pattern discovery. In this study, the Naïve Bayes classification algorithm is run on a data set of scientific articles to automatically determine the classes to which the documents belong. We have developed an efficient system that analyzes Turkish scientific documents using a distributed document classification algorithm running on Cloud Computing infrastructure. The Apache Mahout library is used in the study. The servers required for classifying and clustering distributed documents are.
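To make the classification step in the abstract concrete, the following is a minimal single-machine sketch of multinomial Naïve Bayes with Laplace smoothing. It is not the authors' system: it does not use Apache Mahout, it is not distributed, and the function names (`tokenize`, `train`, `classify`), the smoothing parameter `alpha`, and the toy English documents in `TOY_DOCS` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus; the study itself used Turkish scientific articles.
TOY_DOCS = [
    ("neural network training deep learning", "computer science"),
    ("distributed file system cluster storage", "computer science"),
    ("protein gene cell expression", "biology"),
    ("cell membrane enzyme protein", "biology"),
]


def tokenize(text):
    """Lowercase and split on whitespace (no stemming or stop-word removal)."""
    return text.lower().split()


def train(labeled_docs, alpha=1.0):
    """Fit a multinomial Naive Bayes model from (text, label) pairs."""
    class_doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        class_doc_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)

    total_docs = sum(class_doc_counts.values())
    # log P(class): fraction of training documents in each class.
    log_prior = {c: math.log(n / total_docs) for c, n in class_doc_counts.items()}

    # log P(word | class) with add-alpha (Laplace) smoothing.
    log_likelihood = {}
    for c in class_doc_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocabulary)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocabulary
        }
    return log_prior, log_likelihood, vocabulary


def classify(model, text):
    """Return the class with the highest posterior log-score for the text."""
    log_prior, log_likelihood, vocabulary = model
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for token in tokenize(text):
            if token in vocabulary:  # words never seen in training are skipped
                score += log_likelihood[c][token]
        scores[c] = score
    return max(scores, key=scores.get)


if __name__ == "__main__":
    model = train(TOY_DOCS)
    print(classify(model, "gene expression in the cell"))
    print(classify(model, "distributed storage cluster"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied. The counting phase in `train` is a per-document sum, which is the kind of aggregation that distributes naturally across a cluster, although how the study's system partitions this work is not stated in the abstract.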

Bibliographic details

Source
2017 | pp. 697-701 | 5 pages
Conference location: Antalya (TR)
Authors
Selen Gurbuz; Galip Aydin
Author affiliations

Firat University, Computer Engineering Department, 23100, Elazig, Turkey;

Firat University, Computer Engineering Department, 23100, Elazig, Turkey;

Original format: PDF
Text language: English
Keywords
Google; File systems; Java; Yarn; Reliability engineering; Data processing;

Similar literature

1. Data Publishing and Scientific Journals: The Future of the Scientific Paper in a World of Shared Data [J]. Erik De Schutter. Neuroinformatics. 2010, No. 3
2. A promising combination of approaches for solving complex text classification tasks: application to the classification of scientific papers into patents classes [J]. Kafil Hajlaoui, Jean-Charles Lamirel, Pascal Cuxac. International Journal of Knowledge and Learning. 2014, No. 1a2
3. Classification of scientific papers with big data technologies [C]. Selen Gurbuz, Galip Aydin. International Conference on Computer Science and Engineering. 2017
4. Scientific visualization and data mining for massive scientific datasets [D]. Sharma, Ashish. 2005
5. Milk microfiltration process dataset annotated from a collection of scientific papers [O]. Patrice Buche, Stéphane Dervaux, Nadine Leconte. 2021
6. Classification of scientific papers with big data technologies [O]. Selen Gurbuz, Galip Aydin. 2017
7. A classification and evaluation of data movement technologies for the delivery of highly voluminous scientific data products [R]. Mattmann, Chris A., Kelly, Sean, Crichton, Daniel J. 2006
