Classification of scientific papers with big data technologies

Abstract

Data volumes that cannot be processed by conventional data storage and analysis systems are referred to as Big Data. The term also covers the new technologies developed to store, process and analyze such large amounts of data. Automatic retrieval of information about the contents of large numbers of documents produced by different sources, identification of research fields and topics, extraction of document abstracts, and pattern discovery are among the topics studied in the field of big data. In this study, the Naïve Bayes classification algorithm is run on a data set of scientific articles in an attempt to automatically determine the classes to which these documents belong. We have developed an efficient system that analyzes Turkish scientific documents with a distributed document classification algorithm running on a Cloud Computing infrastructure. The Apache Mahout library is used in the study. The servers required for classifying and clustering distributed documents are.
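The abstract does not spell out how Naïve Bayes assigns a class to a document. Below is a minimal plain-Python sketch of the idea, not the authors' system and not the Apache Mahout API. It assumes the multinomial variant with Laplace smoothing (`alpha=1.0`), which the abstract does not specify, and the helper names (`count_partition`, `merge_counts`, `MultinomialNB`), the class labels and the Turkish sample texts are all hypothetical. Training is split into a per-shard counting step and a merge step to mimic how the counts could be computed on separate nodes.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on runs of word characters. Python's \w is
    # Unicode-aware, so Turkish letters such as ç, ğ, ı, ö, ş, ü stay in tokens.
    return re.findall(r"\w+", text.lower())


def count_partition(docs):
    """Hypothetical 'map' step: per-class document and term counts for one shard."""
    doc_counts = Counter()
    term_counts = defaultdict(Counter)
    for label, text in docs:
        doc_counts[label] += 1
        term_counts[label].update(tokenize(text))
    return doc_counts, term_counts


def merge_counts(partials):
    """Hypothetical 'reduce' step: sum the count tables produced for each shard."""
    doc_counts = Counter()
    term_counts = defaultdict(Counter)
    for shard_docs, shard_terms in partials:
        doc_counts.update(shard_docs)
        for label, counter in shard_terms.items():
            term_counts[label].update(counter)
    return doc_counts, term_counts


class MultinomialNB:
    """Multinomial Naive Bayes with additive (Laplace) smoothing (assumed variant)."""

    def __init__(self, doc_counts, term_counts, alpha=1.0):
        vocab = set()
        for counter in term_counts.values():
            vocab.update(counter)
        total_docs = sum(doc_counts.values())
        # log P(class): fraction of training documents in each class.
        self.log_prior = {c: math.log(n / total_docs) for c, n in doc_counts.items()}
        # log P(word | class) with add-alpha smoothing over the shared vocabulary.
        self.log_likelihood = {}
        for c in doc_counts:
            counts = term_counts[c]
            denom = sum(counts.values()) + alpha * len(vocab)
            self.log_likelihood[c] = {
                w: math.log((counts[w] + alpha) / denom) for w in vocab
            }

    def predict(self, text):
        # Sum log-probabilities instead of multiplying probabilities to avoid
        # underflow; words never seen in training are ignored.
        tokens = tokenize(text)
        scores = {}
        for c, prior in self.log_prior.items():
            likelihood = self.log_likelihood[c]
            scores[c] = prior + sum(likelihood[w] for w in tokens if w in likelihood)
        return max(scores, key=scores.get)


# Hypothetical toy shards; in a cluster each would be counted on a different node.
shard_a = [
    ("bilgisayar", "dağıtık bulut bilişim sistemi"),
    ("biyoloji", "hücre protein gen ifadesi"),
]
shard_b = [
    ("bilgisayar", "büyük veri sınıflandırma algoritması"),
    ("biyoloji", "bitki hücre gen dizileme"),
]

doc_counts, term_counts = merge_counts([count_partition(shard_a), count_partition(shard_b)])
model = MultinomialNB(doc_counts, term_counts)
print(model.predict("bulut bilişim algoritması"))  # bilgisayar
print(model.predict("hücre gen"))                  # biyoloji
```

Because training reduces to summing per-class document and term counts, merging the shard-level tables yields the same model as counting the whole corpus at once, which is what makes Naïve Bayes straightforward to distribute across machines.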