International Conference on Advanced Design and Manufacturing Engineering
Extraction Research about Parallelization of Named Entity Based on Hadoop Platform

Abstract

As the era of big data arrives, data is becoming increasingly important. Faced with such a massive data space, quickly identifying the content in a domain that users are interested in, and extracting it, is an urgent problem. To identify the content that users are interested in, we can use the NLPIR Chinese word segmentation framework to segment the text into words and tag their parts of speech, and then identify named entities from the part-of-speech tags. For extraction, we use Hadoop, a parallel cluster platform for big data built on the MapReduce framework: the Hadoop Distributed File System (HDFS) provides efficient data access, and Map and Reduce tasks are launched to extract the named-entity information. The required information is extracted from the interactive encyclopedia and then stored in a knowledge base. This work thus implements parallelized extraction of named-entity information on the Hadoop platform.
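The two-stage pipeline described above (tag words, pick out named entities by part of speech, then aggregate them with Map and Reduce tasks) can be sketched as follows. This is a minimal local simulation in Python, not the authors' implementation and not the Hadoop Java API. It assumes the input has already been segmented and tagged in a `word/tag` format, and that tags beginning with `nr`, `ns` and `nt` mark person, place and organization names as in the ICTCLAS/NLPIR-style tagset. The names `ENTITY_TAGS`, `map_fn`, `shuffle`, `reduce_fn`, `run_job` and the sample sentences are hypothetical. The job only counts entity mentions, whereas the paper stores extracted encyclopedia information in a knowledge base.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Key = Tuple[str, str]  # (entity_type, entity_text)

# Assumed mapping from POS-tag prefix to entity type (ICTCLAS/NLPIR-style
# tagset: nr = person name, ns = place name, nt = organization name).
ENTITY_TAGS: Dict[str, str] = {"nr": "PERSON", "ns": "LOCATION", "nt": "ORGANIZATION"}


def map_fn(line: str) -> Iterator[Tuple[Key, int]]:
    """Map step: one 'word/tag' line in, ((entity_type, word), 1) pairs out."""
    for token in line.split():
        word, sep, tag = token.rpartition("/")
        if not sep or not word:
            continue  # token is not in 'word/tag' form; skip it
        # Match on the two-letter prefix so assumed subtags such as 'nrf'
        # (transliterated person name) map to the same entity type.
        entity_type = ENTITY_TAGS.get(tag[:2])
        if entity_type is not None:
            yield (entity_type, word), 1


def shuffle(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, List[int]]:
    """Group intermediate values by key, as the framework does between Map and Reduce."""
    groups: Dict[Key, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(key: Key, values: List[int]) -> Tuple[Key, int]:
    """Reduce step: total number of mentions of one entity."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[Key, int]:
    """Run map, shuffle and reduce in-process over an iterable of tagged lines."""
    intermediate = (pair for line in lines for pair in map_fn(line))
    return dict(reduce_fn(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    # Hypothetical, hand-tagged sample input.
    corpus = [
        "张三/nr 在/p 北京/ns 工作/v",
        "张三/nr 加入/v 清华大学/nt",
    ]
    for (etype, entity), count in sorted(run_job(corpus).items()):
        print(etype, entity, count)
```

On a real cluster, the same map and reduce logic would typically run as Hadoop Streaming scripts (passed via the `-mapper` and `-reducer` options) that read lines from standard input and write tab-separated key/value lines, with input and output stored in HDFS. Keeping the mapper stateless and emitting `(key, 1)` pairs is what lets Hadoop split the corpus across many Map tasks.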