Journal of Advances in Information Technology

An Efficient Keyword Based Search of Big Data Using Map Reduce


Abstract

With the arrival of the data deluge, traditional centralized tools for extracting knowledge from data have become obsolete because of their limited ability to handle massive data. To meet the need for scalable solutions, a new framework has emerged: Hadoop, an open-source ecosystem designed for storage and large-scale processing on a cluster of commodity hardware. To overcome the limitations of keyword-based information retrieval systems, an efficient methodology has been designed. A system built on this approach mimics the real world, where every task relies on some form of indexing, which is the basic idea behind knowledge processing. Hadoop and R, open-source frameworks for storing and processing large datasets, are used to preprocess the text documents. First, a set of text documents is considered. Preprocessing is performed on a large domain of data using R; this includes removing stop words, stemming, and excluding low-frequency words. Despite this preprocessing, the very large number of index terms remaining in the domain data leads to the problem of high dimensionality. The dimensionality of this set of terms is therefore reduced by incorporating a keyword-based methodology in the Hadoop MapReduce framework. The developed model is useful for processing queries, returning relevant information from the data pool with low response time.
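
The pipeline outlined in the abstract (stop-word removal, stemming, dropping low-frequency terms, then a keyword-based MapReduce step) can be illustrated with a minimal sketch. The paper itself uses R for preprocessing and Hadoop MapReduce for the keyword step; the code below is plain single-process Python and only imitates the map and reduce phases. The abstract does not specify the keyword methodology, so the inverted index used here is an assumption, and every name in the block (`STOP_WORDS`, `crude_stem`, `build_index`, `search`, the `min_count` threshold) is hypothetical.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (assumption; the paper's list is not given).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "for", "on", "with"}


def crude_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer such as Porter."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def drop_rare_terms(docs_tokens, min_count=2):
    """Remove terms whose total count across the collection is below min_count."""
    totals = Counter(t for tokens in docs_tokens.values() for t in tokens)
    return {doc_id: [t for t in tokens if totals[t] >= min_count]
            for doc_id, tokens in docs_tokens.items()}


def map_phase(doc_id, tokens):
    """Mapper: emit one (term, doc_id) pair per token."""
    for term in tokens:
        yield term, doc_id


def reduce_phase(pairs):
    """Shuffle and reduce: group document ids by term into an inverted index."""
    index = defaultdict(set)
    for term, doc_id in pairs:
        index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def build_index(docs, min_count=2):
    """Run preprocessing, rare-term filtering, then the simulated map/reduce."""
    tokens = {doc_id: preprocess(text) for doc_id, text in docs.items()}
    tokens = drop_rare_terms(tokens, min_count)
    pairs = [pair for doc_id, toks in tokens.items() for pair in map_phase(doc_id, toks)]
    return reduce_phase(pairs)


def search(index, query):
    """Return ids of documents that contain every keyword in the query."""
    terms = preprocess(query)
    if not terms:
        return []
    hits = [set(index.get(t, ())) for t in terms]
    return sorted(set.intersection(*hits))


docs = {
    "d1": "Hadoop stores large datasets",
    "d2": "Hadoop processing of large clusters",
    "d3": "R is used for stemming",
}
index = build_index(docs)
result = search(index, "large Hadoop")
```

Because queries pass through the same `preprocess` function as the documents, a query term is matched against the reduced vocabulary rather than the raw text. In a real Hadoop job the mapper and reducer would run on separate nodes and the grouping would be done by the framework's shuffle stage.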
