Conference paper: IEEE International Conference on Semantic Computing

Analysis of Big Data Technologies and Method - Query Large Web Public RDF Datasets on Amazon Cloud Using Hadoop and Open Source Parsers

Abstract

Extremely large datasets found in Big Data projects are difficult to work with using conventional databases, statistical software, and visualization tools. Massively parallel software, such as Hadoop, running on tens, hundreds, or even thousands of servers is better suited to Big Data challenges. Additionally, to achieve the highest performance when querying large datasets, it is necessary to work with these datasets at rest, without preprocessing them or moving them into a repository. Therefore, this work will analyze tools and techniques for overcoming the challenges of working with large datasets at rest. Parsing and querying will be done on the raw dataset: the untouched Web Data Commons RDF files. Web Data Commons comprises five billion web pages crawled from the Internet. This work will analyze available tools and appropriate methods to assist the Big Data developer in working with these extremely large semantic RDF datasets. Hadoop, open-source parsers, and Amazon Cloud services will be used to mine these files. To assist further discovery, recommendations for future research will be included.
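The central idea of the abstract is querying the raw Web Data Commons files in place with a MapReduce-style job rather than loading them into a repository first. A small Python sketch can illustrate it under the following assumptions:

- The files are gzipped N-Quads, with one `subject predicate object graph .` statement per line.
- A simplified regular expression stands in for the open-source RDF parsers the paper evaluates.
- The job counts how often each predicate occurs.

The function names (`parse_nquad`, `mapper`, `reducer`, `count_predicates`, `main`) are hypothetical. The local `sorted` call only stands in for Hadoop's shuffle-and-sort phase.

```python
import gzip
import re
import sys
from itertools import groupby

# One RDF term in an N-Quads statement: an IRI, a blank node, or a quoted
# literal with an optional language tag or datatype IRI. This is a simplified
# pattern for illustration, not a complete implementation of the W3C grammar.
TERM = re.compile(
    r'<[^>]*>'                            # IRI, e.g. <http://schema.org/name>
    r'|_:\S+'                             # blank node, e.g. _:b0
    r'|"(?:[^"\\]|\\.)*"'                 # literal, allowing escaped characters
    r'(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?'    # optional @lang tag or ^^<datatype>
)


def parse_nquad(line):
    """Split one N-Quads line into (subject, predicate, object, graph).

    Returns None for blank lines, comments, and lines that do not look like
    a statement; graph is None when the line is a plain triple.
    """
    line = line.strip()
    if not line or line.startswith("#") or not line.endswith("."):
        return None
    terms = TERM.findall(line[:-1])  # drop the terminating "."
    if len(terms) not in (3, 4):
        return None
    subject, predicate, obj = terms[:3]
    graph = terms[3] if len(terms) == 4 else None
    return subject, predicate, obj, graph


def mapper(lines):
    """Map phase: emit (predicate, 1) for every well-formed statement."""
    for line in lines:
        quad = parse_nquad(line)
        if quad is not None:
            yield quad[1], 1


def reducer(pairs):
    """Reduce phase: sum the counts for each key.

    Expects pairs sorted by key, which Hadoop's shuffle phase guarantees.
    """
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def count_predicates(lines):
    """Run map, sort (standing in for the shuffle), and reduce locally."""
    return dict(reducer(sorted(mapper(lines))))


def main(paths):
    """Count predicates across gzipped N-Quads files, read in place."""
    totals = {}
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
            for predicate, count in count_predicates(f).items():
                totals[predicate] = totals.get(predicate, 0) + count
    for predicate, count in sorted(totals.items()):
        print(f"{predicate}\t{count}")


if __name__ == "__main__":
    # Hypothetical invocation: python count_predicates.py file1.nq.gz file2.nq.gz
    main(sys.argv[1:])
```

On a cluster, the same map and reduce logic could be packaged as Hadoop Streaming scripts that read lines from standard input and emit tab-separated key/value pairs, for example on Amazon EMR. That deployment is an assumption made for illustration, not a description of the paper's setup. The local `sorted` call holds one file's pairs in memory, which is acceptable only for a small-scale test.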