Conference: International Conference on Computing for Sustainable Global Development

Construction of gazetteers from geo big data using machine learning technique on Hadoop


Abstract

Most gazetteers have been built and maintained for the purpose of visualizing geographical locations on a Geographic Information System (GIS) client. The advent of big data makes it possible to construct gazetteers by directly mining rich volunteered information from the web. In this paper, we propose a technique for extracting location-based spatial information from web documents and media services such as Flickr, Twitter, and Facebook in order to construct gazetteers. To achieve this, the web must first be searched for existing data pertaining to a location. A web crawler (the Google search engine) retrieves web pages based on a location keyword given by the user and maintains an index of those pages, which the proposed system passes to the Hadoop environment. To simplify processing further, the name node transfers groups of indexed web pages to different data nodes, where spatial information is extracted from the gathered dynamic web documents using a machine learning process. Each data node is then used to generate a common template, which allows location-based spatial information to be extracted from the dynamic web documents and media services. The resulting information from the data nodes is merged using MapReduce algorithms, and the output, held in the Hadoop Distributed File System (HDFS), is converted to the GeoJSON format (geospatial data in JavaScript Object Notation), thus aiding in visualizing the extracted information on the GIS client.
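The final merge-and-convert step described in the abstract can be pictured with a small sketch. It is not the authors' implementation: it imitates the map and reduce phases in plain Python rather than on a Hadoop cluster, and the record fields (name, lat, lon, source), the sample values, and the choice to merge duplicates by averaging coordinates are all assumptions made for illustration.

```python
import json
from collections import defaultdict

# Hypothetical records, as if emitted by the extraction step on each data node.
# Field names and values are illustrative assumptions, not data from the paper.
NODE_OUTPUTS = [
    [{"name": "Chennai", "lat": 13.08, "lon": 80.27, "source": "web"}],
    [
        {"name": "chennai", "lat": 13.10, "lon": 80.29, "source": "twitter"},
        {"name": "Mysore", "lat": 12.30, "lon": 76.64, "source": "flickr"},
    ],
]


def map_phase(records):
    """Emit (key, record) pairs keyed by a normalized place name."""
    for rec in records:
        yield rec["name"].strip().lower(), rec


def reduce_phase(values):
    """Merge all records for one place into a single GeoJSON Feature.

    Averaging the coordinates is an assumed merge rule for this sketch.
    """
    lat = sum(v["lat"] for v in values) / len(values)
    lon = sum(v["lon"] for v in values) / len(values)
    return {
        "type": "Feature",
        # GeoJSON orders coordinates as [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {
            "name": values[0]["name"],
            "sources": sorted({v["source"] for v in values}),
        },
    }


def build_gazetteer(node_outputs):
    """Group mapped records by key, reduce each group, wrap as a FeatureCollection."""
    grouped = defaultdict(list)
    for records in node_outputs:
        for key, value in map_phase(records):
            grouped[key].append(value)
    features = [reduce_phase(grouped[key]) for key in sorted(grouped)]
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    print(json.dumps(build_gazetteer(NODE_OUTPUTS), indent=2))
```

A GIS client that reads GeoJSON could load the resulting FeatureCollection directly. In a real deployment the grouping by key would be done by the Hadoop shuffle stage rather than an in-memory dictionary.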
