XPath based crawling method with crowdsourcing for targeted online market places

Abstract

An increasing number of online market places have emerged as online shopping has grown in popularity over the past couple of decades. During that time, the technologies used to construct web sites have evolved as well, and AJAX is currently a representative technique for constructing dynamic web pages. Crawling is a basic tool for collecting information on the Internet, and traditional crawling techniques randomly choose and follow links represented by anchor tags in order to navigate the World Wide Web. However, when a traditional crawler is applied to gather information from a targeted, up-to-date online market place, some critical problems arise. The first is that there are too many links, of which only a few are needed to navigate all web pages in the site. The second is that most links are implemented in JavaScript rather than as anchor tags, so traditional web crawlers cannot follow them. To overcome these issues, we propose a web page crawling method that extracts only the necessary and sufficient links by adopting a crowdsourcing approach, and that follows JavaScript links by using navigation information represented as XPaths.
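
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the page fragment, the `goCategory` handler, the function names, and the rule of keeping an XPath once at least two workers report it are all assumptions made for illustration. The sketch contrasts what an anchor-only crawler sees with what a small set of crowd-supplied XPaths selects, using only the limited XPath subset supported by the standard library.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, well-formed page fragment standing in for a market-place page.
PAGE = """
<html><body>
  <div id="menu">
    <span onclick="goCategory(1)">Books</span>
    <span onclick="goCategory(2)">Music</span>
  </div>
  <div id="ads">
    <a href="/promo1">Promo</a>
    <a href="/promo2">Sale</a>
    <a href="/login">Login</a>
  </div>
</body></html>
"""


def anchor_links(root):
    """What an anchor-only crawler sees: the href of every <a> element."""
    return [a.get("href") for a in root.iter("a") if a.get("href")]


def agreed_xpaths(submissions, min_votes=2):
    """Keep XPaths reported by at least min_votes workers (assumed rule)."""
    votes = Counter(xp for worker in submissions for xp in set(worker))
    return sorted(xp for xp, n in votes.items() if n >= min_votes)


def navigation_targets(root, xpaths):
    """Resolve each XPath to (label, onclick) pairs a crawler would trigger."""
    targets = []
    for xp in xpaths:
        for el in root.findall(xp):
            targets.append((el.text, el.get("onclick")))
    return targets


root = ET.fromstring(PAGE.strip())

# Hypothetical XPaths submitted by three crowd workers.
submissions = [
    [".//div[@id='menu']/span"],
    [".//div[@id='menu']/span", ".//div[@id='ads']/a"],
    [".//div[@id='menu']/span"],
]

xpaths = agreed_xpaths(submissions)
print(anchor_links(root))                # anchor links only, none lead to categories
print(navigation_targets(root, xpaths))  # JavaScript-driven category elements
```

In this fragment the anchor-only view contains just the promotional and login links, while the agreed XPath selects the two JavaScript-driven category elements. Actually triggering the `onclick` handlers would require a browser-driving tool, which the sketch leaves out.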

Bibliographic details

Source
International Conference on Big Data and Smart Computing | 2016 | pp. 395-397 | 3 pages
Authors
Jae-ho Shin; Gyoung-Don Joo; Chulyun Kim;
Original format PDF
Keywords
crawling; crowdsourcing; xpath;


