Conference: International Conference on Data Science and Its Applications

Extraction of Website Navigation Label Using A Multiple Web Crawler. A Case Study on 14 University Websites in Indonesia

Abstract

A labeling system is designed to help users find information on a website easily, and its labels must represent the information content. One method of designing a labeling system is to compare a website with its competitors' websites. Comparing labels reveals the labels the sites have in common, which makes the resulting labeling system easier for users to find and use. Labels are extracted using a web crawler, and building a web crawler requires taking the structure of the target website into account. Problems arise when several different target websites are to be compared: a separate, site-specific crawler program has to be written for each one, which takes a long time. This research proposes and analyzes a multiple web crawler. The first step in building it is to collect the target websites, here those of the 14 top universities in Indonesia (based on Indonesia Higher Education's University Ranking). The next step is to analyze the structural patterns of the navigation labels. The results are then used to build the multiple web crawler. The outcome of this research is a multiple web crawler that can automatically extract the navigation labels of several different target websites without a separate crawler program having to be written for each crawled website.
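The idea of one crawler serving several differently structured sites can be illustrated with a minimal sketch. This is not the authors' implementation; the abstract does not state which structural patterns they found. The sketch assumes, purely for illustration, that a navigation container is either a `<nav>` element or any element whose `id` or `class` contains "nav" or "menu", and that the HTML is well formed (every non-void tag is closed). The names `NAV_HINTS`, `NavLabelParser`, `extract_labels`, `common_labels`, and the two sample pages are hypothetical. Page fetching is left out so the example stays self-contained.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical pattern hints; the paper's actual patterns are not given in the abstract.
NAV_HINTS = ("nav", "menu")
# Void elements never get a closing tag, so they are ignored for depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class NavLabelParser(HTMLParser):
    """Collects link texts found inside navigation containers."""

    def __init__(self, hints=NAV_HINTS):
        super().__init__(convert_charrefs=True)
        self.hints = hints
        self.nav_depth = 0    # > 0 while inside a navigation container
        self.in_link = False  # True while inside an <a> within that container
        self.buffer = []
        self.labels = []

    def _is_nav(self, tag, attrs):
        if tag == "nav":
            return True
        return any(
            name in ("id", "class") and value
            and any(hint in value.lower() for hint in self.hints)
            for name, value in attrs
        )

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.nav_depth > 0:
            self.nav_depth += 1
            if tag == "a":
                self.in_link = True
                self.buffer = []
        elif self._is_nav(tag, attrs):
            self.nav_depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.nav_depth == 0:
            return
        if tag == "a" and self.in_link:
            # Collapse whitespace so "  About \n Us " becomes "About Us".
            label = " ".join("".join(self.buffer).split())
            if label:
                self.labels.append(label)
            self.in_link = False
        self.nav_depth -= 1

    def handle_data(self, data):
        if self.in_link:
            self.buffer.append(data)


def extract_labels(html, hints=NAV_HINTS):
    """Return navigation labels in document order."""
    parser = NavLabelParser(hints)
    parser.feed(html)
    parser.close()
    return parser.labels


def common_labels(pages):
    """Count on how many sites each label (case-insensitive) appears."""
    counts = Counter()
    for html in pages.values():
        counts.update({label.lower() for label in extract_labels(html)})
    return counts


# Two made-up pages with different navigation markup.
SITE_A = ('<nav><ul><li><a href="/">Home</a></li>'
          '<li><a href="/about">About</a></li></ul></nav>'
          '<p><a href="/x">Not navigation</a></p>')
SITE_B = ('<div id="main-menu"><a href="/">Home</a> '
          '<a href="/adm">Admission</a></div>')
```

The same parser handles both sample pages because the site-specific knowledge lives in the hint list rather than in the code; supporting another site layout would mean extending the hints, not writing another crawler. Real pages with unclosed tags would likely need a more tolerant parser than this sketch.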