Web Data Extraction from Scientific Publishers’ Website Using Heuristic Algorithm

Umamageswari Kumaresan; Kalpana Ramanujam


Abstract

The World Wide Web is a huge repository of information, and the amount of information available on it is growing exponentially. End users rely on search engines such as Google, Yahoo, and Bing to retrieve information. Search engines use web crawlers, or spiders, which crawl through a sequence of web pages to locate the relevant pages and return a set of links ordered by relevance. These indexed pages form the surface web. Retrieving data from the deep web requires form submission, which search engines do not perform. Data analytics and data mining applications depend on data from deep web pages, but automatic extraction of that data is cumbersome because the structure of web pages is so diverse. This work proposes a heuristic algorithm for automatic navigation and information extraction starting from a journal's home page. The algorithm is applied to the websites of many publishers, such as Nature, Elsevier, BMJ, and Wiley, and the experimental results show that the heuristic technique gives promising precision and recall values.
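The abstract reports results in terms of precision and recall but does not say how they were computed. As a minimal sketch, assuming extracted items are compared against a hand-labelled reference set by exact match, the two measures could be computed as follows. The function name `precision_recall` and the field names in the example are hypothetical and do not come from the paper.

```python
def precision_recall(extracted, expected):
    """Return (precision, recall) for extracted items versus a reference set.

    Assumption: an extracted item counts as correct only if it matches
    a reference item exactly. The paper may use a different criterion.
    """
    extracted, expected = set(extracted), set(expected)
    true_positives = len(extracted & expected)
    # Precision: fraction of extracted items that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: fraction of reference items that were extracted.
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical metadata fields for one journal page (illustration only).
expected_fields = {"title", "authors", "volume", "issue", "doi"}
extracted_fields = {"title", "authors", "volume", "publisher"}

precision, recall = precision_recall(extracted_fields, expected_fields)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here three of the four extracted fields are correct and three of the five reference fields are found, so precision is 3/4 and recall is 3/5.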


Bibliographic details

Source: International Journal of Intelligent Systems and Applications, 2017, Issue 10, 9 pages
Authors: Umamageswari Kumaresan; Kalpana Ramanujam
Original format: PDF
Chinese Library Classification: Artificial intelligence theory

