Algorithms of mining data records from website automatically

Qiu Yong; Lan Yongjie


Abstract

To improve the accuracy and completeness of mining data records from the web, the concepts of isomorphic page and directory page are introduced and three algorithms are proposed. Isomorphic web pages are a set of web pages that share a uniform structure and differ only in their main information. A web page that contains many links to isomorphic web pages is called a directory page. Algorithm 1 finds directory pages on a website using an adjacent-link similarity analysis method. It first sorts the links and then counts the links in each directory. If the count is greater than a given threshold, it finds the similar sub-page links in that directory and returns the results. A function for judging whether web pages are isomorphic is also proposed. Algorithm 2 mines data records from an isomorphic page using a noise information filter. It relies on the fact that the noise information in two isomorphic pages is the same and only the main information differs. Algorithm 3 mines data records from an entire website using spider (web crawler) technology. Experiments show that the proposed algorithms mine data records more completely than existing algorithms. Mining data records from isomorphic pages is an efficient method.
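The link-counting step described for Algorithm 1 can be illustrated with a minimal sketch. The abstract does not give the paper's actual procedure or its similarity function, so everything here is an assumption made for illustration: the function names `group_links_by_directory` and `candidate_link_groups`, the use of the URL's parent path as "the directory", the default threshold of 3, and the example.com URLs.

```python
from collections import defaultdict
from urllib.parse import urlparse
import posixpath


def group_links_by_directory(links):
    """Sort the links, then bucket them by (host, parent directory of the path)."""
    groups = defaultdict(list)
    for link in sorted(set(links)):
        parts = urlparse(link)
        # Assumption: "directory" means the parent path of the linked URL.
        directory = posixpath.dirname(parts.path) or "/"
        groups[(parts.netloc, directory)].append(link)
    return groups


def candidate_link_groups(links, threshold=3):
    """Keep only directories whose link count is greater than the threshold."""
    groups = group_links_by_directory(links)
    return {key: urls for key, urls in groups.items() if len(urls) > threshold}


# Hypothetical links harvested from one page.
links = [
    "http://example.com/news/1.html",
    "http://example.com/news/2.html",
    "http://example.com/news/3.html",
    "http://example.com/news/4.html",
    "http://example.com/about.html",
]
result = candidate_link_groups(links, threshold=3)
print(result)
```

With these inputs only the `/news` directory survives, since it holds four links and the root directory holds one. A page carrying such a group would then be a candidate directory page, subject to whatever isomorphism check the paper applies to the linked sub-pages.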

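The idea behind the noise information filter of Algorithm 2, that two isomorphic pages share their noise and differ only in their main information, can be sketched as follows. This is not the paper's implementation: comparing stripped text fragments, the helper names `TextCollector`, `text_fragments` and `filter_noise`, and the two sample pages are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the non-empty text fragments of an HTML document in order."""

    def __init__(self):
        super().__init__()
        self.fragments = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.fragments.append(text)


def text_fragments(html):
    """Return the list of text fragments found in `html`."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.fragments


def filter_noise(page, sibling):
    """Keep fragments of `page` that do not also occur in the isomorphic `sibling`."""
    noise = set(text_fragments(sibling))
    return [frag for frag in text_fragments(page) if frag not in noise]


# Two hypothetical isomorphic pages: same template, different main information.
page_a = ("<html><body><div>Site menu</div><h1>Blue kettle</h1>"
          "<p>Price: 20</p><div>Copyright</div></body></html>")
page_b = ("<html><body><div>Site menu</div><h1>Red toaster</h1>"
          "<p>Price: 35</p><div>Copyright</div></body></html>")
record = filter_noise(page_a, page_b)
print(record)
```

One limitation of this simplified comparison is that a field with the same value on both pages would be discarded as noise, so a practical filter would probably compare against more than one sibling page or take page structure into account.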

Bibliographic details

Source
Journal of Southeast University, 2006, No. 3, pp. 423-425 (3 pages)
Authors
Qiu Yong; Lan Yongjie
Affiliation

School of Information and Electronic Engineering, Shandong Institute of Business and Technology, Yantai 264005, China

Original format: PDF
Language: English
Chinese Library Classification: General industrial technology
Keywords
data mining; data record; website; isomorphic page

Date indexed: 2022-08-17 23:57:04

