A Study of Content Extraction From Web Pages Based on Links

R. Gunasundari; S. Karthikeyan

Abstract

Extracting the main content from a web page is a preprocessing step for web information systems. Wrapper-based content extraction is limited to a single information source and depends heavily on the page structure, so it is seldom employed in practice. This paper therefore proposes a new content extraction method that identifies web page content using two features: the number of punctuation marks, and the ratio of the number of non-hyperlink characters to the number of characters contained in hyperlinks. The method eliminates noise and extracts the main content blocks from a web page effectively. Experimental results show that the approach is accurate and suitable for most websites.
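
The two features named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no thresholds or block segmentation rule, so the block tags, the punctuation set, the parameters `min_punct` and `min_ratio`, and the names `BlockCollector` and `extract_main_content` are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Assumed segmentation and punctuation sets; the abstract does not specify them.
BLOCK_TAGS = {"p", "div", "td", "li", "article", "section"}
SKIP_TAGS = {"script", "style"}
PUNCTUATION = set(".,;:!?") | set("。，；：！？、")


def visible_len(s):
    """Count the non-whitespace characters in s."""
    return sum(1 for c in s if not c.isspace())


class BlockCollector(HTMLParser):
    """Split a page into text blocks and count the characters inside <a> tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []       # finished blocks as (text, link_chars)
        self._text = []        # text pieces of the block being built
        self._link_chars = 0   # non-whitespace characters seen inside links
        self._in_link = 0
        self._skip = 0

    def flush(self):
        """Close the current block and store it if it has any text."""
        text = " ".join("".join(self._text).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._text, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self._in_link += 1
        elif tag in SKIP_TAGS:
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)

    def handle_data(self, data):
        if self._skip:
            return  # ignore script and style bodies
        self._text.append(data)
        if self._in_link:
            self._link_chars += visible_len(data)


def extract_main_content(html, min_punct=3, min_ratio=2.0):
    """Keep blocks with enough punctuation and a high non-link to link ratio."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    parser.flush()
    kept = []
    for text, link_chars in parser.blocks:
        punct = sum(1 for c in text if c in PUNCTUATION)
        non_link = visible_len(text) - link_chars
        ratio = non_link / max(link_chars, 1)  # avoid division by zero
        if punct >= min_punct and ratio >= min_ratio:
            kept.append(text)
    return kept
```

Under these assumed thresholds, a navigation bar made only of links is rejected on both features, while a paragraph of running prose passes. Real pages would likely need the thresholds tuned per corpus.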

Bibliographic details

Source: International Journal of Data Mining & Knowledge Management Process, 2012, Issue 3
Authors: R. Gunasundari; S. Karthikeyan
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
