Content Information Extraction of Theme Web Pages Based on Tag Information

Abstract

To extract the content information of theme Web pages more accurately, this paper proposes a self-learning method based on tag information that calculates the information quantity of various tag indicators. The method predefines several tag information indexes and coefficient indexes, calculates the information quantity of each tag in a web page in turn, and takes the tag with the largest information quantity as the location of the candidate content. To improve the versatility of the method, adaptive and adjustable coefficient weights are added to the formulas for tag information quantity. As more data is processed, the tag collections, index values, and information quantity results are added to a learning database, which is used to adjust the weights of the coefficient factors. Experimental results show that, with adaptive and adjustable coefficient weights, this extraction method can reach a recall rate of more than 99 percent. The method also does not depend on the specific structure or style of a web page and has good versatility.
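
The selection step described in the abstract (score each tag by a weighted sum of indicators, then take the tag with the largest score) can be illustrated with a minimal sketch. The abstract does not give the paper's actual indexes, coefficients, or formulas, so everything here is an assumption made for illustration: the three indicators (`text_len`, `punctuation`, `link_text`), the values in `WEIGHTS`, the set of container tags, and the helper names `information_quantity` and `extract_content_tag` are all hypothetical. The self-learning adjustment of the weights is not covered.

```python
from html.parser import HTMLParser

# Hypothetical indicators and weights, chosen only for illustration;
# they are NOT the indexes or coefficients used in the paper.
WEIGHTS = {"text_len": 1.0, "punctuation": 5.0, "link_text": -2.0}
CONTAINER_TAGS = {"div", "td", "article", "section"}
SKIPPED_TAGS = {"script", "style"}
PUNCTUATION = set(",.;:!?，。；：！？")

SAMPLE_HTML = """
<html><body>
<div id="nav"><a href="/">Home</a> <a href="/news">News</a></div>
<div id="main"><p>The committee met on Monday, and it approved
the budget. Details follow below.</p></div>
<div id="footer"><a href="/contact">Contact</a></div>
</body></html>
"""


class TagIndicatorParser(HTMLParser):
    """Collects simple per-tag indicators for container tags."""

    def __init__(self):
        super().__init__()
        self.open_nodes = []   # stack of currently open container tags
        self.nodes = []        # every container tag seen so far
        self.link_depth = 0    # > 0 while inside an <a> element
        self.skip_depth = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1
        elif tag == "a":
            self.link_depth += 1
        elif tag in CONTAINER_TAGS:
            node = {"tag": tag, "attrs": dict(attrs),
                    "text_len": 0, "punctuation": 0, "link_text": 0}
            self.open_nodes.append(node)
            self.nodes.append(node)

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "a":
            self.link_depth = max(0, self.link_depth - 1)
        elif self.open_nodes and self.open_nodes[-1]["tag"] == tag:
            self.open_nodes.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or self.skip_depth or not self.open_nodes:
            return
        # Credit the text to the innermost open container only.
        node = self.open_nodes[-1]
        node["text_len"] += len(text)
        node["punctuation"] += sum(ch in PUNCTUATION for ch in text)
        if self.link_depth:
            node["link_text"] += len(text)


def information_quantity(node, weights):
    """Weighted sum of a node's indicators (an assumed scoring form)."""
    return sum(weight * node[name] for name, weight in weights.items())


def extract_content_tag(html, weights=WEIGHTS):
    """Return the container node with the largest score, or None."""
    parser = TagIndicatorParser()
    parser.feed(html)
    parser.close()
    if not parser.nodes:
        return None
    return max(parser.nodes, key=lambda n: information_quantity(n, weights))
```

Calling `extract_content_tag(SAMPLE_HTML)` returns the dictionary for the highest-scoring container, whose `attrs` identify the tag. Link text is weighted negatively in this sketch so that navigation blocks score low; a learned weighting, as the abstract describes, would replace the fixed `WEIGHTS`.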

Bibliographic details

Source
International Symposium on Computational Intelligence and Design, 2014, pp. 501-504 (4 pages)
Authors
Jie Wang; Jian Wu; Yafeng Zhang; Guowan He
Original format: PDF
Keywords
Content Information Extraction; DOM Tree; Tag information quantity; Theme Web pages

Similar literature

1. Content extraction from news web pages using tag tree [J]. Chandrakala Arya, Sanjay K. Dwivedi. International Journal of Autonomic Computing. 2018, No. 1
2. An Efficient Technique for Tag Extraction and Content Retrieval from Web Pages [J]. S. Sathya, Dr. B. Srinivasan. International Journal of Computer Trends and Technology. 2013, No. 9
3. Relation Extraction from Web Contents with Linguistic and Web Features（言語分析およびWeb上の情報を用いたコンテンツからの関係の抽出） [J]. 顔玉蘭. 人工知能学会志. 2011, No. 1
4. Content Information Extraction of Theme Web Pages Based on Tag Information [C]. Jie Wang, Jian Wu, Yafeng Zhang, International Symposium on Computational Intelligence and Design. 2014
5. Using a named entity tagger and a syntactic parser to improve Web-based answer extraction [D]. Kamel, Yasser. 2004
6. PLAN2L: a web tool for integrated text mining and literature-based bioentity relation extraction [O]. Martin Krallinger, Carlos Rodriguez-Penagos, Ashish Tendulkar, 2009
7. Automatic web content extraction for generating tag clouds from Thai web sites [O]. Thanadechteemapat W., Fung C.C. 2011
