Genetic mining of HTML structures for effective Web-document retrieval

Kim S.; Zhang BT.




Abstract

Web-documents have a number of tags indicating the structure of texts. Text segments marked by HTML tags have specific meaning which can be utilized to improve the performance of document retrieval systems. In this paper, we present a machine learning approach to mine the structure of HTML documents for effective Web-document retrieval. A genetic algorithm is described that learns the importance factors of HTML tags which are used to re-rank the documents retrieved by standard weighting schemes. The proposed method has been evaluated on artificial text sets and a large-scale TREC document collection. Experimental evidence supports that the tag weights are well trained by the proposed algorithm in accordance with the importance factors for retrieval, and indicates that the proposed approach significantly improves the performance in retrieval accuracy. In particular, the use of the document-structure mining approach tends to move relevant documents to upper ranks, which is especially important in interactive Web-information retrieval environments. [References: 44]
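
The abstract's central idea, a genetic algorithm that learns per-tag importance weights used to re-rank retrieved documents, can be illustrated with a small sketch. The abstract does not specify the chromosome encoding, fitness function, or genetic operators, so everything below is an assumption made for illustration: the tag set `TAGS`, the toy collection `DOCS`, the weighted-sum scoring form, average precision as fitness, and tournament selection with uniform crossover and Gaussian mutation. It is not the authors' implementation.

```python
import random

# Hypothetical tag set; the paper's actual set of HTML tags is not given in the abstract.
TAGS = ["title", "h1", "b", "a", "body"]

# Toy collection (assumed): query-term counts per tag plus a relevance judgment.
DOCS = [
    {"tf": {"title": 1, "body": 1}, "relevant": True},
    {"tf": {"body": 5}, "relevant": False},
    {"tf": {"title": 1, "h1": 1}, "relevant": True},
    {"tf": {"body": 3, "b": 1}, "relevant": False},
]

def score(doc, weights):
    """Weighted sum of query-term counts per tag (an assumed scoring form)."""
    return sum(w * doc["tf"].get(tag, 0) for tag, w in zip(TAGS, weights))

def average_precision(docs, weights):
    """Average precision of the ranking induced by the given tag weights."""
    ranked = sorted(docs, key=lambda d: score(d, weights), reverse=True)
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc["relevant"]:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def evolve(docs, pop_size=30, generations=40, mutation_rate=0.2, seed=0):
    """Evolve a weight vector in [0, 1] per tag that maximizes average precision."""
    rng = random.Random(seed)

    def fitness(weights):
        return average_precision(docs, weights)

    # Each chromosome is one real-valued weight per tag.
    pop = [[rng.random() for _ in TAGS] for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(pop, key=fitness)]  # elitism: carry over the best individual
        while len(children) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            # Uniform crossover: each gene comes from either parent.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            # Gaussian mutation, clipped back into [0, 1].
            child = [
                min(1.0, max(0.0, g + rng.gauss(0, 0.1)))
                if rng.random() < mutation_rate else g
                for g in child
            ]
            children.append(child)
        pop = children
    return max(pop, key=fitness)

if __name__ == "__main__":
    best = evolve(DOCS)
    print(dict(zip(TAGS, best)), average_precision(DOCS, best))
```

In this toy setup the relevant documents carry the query term in `title`, while the non-relevant ones repeat it in `body`, so a weight vector that favors `title` over `body` ranks the relevant documents first. In practice the learned weights would be applied to documents already retrieved by a standard weighting scheme, as the abstract describes.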


Bibliographic details

Source
Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solving Technologies | 2003, No. 3 | 14 pages
Authors
Kim S.; Zhang BT.
Original format: PDF
Text language: English
Chinese Library Classification: Automation technology, computer technology
Keywords
Genetic algorithms; Machine learning; Web-documents; Information retrieval; World-wide-web; Algorithms;


Similar literature

1. Genetic mining of HTML structures for effective Web-document retrieval [J]. Kim S., Zhang BT. Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solving Technologies. 2003, No. 3

2. Mining literature for a comprehensive pathway analysis: A case study for retrieval of homocysteine related genes for genetic and epigenetic studies [J]. Priyanka Sharma, RD Senthilkumar, Vani Brahmachari. Lipids in Health and Disease. 2006, No. 1

3. The $\mathcal{N} = 4$ effective action of type IIA supergravity compactified on SU(2)-structure manifolds [J]. Thomas Danckaert, Jan Louis, Danny Martínez-Pedrera. The Journal of High Energy Physics. 2011, No. 8

4. Efficient Annealing-Inspired Genetic Algorithm for Information Retrieval from Web-Document [C]. Yuan Xu, Yang Deli, Liu Yu. World Summit on Genetic and Evolutionary Computation; 2009 GEC Summit. 2009

5. Improving Web retrieval by mining the HTML tags for keywords and exploring the hyperlink structures of Web pages [D]. Quevedo-Torrero, Jesus Ubaldo. 2004

6. Mining literature for a comprehensive pathway analysis: A case study for retrieval of homocysteine related genes for genetic and epigenetic studies [O]. Priyanka Sharma, RD Senthilkumar, Vani Brahmachari. 2006

7. Genetic Mining of DNA Sequence Structures for Effective Classification of the Risk Types of Human Papillomavirus (HPV) [O]. Jae-hong Eom, Seong-bae Park. 2004
