A Semantic Based Approach for Information Retrieval from Html Documents Using Wrapper Induction Technique

A.M.Abirami; A.Askarunisa; T.M.Aishwarya; K.S.Eswari

Abstract

Most internet applications are built using web technologies such as HTML. Web pages are typically designed either to display data records from underlying databases or to display text in an unstructured format within a fixed template. Summarizing data dispersed across different web pages is tedious and consumes considerable time and manual effort. A supervised learning technique called wrapper induction can be used across web pages to learn data extraction rules. Applying these learned rules to web pages makes information extraction easier. This paper focuses on developing a tool for information extraction from unstructured data. The use of semantic web technologies greatly simplifies the process. The tool enables users to query data scattered over multiple web pages in distinct ways. This is accomplished by the following steps: extracting the data from multiple web pages, storing them in the form of RDF triples, integrating multiple RDF files using an ontology, generating a SPARQL query based on the user query, and generating a report in the form of tables or charts from the results of the SPARQL query. The relationships between related web pages are identified using the ontology and used to improve querying, thus enhancing search efficacy.
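The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' tool: it assumes a simple left/right-delimiter wrapper learned from two hand-labelled pages, represents RDF-style triples as plain Python tuples, and replaces SPARQL with a wildcard triple-pattern match. The sample pages, the `ex:` prefix, and all function names are hypothetical; a real system would likely use an RDF library and a SPARQL engine.

```python
import os


def common_suffix(strings):
    """Longest string that every input ends with."""
    reversed_prefix = os.path.commonprefix([s[::-1] for s in strings])
    return reversed_prefix[::-1]


def learn_lr_rule(examples):
    """Learn (left, right) delimiters from (page, labelled_value) pairs."""
    lefts, rights = [], []
    for page, value in examples:
        start = page.index(value)               # first occurrence of the label
        lefts.append(page[:start])              # text before the value
        rights.append(page[start + len(value):])  # text after the value
    left = common_suffix(lefts)
    right = os.path.commonprefix(rights)
    if not left or not right:
        raise ValueError("examples share no usable delimiters")
    return left, right


def apply_rule(rule, page):
    """Extract the text between the learned delimiters, or None."""
    left, right = rule
    start = page.find(left)
    if start == -1:
        return None
    start += len(left)
    end = page.find(right, start)
    return None if end == -1 else page[start:end]


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Hypothetical pages sharing one fixed template.
pages = [
    "<html><h1>Alice</h1><span class='dept'>CSE</span></html>",
    "<html><h1>Bob</h1><span class='dept'>ECE</span></html>",
    "<html><h1>Carol</h1><span class='dept'>IT</span></html>",
]

# Supervised step: the first two pages are labelled by hand.
name_rule = learn_lr_rule([(pages[0], "Alice"), (pages[1], "Bob")])
dept_rule = learn_lr_rule([(pages[0], "CSE"), (pages[1], "ECE")])

# Extraction step: apply the learned rules to every page, emit triples.
triples = []
for page in pages:
    name = apply_rule(name_rule, page)
    dept = apply_rule(dept_rule, page)
    subject = "ex:" + name
    triples.append((subject, "ex:name", name))
    triples.append((subject, "ex:dept", dept))

# Query step: "who belongs to the IT department?"
it_staff = match(triples, p="ex:dept", o="IT")
print(it_staff)
```

The third page is never labelled, yet the rules learned from the first two extract its fields, which is the point of inducing a wrapper from a fixed template. Delimiter-based rules are brittle when the template varies, so this should be read only as an illustration of the idea.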

Bibliographic details

Source: Computer Science & Information Technology, 2013, No. 6, 8 pages
Authors: A.M.Abirami; A.Askarunisa; T.M.Aishwarya; K.S.Eswari
Original format: PDF
Chinese Library Classification: Computing technology, computer technology

Similar literature

1. Semantic relation based personalized ranking approach for engineering document retrieval [J]. Gyeong June Hahm, Jae Hyun Lee, Hyo Won Suh. Advanced engineering informatics. 2015, No. 3
2. A Semantic and Feature Aggregated Information Retrieval Technique for Efficient Geospatial Text Document Retrieval [J]. Uma R., Muneeswaran K. Journal of multiple-valued logic and soft computing. 2017, No. 6
3. Improved Semantic Representation and Search Techniques in a Document Retrieval System Design [J]. Nhon V. Do, TruongAn PhamNguyen, Hung K. Chau. Journal of Advances in Information Technology. 2015, No. 3
4. Toward a retrieval of HTML documents using a semantic approach [C]. Ferri, F., Ghiselli, . 2000
5. A semantic-based approach for software reusable component classification and retrieval [D]. Yao, Haining. 2005
6. Towards semantic search and inference in electronic medical records: An approach using concept-based information retrieval [O]. Bevan Koopman, Peter Bruza, Laurianne Sitbon. 2012
7. A Semantic Based Approach for Information Retrieval from HTML Documents Using Wrapper Induction Technique [O]. 2013
8. Towards Development of an Improved Technique for Remote Retrieval of Water Quality Components: An Approach Based on the Gordon's Parameter Spectral Ratio [R]. Sokoletsky, L., Gallegos, S. 2011
