International Conference on Internet Computing
Content Based Search in Web Archives

Abstract

The widespread use of the Internet as a potentially useful and important data repository has led to a proliferation of searches for valuable information to support important decision making. As the amount of data stored and made available on the Internet grows, it becomes necessary to create and maintain an Internet archive as a data repository for backup and record-keeping. Searching an archival database is a very complex process, however, because there is simply too much information to search through. Furthermore, the ever-changing, dynamic nature of web pages adds further difficulty to the search for desired information. In this paper, we propose an efficient approach to the problem of finding the most relevant data source in the archives. Two main issues affect the search process in an archival database. One is understanding and interpreting the user's search intentions, which are often expressed as a sequence of keywords or natural language sentences. The other is mapping the identified user intentions to the most relevant web pages that satisfy the search objectives. We present a sound solution to the latter problem, mapping pages to search intentions by using each web page's content. To achieve this goal, a web page indexing scheme based on vector features is developed, and active contour models are employed to extract geometric features. The main advantage of this method lies in the indexing and geometric feature extraction procedure, which is expected to improve the accuracy of the search results and speed up retrieval of the most desired web pages.
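The abstract does not say how the vector features are built or compared, so the following is only a rough sketch of the general idea of indexing pages by feature vectors and ranking them against a query. It assumes plain term counts as the features and cosine similarity as the match score; both are stand-ins chosen for illustration and are not the authors' geometric features or their active contour extraction. The names `vectorize`, `build_index`, `search`, and the sample page identifiers are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Hypothetical feature extractor: lowercase term counts (an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(pages):
    """Map each archived page identifier to its feature vector."""
    return {url: vectorize(body) for url, body in pages.items()}


def search(index, query, top_k=3):
    """Return up to top_k page identifiers, most similar to the query first."""
    q = vectorize(query)
    scored = [(cosine(q, vec), url) for url, vec in index.items()]
    # Sort by descending score, breaking ties by identifier for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [url for score, url in scored[:top_k] if score > 0]


# Made-up archive contents, for illustration only.
pages = {
    "archive/a": "web archive search engine",
    "archive/b": "cooking recipes for pasta",
    "archive/c": "search the web archive by content",
}
index = build_index(pages)
results = search(index, "web archive search")
```

Precomputing the vectors once in `build_index` is what makes retrieval fast at query time: each search only compares the query vector against stored vectors instead of reprocessing page content.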
