
Web object indexing using domain knowledge


Abstract

A web object is defined as any meaningful object embedded in web pages (e.g., images, music) or pointed to by hyperlinks (e.g., downloadable files). In many cases, users want to search for information about a particular object rather than for web pages containing the query terms. To facilitate web object search and organization, this paper proposes a novel approach to web object indexing that discovers an object's inherent structure information using existing domain knowledge. First, layered LSI spaces are built to better represent the hierarchically structured domain knowledge, emphasizing the specific semantics and term space in each layer. Meanwhile, the web object representation is constructed by hyperlink analysis and then pruned to remove noise. Next, an optimal matching between the web object and the domain knowledge is performed to pick out the structure attributes of the web object from the knowledge. Finally, the obtained structure attributes are used to reorganize and index the web objects. Our approach also suggests a promising new way to use trustworthy Deep Web knowledge to help organize the dispersed information of the Surface Web.
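
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy two-layer music domain ("artist" and "album"), builds one LSI (latent semantic indexing) space per layer with a plain truncated SVD, and replaces the paper's optimal matching with a simple per-layer best cosine match. All names here (`build_lsi_space`, `project`, `best_match`, `index_object`), the data, and the choice of `k` are hypothetical.

```python
import numpy as np

def build_lsi_space(entries, k):
    """Build a rank-k LSI space for one knowledge layer.

    entries: dict mapping entry name -> list of terms (assumed toy format).
    """
    names = list(entries)
    vocab = sorted({t for terms in entries.values() for t in terms})
    index = {t: i for i, t in enumerate(vocab)}
    # Term-by-entry count matrix (rows: terms, columns: entries).
    A = np.zeros((len(vocab), len(names)))
    for j, name in enumerate(names):
        for t in entries[name]:
            A[index[t], j] += 1.0
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    # Row j holds entry j in latent coordinates (equal to U_k^T applied to column j of A).
    entry_vecs = Vt[:k, :].T * s[:k]
    return {"names": names, "index": index, "U": U[:, :k], "vecs": entry_vecs}

def project(space, terms):
    """Fold a bag of terms into the layer's latent space; unknown terms are ignored."""
    q = np.zeros(len(space["index"]))
    for t in terms:
        if t in space["index"]:
            q[space["index"][t]] += 1.0
    return space["U"].T @ q

def best_match(space, terms):
    """Return (entry name, cosine score) of the closest entry in this layer."""
    q = project(space, terms)
    q_norm = np.linalg.norm(q)
    scores = []
    for v in space["vecs"]:
        denom = q_norm * np.linalg.norm(v)
        scores.append(float(q @ v / denom) if denom > 0 else 0.0)
    best = int(np.argmax(scores))
    return space["names"][best], scores[best]

def index_object(layers, object_terms, k=2):
    """Pick one structure attribute per layer (greedy stand-in for optimal matching)."""
    return {layer: best_match(build_lsi_space(entries, k), object_terms)[0]
            for layer, entries in layers.items()}

# Hypothetical domain knowledge, one dict of entries per layer.
layers = {
    "artist": {
        "beatles": ["beatles", "john", "lennon", "paul", "mccartney", "rock"],
        "miles_davis": ["miles", "davis", "jazz", "trumpet"],
    },
    "album": {
        "abbey_road": ["abbey", "road", "beatles", "come", "together"],
        "kind_of_blue": ["kind", "blue", "miles", "davis", "so", "what"],
    },
}

# Terms assumed to come from hyperlink analysis (e.g. anchor text) of one web object.
object_terms = ["download", "come", "together", "beatles", "mp3"]
attributes = index_object(layers, object_terms)
print(attributes)
```

Keeping a separate space per layer means each layer has its own vocabulary and latent axes, which is one way to read the abstract's emphasis on the specific semantics and term space of each layer. Terms absent from a layer's vocabulary, such as "download" or "mp3" above, are dropped during projection, which loosely mirrors the pruning step. In a realistic setting `k` would be much smaller than the vocabulary size.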
