Journal: 《电子学报》 (Acta Electronica Sinica)

A Deep Web Data Source Classification Method Based on Theme and Form Attributes

         

Abstract

The Deep Web contains a vast and rapidly growing amount of high-quality information. However, because Deep Web data sources are distributed, heterogeneous, and autonomous, users face great difficulty in obtaining the information they are interested in efficiently and quickly. Classifying Deep Web data sources by domain is the foundation for addressing this challenge. Based on statistics and analysis of more than 200 data sources from four domains (Airfares, Books, Automobiles, and Real Estate), this paper proposes a novel classification method and an improved similarity measure for query interfaces. Both make full use of theme information and form attributes, and together they classify large numbers of Deep Web data sources automatically. The paper also presents a query interface tagging strategy that reduces the influence of randomly chosen initial centers. Experimental results indicate that the method is effective and achieves relatively high classification accuracy.
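The abstract does not give the similarity formula or the classification procedure, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each query interface is reduced to two term sets (theme terms and form attribute labels), that similarity is a weighted mix of Jaccard scores, and that tagged interfaces serve as fixed per-domain seeds in place of randomly chosen initial centers. The names `interface_similarity`, `classify`, and `theme_weight` are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: (theme terms, form attribute labels).
# Neither this representation nor the weighting below comes from the paper.
Interface = Tuple[Set[str], Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def interface_similarity(x: Interface, y: Interface,
                         theme_weight: float = 0.5) -> float:
    """Weighted mix of theme similarity and form-attribute similarity."""
    theme_sim = jaccard(x[0], y[0])
    form_sim = jaccard(x[1], y[1])
    return theme_weight * theme_sim + (1.0 - theme_weight) * form_sim


def classify(interface: Interface,
             seeds: Dict[str, List[Interface]]) -> str:
    """Return the domain whose tagged seeds are most similar on average."""
    def mean_similarity(domain: str) -> float:
        examples = seeds[domain]
        total = sum(interface_similarity(interface, s) for s in examples)
        return total / len(examples)

    return max(seeds, key=mean_similarity)


# Illustrative, made-up seed interfaces for two of the four domains.
seeds = {
    "Books": [({"book", "author"}, {"title", "author", "isbn"})],
    "Airfares": [({"flight", "airfare"}, {"from", "to", "departure date"})],
}
query = ({"book", "store"}, {"title", "isbn", "publisher"})
print(classify(query, seeds))
```

Seeding each domain with tagged interfaces makes the assignment deterministic, which is one way to read the abstract's goal of reducing sensitivity to random initial centers. A full clustering loop that updates the centers is omitted here.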
