Journal: International Journal of Engineering Research and Applications

Multiple Web Database Handle Using CTVS Method and Record Matching


Abstract

Web databases generate query result pages in response to a user's query. For many applications, such as data integration, which must work with multiple web databases, automatically extracting the data from these query result pages is very important. We present a novel data extraction and alignment method called CTVS that combines tag and value similarity. CTVS automatically extracts data from query result pages by first identifying and segmenting the query result records (QRRs) in the pages and then aligning the segmented QRRs into a table in which data values of the same attribute are placed in the same column. We also present UDD, an unsupervised, online record matching method that, for a given query, can effectively identify duplicates among the query result records of multiple web databases. Finally, we propose new techniques for handling QRRs that are not contiguous, which may be caused by auxiliary information such as a comment, recommendation, or advertisement, and for handling any nested structure that may exist in the QRRs.
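
The alignment idea described in the abstract, placing values of the same attribute in the same column by combining tag and value similarity, can be illustrated with a small sketch. This is not the paper's algorithm: the function names (`tag_similarity`, `value_similarity`, `combined_similarity`, `align_records`), the equal weighting, the 0.5 threshold, and the greedy column assignment are all assumptions made for illustration. Each data value is assumed to arrive as a pair of its HTML tag path and its text.

```python
from difflib import SequenceMatcher


def tag_similarity(path_a, path_b):
    """Fraction of tags that agree, compared position by position."""
    if not path_a and not path_b:
        return 1.0
    matches = sum(1 for a, b in zip(path_a, path_b) if a == b)
    return matches / max(len(path_a), len(path_b))


def value_similarity(text_a, text_b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, text_a.lower(), text_b.lower()).ratio()


def combined_similarity(item_a, item_b, tag_weight=0.5):
    """Weighted mix of tag and value similarity (the weight is an assumption)."""
    (path_a, text_a), (path_b, text_b) = item_a, item_b
    return (tag_weight * tag_similarity(path_a, path_b)
            + (1 - tag_weight) * value_similarity(text_a, text_b))


def align_records(records, threshold=0.5):
    """Greedily place each value in the most similar existing column.

    records: list of records, each a list of (tag_path, text) pairs.
    Returns one row per record; missing attributes are None.
    """
    columns = []          # columns[c][r] holds the text for record r, or None
    representatives = []  # first item seen in each column
    for row, record in enumerate(records):
        used = set()      # a record contributes at most one value per column
        for item in record:
            best, best_score = None, threshold
            for col, rep in enumerate(representatives):
                if col in used:
                    continue
                score = combined_similarity(item, rep)
                if score > best_score:
                    best, best_score = col, score
            if best is None:  # nothing similar enough: open a new column
                representatives.append(item)
                columns.append([None] * len(records))
                best = len(columns) - 1
            columns[best][row] = item[1]
            used.add(best)
    return [[col[r] for col in columns] for r in range(len(records))]


# Hypothetical segmented records: the second lists its price before its title.
record_1 = [(("tr", "td", "b"), "Data Mining"), (("tr", "td", "span"), "$45.00")]
record_2 = [(("tr", "td", "span"), "$39.50"), (("tr", "td", "b"), "Data Integration")]
table = align_records([record_1, record_2])
```

In this toy input the two records list their attributes in different orders, and the combined score should still place both titles in one column and both prices in the other. A real system would need a more careful pairing strategy than this greedy pass, for example to cope with the non-contiguous and nested records the abstract mentions.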
